
Publisher: IIRAJ

2017, IIRAJ International Conference, GIFT, Bhubaneswar, Khurda, India

No part of this book may be reproduced in any form or by any means without prior written
permission of the publisher.

ISBN: 978-93-86352-38-5

Typeset & printed by:

IIRAJ Publication House


Bhubaneswar, India

INTERNATIONAL CONFERENCE
On

Contemporary Issues in Science, Engineering and Management

(ICCI-SEM-2017)

CHIEF PATRONS
Dr. Satya Prakash Panda
Chairman,
Gandhi Group of Institutions, Bhubaneswar

Mr. Biranchi Narayan Panda


Secretary,
Gandhi Institute For Technology, Bhubaneswar

Er. Patitapaban Panda


Vice Chairman and Director,
Gandhi Institute For Technology, Bhubaneswar

PATRON

Prof. (Dr.) S Krishna Mohan Rao


Principal, GIFT, Bhubaneswar

Prof. (Dr.) P.K. Subudhi


Dean (Academics), GIFT, Bhubaneswar

Lt. Col. M.K. Raut


Dean (Admin), GIFT, Bhubaneswar

CONVENOR
Prof. (Dr.) Nabnit Panigrahi
Mechanical Engineering Dept.

CO-CONVENOR
Prof. Anil Kumar Nayak
Prof. Rajesh Kumar Ojha
Prof. Bikash Ranjan Moharana

INTERNATIONAL ADVISORY BOARD

Prof. (Dr.) P.N. Rao, Iowa, USA
Prof. (Dr.) Christian, Berlin, Germany
Prof. (Dr.) Aibing Yu, Australia
Prof. (Dr.) Ajit Abhram, Washington, USA
Prof. (Dr.) Jonathan Goh Hui, Johor, Malaysia
Prof. (Dr.) Chinmaya Kar, Al Jubail, Saudi Arabia
Prof. (Dr.) Srinivas Palanki, Beaumont, USA
Prof. (Dr.) Krisha Vedula, Lowell, USA
Prof. (Dr.) Md. Atiqur Azad, Dhaka, Bangladesh
Prof. (Dr.) Jayanarayan Sahu, Brunei
Prof. (Dr.) Sivakumar Manickam, Selangor, Malaysia
Prof. (Dr.) Vish Kaliamani, Malaysia
Prof. (Dr.) Devendra Das, University of Alaska
Prof. Ashok Kumar Senapati, Johannesburg, SA

NATIONAL ADVISORY BOARD

Shri H. S. Venkatesh, Scientist-G, ISRO
Prof. (Dr.) Ajoy Chakraborty, IIT Kharagpur
Prof. (Dr.) Akhilesh Mohan, IIT Kharagpur
Prof. (Dr.) Sujit Kumar Dash, IIT Kharagpur
Prof. (Dr.) Ashok Kumar Gupta, IIT Kharagpur
Prof. (Dr.) Ramesh Garg, Dean Faculty, IIT Ropar
Prof. (Dr.) L.M. Pattnaik, IISc, Bengaluru
Prof. (Dr.) B. N. Padhi, IIIT Bhubaneswar
Prof. (Dr.) B. N. Pattanaik, IIIT Bhubaneswar
Prof. (Dr.) R. P. Panda, VSSUT, Burla
Prof. (Dr.) V. N. Mani, SC-E, DRDO, Hyderabad
Er. Debadutta Mishra, SC-E, ISRO, Bangalore
Prof. (Dr.) Mrutyunjay Panda, Utkal University
Prof. (Dr.) L. N. Panda, CET (BPUT), Bhubaneswar
Prof. (Dr.) Atal Chaudhuri, Kolkata
Prof. (Dr.) Susanta Kumar Tripathy, NIT Silchar
Prof. A. Goswami, Dean CEP, IIT Kharagpur
Prof. (Dr.) B. K. Mishra, Director, IMMT, Bhubaneswar
Prof. (Dr.) A. K. Rath, VSSUT, Burla
Prof. (Dr.) R.I. Ganguly, Ex-Professor, NIT Rourkela
Prof. (Dr.) R.R. Dash, Former Scientist, NML
Prof. (Dr.) G.K. Roy, Former Director, NIT Rourkela
Prof. (Dr.) Teena Bagga, Amity University, India

ORGANIZING COMMITTEE MEMBERS


Prof. (Dr.) A. K. Swain
Prof. (Dr.) J Pattnaik
Prof. (Dr.) P. K. Dash
Prof. (Dr.) Sandhya Mishra
Prof. T K Panda
Prof. Sibananda Mishra
Prof. P.L. Mohanty
Prof. Kailash Chandra Rout
Prof. Nihar Ranjan Swain
Prof. Surajit Pattnaik
Prof. Aravind Tripathy
Prof. Saumendra Behera
Prof. Ganesh P Khuntia
Prof. Srikanta Kr. Dash
Mr. Bhabagrahi Mohapatra, A.O. (G)
Mr. Debaraj Mohanty
Mr. Iquebal Ahemad

Message from Director of ISRO

I am very pleased to learn that Gandhi Institute for Technology (GIFT) is
organizing the International Conference on Contemporary Issues in Science, Engineering
and Management (ICCI-SEM-2017) during 18th and 19th February, 2017 at
Bhubaneswar, Odisha.

This conference is aimed at bringing new techniques and horizons that will
contribute to innovative ideas and create an exciting environment for academia,
researchers and industry to update their knowledge and pave the way for
future developments in contemporary issues in Science, Engineering and Management.

I am particularly happy to note the growth story of GIFT, which is committed
to imparting quality engineering education in a highly disciplined environment. Not only
will we be looking at past success stories on this occasion, but we will also use this
opportunity to chart the road ahead.

I wish the Conference a great success.

H. S. Venkatesh

Chairman's Message

I am glad to note that Gandhi Institute For Technology (GIFT), which is a
leading Engineering and Management education center, is going to organize an
International Conference on Contemporary Issues of Science, Engineering and
Management (ICCI-SEM-2K17) on 18th and 19th February 2017.

ICCI-SEM-2K17 provides an ideal academic platform for researchers to present the
latest research findings and describe emerging technologies and directions in Science,
Engineering and Management issues. The conference seeks to contribute to presenting
novel research results in all aspects of Engineering, Sciences and Management.

The conference aims to bring together leading academic scientists, researchers
and research scholars to exchange and share their experiences and research results
on all aspects of Engineering, Science and Management. It also provides the
premier interdisciplinary forum for scientists, engineers, managers and practitioners to
present their latest research results, ideas, developments and applications in all areas.
The conference will bring together leading academic scientists, managers, researchers
and scholars in the domain of interest from around the world.

I extend my sincere greetings to the organizers and delegates and wish the
conference a grand success.

Dr. Satya Prakash Panda
Chairman
GIFT, Bhubaneswar

Message from Secretary

The dynamics of technologies in science and engineering as well as
management present issues that affect all humanity. These dynamics include
competing beliefs and goals, methods of engagement, and conflict and cooperation.
Contemporary issues have political, economic, social, historic and geographic
components. Approaches to addressing global and regional issues reflect historical
influences and multiple perspectives.

It is a great pleasure and an honor to extend to you a warm invitation to
attend the International Conference on Contemporary Issues of Science, Engineering
and Management (ICCI-SEM-2K17) on 18th and 19th February, 2017.

The 21st century is characterized by changing circumstances as new economies
emerge and new technologies change the way people interact. Issues related to
science, engineering and management are universal.

I firmly believe that the International Conference on Contemporary Issues of
Science, Engineering and Management will focus on current and emerging
research areas and will stimulate the intellectual energies of all participants.

With best wishes

Secretary
GIFT, Bhubaneswar

Message from Principal

Issues related to science, engineering and management are inherently
complicated, and addressing them requires individuals and groups to work through
decision-making processes prior to taking action.

Most of these issues are complex and have multiple feasible solutions. What is
considered a feasible solution by one group may not be considered feasible by others.
Additionally, all solutions have both positive and negative consequences.

The International Conference on Contemporary Issues of Science, Engineering and
Management (ICCI-SEM-2K17) will identify issues, and research, debate and
propose appropriate solutions. As part of this conference we shall formulate action
plans, and predict and assess the possible consequences of each proposed solution,
weighing the costs and benefits of each approach.

I wish one and all engaged with the conference all the success in their
endeavor.

Principal
GIFT, Bhubaneswar

Vice Chairman's Message

All human activities have intended and unintended consequences for
ecological, social and economic systems. Individuals and societies make decisions every
day that result in consequences that may impact physical and human environments
today and in the future. Intended consequences are those that are expected or
anticipated. Decisions about human activities are often made by comparing the costs
and benefits of the anticipated consequences. Unintended consequences are those that
are not expected or anticipated. The difficulty of predicting how technological,
managerial and scientific systems will react to human activities often results in
unintended consequences. The consequences can be interpreted as positive or negative
based on differing perspectives and values.

The International Conference on Contemporary Issues of Science, Engineering and
Management (ICCI-SEM-2K17) will identify consequences and propose appropriate
solutions.
I wish one and all engaged with the conference all the success in their
endeavor.

Vice Chairman
GIFT, Bhubaneswar

Message from Convener & Co-convener

Gandhi Institute For Technology (GIFT) is organizing a two-day
International Conference on Contemporary Issues in Science, Engineering &
Management during 18th & 19th February 2017. This conference aims to focus
on modern trends in multidisciplinary fields and provides a platform to discuss
innovative ideas, technologies and managerial skills that will help realize
integrated and advanced technologies in the fields of science, engineering and
management.

There has been an overwhelming response to this conference. We received a very
good number of papers from academicians, research scholars and students from various
institutes, out of which 150 papers have been accepted for presentation and
publication in the proceedings and reputed journals.

We express our sincere thanks to the keynote speakers for accepting our
invitation to deliver the keynote addresses during the conference. We are fortunate to
have received the consent of globally renowned educationists and scientists for their
participation in the forthcoming international events. A good number of experts from
reputed institutes like IITs, NITs and R&D organizations have agreed to be active
members of the advisory board.

We express our heartfelt thanks to our Chairman Dr. Satya Prakash Panda
and Vice Chairman Er. Patitapaban Panda for giving us the opportunity to conduct
such a conference in our institute. We are also thankful to our Principal Dr. S.
Krishna Mohan Rao, Dean (Academics) Dr. P. K. Subudhi, and Dean (Admin.) Lt. Col.
M. K. Raut for their wholehearted support in organizing this conference on a grand scale.

A conference of this magnitude is not possible without the active participation
of a large number of dedicated members. Special thanks to all department HODs, all
the staff members and student volunteers for providing their support to make the
conference a grand success.

Last but not least, our sincere thanks to all the participants, delegates and
all those who directly or indirectly helped us to make the conference a fruitful one.

Convener & Co-convener


GIFT, Bhubaneswar

Preface

The International Conference on Contemporary Issues in Science, Engineering and
Management (ICCI-SEM-2017) was held on the campus of Gandhi Institute For Technology,
Bhubaneswar, Khurda, India on 18th-19th February 2017. Gandhi Institute For Technology,
Bhubaneswar, the home of major educational organizations amid pleasant surroundings, was a
delightful place for the conference.

The conference was organized by GIFT, Bhubaneswar in association with the International
Institute of Research and Journals (IIRAJ), one of the fastest growing prestigious networks and an
independent non-profit professional association registered under the Worldwide People Empowerment
Trust under Section 25 of the Companies Act, 1956, meant for and aiming to promote the development of
scientific and research activities in science, engineering and technology in India and abroad. The IIRAJ
Researchers Forum consists of professional experts and overseas technical leaders who have left no
stone unturned to reinforce the fields of science, engineering and technology.

The topics covered in the conference include computer and information systems, electronics
and instrumentation, pure and applied physics, materials science and engineering, management, basic
science, biodiversity and environmental engineering, molecular biology and biotechnology, electrical
engineering, mechanical engineering, civil engineering, and electronics and communication engineering.
The reviewing process of ICCI-SEM-2017 was a challenging one that relied on the goodwill of the
people involved in the field. We invited more than 25 researchers from related fields to review
papers for presentation and publication in the proceedings of ICCI-SEM-2017. We
would like to thank all the reviewers for their time and effort in reviewing the documents. The
published papers have passed through a process of improvement, accommodating the discussion during the
conference as well as the comments of the reviewers who guided any necessary improvements.

The papers presented on the two days formed the heart of the conference and provided
ample opportunity for discussion. The papers were split almost equally between the five main
conference tracks, i.e., Track I (Computer Science & Applications), Track II (Electronics and Electrical
Engineering), Track III (Materials and Mechanical Engineering), Track IV (Civil & Bio-Technical
Sciences) and Track V (Basic Science & Management). There were plenary lectures covering different
areas of the conference.

Finally, we would like to thank the proceedings team, who dedicated their constant support
and countless hours to bring these manuscripts together into a book. The ICCI-SEM-2017 proceedings
are a credit to a large group of people, and everyone should be proud of the outcome.


TABLE OF CONTENTS
Track I (Computer Science and Applications)

SL No. TITLES AND AUTHORS Page No.

01. Software Defect Prediction using Adaptive Neuro Fuzzy Inference System 01-06
M Satya Srinivas, Dr. G Pradeepini, Dr. A Yesubabu

02. A Distributed Triangular Scalar Cluster Premier Selection Scheme for Enhanced Event Coverage and Redundant Data Minimization in Wireless Multimedia Sensor Networks 07-14
Sushree Bibhuprada B. Priyadarshini, Suvasini Panigrahi, Amiya Bhusan Bagjadab

03. A Novel Approach for Phishing Website Detection using Rule Mining and Machine Learning Technique 15-19
Binal Masot, Riddhi Kotak, Mittal Joiser

04. Partial Shape Feature Fusion Using PSO-ACO Hybrid Method for Content Based Image Retrieval 20-28
Kirti Jain, Dr. Sarita Singh Bhadauria

05. Secure and Scalable Transformation of Medical Imaging Data in Cloud using Customized Hospital based Management Systems 29-33
Nanda Gopal Reddy A, Roheet Bhatnagar

06. A Comparative Analysis of Different Techniques for Triple Level Biometric Authentication for Human 34-39
Rohit Srivastava, Dr. Prateek Srivastava

07. Software Fault Detection using Fuzzy C-means and Support Vector Machine 40-43
Hirali Amrutiya, Riddhi Kotak, Mittal Joiser

08. Eco-friendly Polyester Dyeing with Croton Oblongifolius 44-47
Trupti Sutar, Ashwini Patil, Prof. (Dr.) R. V. Adivarekar

09. Dengue disease prediction using Weka data mining tool 48-59
Kashish Ara Shakil, Samiya Khan, Shadma Anis, Mansaf Alam

10. Texture Analysis Methods: A Brief Review 60-65
Soumya Ranjan Nayak, Rajalaxmi Padhy, Jibitesh Mishra

11. A Novel Approach to Detect and Prevent Wormhole attack in Wireless Networks 66-72
Sara Ali, Dr. Krishna Mohan

12. A Preliminary Performance Evaluation of Machine Learning Algorithms for Software Effort Estimation 73-81
Poonam Rijwani, Sonal Jain

13. Edge Detection Method based on Cellular Automata 82-86
Jyoti Swarup, S. Indu

14. Application of Elliptic Curve Cryptography for Mobile and Handheld devices 87-91
Ajithkumar Vyasarao, K Satyanarayan Reddy

15. A Comprehensive Study of Edge Detection Techniques in Image Processing Applications using Particle Swarm Optimization Algorithm 92-100
M S Chelva, Dr. A. K. Samal

16. An application of hesitant fuzzy ideal techniques to the intra-regular and weakly-regular po-semigroup 101-107
Dr. Mohammad Yahya Abbasi, Aakif Fairooze Talee, Sabahat Ali Khan

17. Enhanced Symmetric Crypto-Biometric System using User Password: A Proposal 108-112
Pooja S, Arjun C.V

18. Shape Optimization of Microcantilever beam used as Biosensors using Resonance Frequency Shift Method 113-118
Ayush Kumar, Murgayya B.S, Sekhar N

19. Penetration Testing: An Art of Information Gathering in an Ethical Way 119-125
Arjun C.V, Pooja S

20. Phishing: A Critical Review 126-131
Shweta Sankhwar, Dhirendra Pandey, R.A Khan

21. A glance of anti-phish techniques 132-143
Shweta Sankhwar, Dhirendra Pandey, R.A Khan

22. Generated Topics with the improved effects of Hyper-parameters in LDA 144-148
M. Trupthi, Suresh Pabboju, G. Narasimha

23. A Novel Approach to Quality Enhancement of Grayscale Image using Particle Swarm Optimization 149-156
M S Chelva, Dr. S. V. Halse, Dr. A. K. Samal

Track II (Electronics and Electrical Engineering)

SL No. TITLES AND AUTHORS Page No.

24. An Application of Gerschgorin's Theorem for the Voltage Stability Study of Power Systems based on Optimization Method 157-162
V.R. Patel, Dr. B.N. Suthar

25. Optimization of Time Taken in Burr Removal of CP Titanium (Ti) Grade 2 Using Electrical Discharge Machine (EDM) 163-169
Harsh Hansda, Rahul Davis

26. Low-Complexity PTS-Based Schemes for PAPR Reduction in SFBC MIMO-OFDM Systems 170-173
P. Ravikumar, P.V. Naganjaneyulu, K. Satyaprasad

27. An Application of TLBO algorithm for the Voltage Stability Improvement by MW-Generation Rescheduling 174-178
V.R. Patel, Dr. B.N. Suthar

28. Skin Disease Classification using GLCM and FCM Clustering 179-188
Pradeep Mullangi, Y. Srinivasa Rao, Pushpa Kotipalli

29. Control Strategy for Load Sharing of Paralleled Inverter in Islanded Mode 189-196
P C Tejasvi Laxman Bhattar, M. Kowsalya

30. Scrutinized Electronic Voting Machine 197-199
Neetu Singh, Naresh Kumar, Lalit Kumar, Bhavneet Kaur

31. A comparative study of harmonic elimination of cascade multilevel inverter with equal dc sources using PSO and BFOA techniques 200-205
Rupali Mohanty, Gopinath Sengupta, Sudhansubhusana Pati

32. Effect of Volute Tongue Radius on the Performance of a Centrifugal Blower: A Numerical Study 206-211
Madhwesh N, Anudeep Mallarapu, K. Vasudeva Karanth, N. Yagnesh Sharma

33. Experimental Study of a Solar Air Heater for Performance Enhancement using Solar Turbulator Fans 212-218
K. Vasudeva Karanth, Manjunath M. S, Madhwesh N., N. Yagnesh Sharma

34. A new approach for investigation of multi machine stability in a power system 219-223
Meenakshi De, G. Das, K.K. Mandal

35. Development of an Intelligent Controller for Vehicle to Grid (V2G) System 224-230
Hriday Ranjan, Dr. Sindhu M R

36. Power Quality Enhancement in Plug-in Hybrid Electric Vehicle (PHEV) Systems 231-242
Renjith R, Dr. Sindhu M R

37. Design and Simulation of Power Converter for 1.2kW Nexa Fuel Cell Power Module 243-247
Saroj Kumar Mishra

38. Nonlinear Modeling and Simulation of Cross Coupled Oscillator 248-254
Rahul Bansal, Sudipta Majumdar

39. Energy Saving Prospective for Residential Home through Energy Audit 255-261
G. Bhavana, D.V.V. Ravi Kiran, A. Veeranjaneyulu, E.R.P. Venkatesh, K. Kalyan Sagar

40. Comparative Analysis of Si and 4H SiC DMOSFETs 262-269
Prabhupada Samal, Veera Venkata Subrahmanya Kumar Bhajna

41. Review on implementation of Silicon-on-Insulator Technology for Slot Antennas 270-274
Dr. N.S. Raghava, Karteek Viswanadha

42. Time-domain Formulation of Diffraction by a Dielectric Wedge 275-277
Vinod Kumar, Sanjay Soni, N. S. Raghava

43. Compact Slotted Meandered PIFA Antenna for Wireless Applications 278-282
Akhilesh Verma, Dr. N. S. Raghava

44. A Novel Group-Based Cryptosystem Based On Electromagnetic Rotor Machine 283-289
Ashish Kumar, N S Raghava

45. A review on Segmentation Techniques of Image Processing 290-293
Jyostna Mayee Behera, Kirti Bhandari, Kamalpreet Kaur, Sukanta Behera

46. Sarcasm Detection on Twitter by A Pattern-based Approach 294-300
Anu Sharma, Savleen Kaur

47. Multiband Monopole Antenna with Complementary Split Ring Resonator for WLAN and WiMAX Application 301-305
Pravanjana Behera, Ajeeta Kar, Monalisa Samal, Subhransu Sekhar Panda, Durga Prasad Mishra

48. A Novel SIW Corrugated H-plane Horn Antenna 306-308
Anil Kumar Nayak, Shashibhusan Panda, Saumendra Behera

Track III (Materials and Mechanical Engineering)

SL No. TITLES AND AUTHORS Page No.

49. Theoretical Investigation on Kinematic Modelling of a Multi Fingered Robotic Hand 309-315
Deepak Ranjan Biswal, Pramod Kumar Parida, Alok Ranjan Biswal, Abinash Bibek Dash, Niranjan Panda

50. Performance Enhancement of Aerospace Vehicle of Pulse Detonation Engine (PDE) Phase - II 316-321
Subhash Chander, Tejinder Kumar Jindal

51. Production of biodiesel from non-edible tree-borne oils and its fuel characterization 322-329
Nabnit Panigrahi, Amar Kumar Das, Kedarnath Hota

52. A Review on the Production and Optimal use of Ethyl alcohol as a Surrogate fuel in IC Engines extracted from Organic Materials 330-336
Amar Kumar Das, Ritesh Mohanty

53. A Review on Performance and Emission of Waste Plastic fuel on Compression Ignition Engines 337-342
Amar Kumar Das, Nabnit Panigrahi

54. Numerical simulation of vapor compression refrigeration system using refrigerants R152A, R404A and R600A 343-347
Ranendra Roy, Madhu Sruthi Emani, Bijan Kumar Mandal

55. Biomass gasification: Issues related to gas cleaning and gas treatment 348-354
Navneet Pathak, Dibyendu Roy, Sudip Ghosh

56. Effect of butanol addition to diesel on the CI engine characteristics: a review 355-361
Arindam Das, Ambarish Datta, Bijan Kumar Mandal

57. Effect of Mechanical Supercharger and Turbocharger on the Performance of Internal Combustion Engine: A Review 362-369
Sikandar Kumar Prasad, Achin Kumar Chowdhuri

58. Development of refrigerants: a brief review 370-376
Madhu Sruthi Emani, Ranendra Roy, Bijan Kumar Mandal

59. Effect of Particulate Laden Flow in an Axial Compressor Stage: A CFX Approach 377-384
Shiva Prasad U, Suresh Kumar R., Satya Sandeep C, D. Govardhan

60. Computational Modeling of Erosion Wear due to Slurry Flow through a Standard Pipe Bend: Effect of Bend Angle, Orientation, Diameter and Slurry Velocity 385-392
Vikas Kannojiya

61. An inverse kinematic solution of a 6-DOF industrial robot using ANN 393-397
Kshitish K. Dash, Bibhuti B. Choudury, Sukanta K. Senapati

62. Quantification of metal loss due to silt erosion under laboratory conditions 398-405
Pragyan Senapati, Rakeshasharma K R, M. K. Padhy, U.K. Mohanty

63. Analytical modeling for non-linear vibration analysis of functionally graded plate submerged in fluid 406-412
Shashank Soni, N. K. Jain, P. V. Joshi

64. Steady state solution of finite hydrostatic double-layered porous journal bearings with tangential velocity slip including percolation effect of polar additives of coupled stress fluids 413-420
Shitendu Some, Sisir Kumar Guha

65. Experimental investigation and prediction of tensile stress for SS 304 CP copper dissimilar metal couple joint by pulsed wave TIG welding process 421-427
Bikash Ranjan Moharana, Alina Dash, Jukti Prasad Padhy

66. Experimental investigation and performance characterization of machining parameters in AJM process using analytical method 428-433
Jukti Prasad Padhy, Bikash Ranjan Moharana, Ayusman Nayak

67. Chaotic Directed Artificial Bee Colony (CD-ABC) Algorithm to solve Tension/Compression Spring Design Problem 434-440
M. Rajeswari, T. Kalaipriyan, P. Sujatha, T. Vengattaraman, P. Dhavachelvan

68. An experimental study on some mechanical properties of epoxy/glass fiber hybrid composites modified by 10 wt% SiO2 micro particles 441-446
Alina Dash, Bikash Ranjan Moharana, Ayusman Nayak

69. Design of propulsion module for sample return mission to 2010 TK7 (asteroid) 447-452
Athota Rathan Babu, Kiran Mohan, Shiva Prasad U., R Suresh Kumar, C H Satya Sandeep, D Govardhan

70. Designing Configuration of Shell-and-tube Heat Exchangers using Particle Swarm Optimization (PSO) Technique 453-459
Uttam Roy, Mrinmoy Majumder

71. Performance Analysis of Plant Efficiency-Power for Shell-and-tube Heat Exchangers by Genetic Algorithm 460-465
Uttam Roy, Mrinmoy Majumder

72. Fabrication of Functionally Graded Composite Material using Powder Metallurgy Route: An Overview 466-472
Aravind Tripathy, Saroj Kumar Sarangi, Rashmikant Panda

73. Evaluation of engineering properties of flexible pavements using PLAXIS software 473-480
B Suresh, N Venkat Rao, G Srinath

Track IV (Civil and Bio-technical Sciences)

SL No. TITLES AND AUTHORS Page No.

74. Analyzing Most Suitable Mode of Transportation for Work Trips in "Y" Sized Cities in Indian Context using Analytic Hierarchy Process 481-485
Jayesh Juremalani, Krupesh A. Chauhan

75. Effect of metakaolin on the enhancement of concrete strength 486-492
Pratiksha Sarangi, K. C. Panda, S. Jena

76. Enzyme-Assisted Isolation of Microfibrillated Cellulose (MFC) from Saccharum Munja Fibre and its Characterization 493-497
Girendra Pal Singh, Pallavi Vishwas Madiwale, R.V. Adivarekar

77. Chemical modification of ancient natural dye for textile bulk dyeing 498-502
Ashitosh B. Pawar, Geetal Mahajan, R.V. Adivarekar

78. Study of compressive strength of blended nano fly-ash PPC cement mortar 503-507
Ananya Punyotoya Parida

79. Experimental Investigation of Physical, Mechanical and Electrical Properties of Cement Mortar Using Nano Fly-Ash 508-515
Abhijit Mangaraj

80. A numerical study on the effect of geometric variation on aneurysm 516-522
Jaydeep Chandra Sahu, Somnath Chakrabarti

81. Development of a Framework for VSM Attributes Using Interpretive Structural Modeling 523-530
Lakhan Patidar, Vimlesh Kumar Soni, Pradeep Kumar Soni

82. Impact of Fly Ash and Silpozz as Alternative Cementitious Material in Crusher Dust Concrete 531-537
S. Jena, K. C. Panda, P. Sarangi

83. Free vibration analysis of laminated composite beams 538-542
Prachi Si, Bidyadhar Basa

84. Assessment of Carbon Footprint of agriculture works and agricultural canal by theoretical queuing (Siore 1_2_3_Kas Kas) 543-555
Adib Guardiola Mouhaffel, Carlos Martínez Domínguez, Ricardo Díaz Martín, Assane Seck, Wagu Ahmadou, Francisco Javier Díaz Pérez, Oustasse Abdoulaye Sall, Djibril Sall

85. An overview on Public Transport System 556-558
Patitpaban Panda, Prof. Sudhansu Sekhar Das

86. Parking Management: A Solution to Urban Mobility 559-562
Patitpaban Panda, Prof. Sudhansu Sekhar Das

Track V (Basic Science and Management)

SL No. TITLES AND AUTHORS Page No.

87. Demonetization: Causes, impacts and Govt. Initiatives in India 563-567
Dr. Prasann Kumar Das, Mr. Nirmal Kumar Routra

88. Building and nurturing high performance through communication: a case study on A. M. Naik (Larsen and Toubro) 568-573
Bikram K. Rout, Pranati Mishra

89. Significant Role of Conventional Sampling Survey Techniques in Precision Testing of Estimation Process 574-582
Vishwa Nath Maurya, Awadhesh Kumar Maurya, Ram Bilas Misra, P.K. Anderson

90. Big Five Personality Traits and Tourists' Intention to Visit Green Hotels 583-591
Vivek Kumar Verma, Sumit Kumar, Bibhas Chandra

91. Promotional effort in 21st century by insurance companies 592-599
Dr. Biswamohan Dash, Prof. Gopikant Panigrahy, Prof. Nirmal Kumar Routra

92. Wants of the middle division family and their fiscal status 600-607
K. Vijaya Sekhar Reddy, B Rajesh, P. Bindu Madhavi, Dr. Roopalatha

93. A DNA based Chaotic Image Fusion Encryption Scheme Using LEA 256 and SHA 256 608-620
Shradha Mohanty, Alkesha Shende, K Abhimanyu Kumar Patro, Bibhudendra Acharya

94. Green HRM for business sustainability 621-624
Dr. Sasmita Nayak, Vikashita Mohanty

95. Green Marketing: Its application, scope and future in India 625-630
Vikashita Mohanty, Dr. Sasmita Nayak

96. Preparation and characterization of Chitosan/PVA polymeric film for its potential application as wound dressing material 631-637
Santosh Biranje, Pallavi Madiwale, R. V. Adivarekar

97. A near Optimal solution for CSP in Optimization method 638-642
Asvany Tandabani, KalaiPriyan Thirugnanasambandam, Sujatha Pothula

98. Dyeing of Nylon Fabric using Nanoemulsion 643-648
Ravindra D. Kale, Amit Pratap, Prerana B. Kane, Vikrant G. Gorade

99. Preparation of self-reinforced cellulose composite using microcrystalline cellulose 649-653
Ravindra D. Kale, Vikrant G. Gorade, Sneha Bhor

100. Hybrid Scheduling for Impatient Clients with Reneging and Balking 654-661
R. Kavitha, S. Krishna Mohan Rao

101. An analysis on effects of Demonetization on Cooperative Banks 662-669
Nihar Ranjan Swain, Dr. Rashmita Sahoo

102. Temperature and Frequency Response of the Dielectric Parameters of Pb and Ti modified BiFeO3 Ceramic Compound 670-679
Niranjan Panda, R.N.P. Choudhary

103. A Study on Occupational Stress Experienced by Public and Private BPO Employees in Hyderabad City 680-687
Ch. Lakshmi Narahari, Dr. Kalpana Koneru

104. Career Path: A Tool to Support Talent Management 688-693
Rashi Mahato

105. A study on fluctuations in the commodity market with reference to Guntur City, A.P 694-701
Ch. Hymavathi, Dr. K. Kalpana

106. Enhancing Performance of Detonation Driven Shock Tube 702-704
Swagatika Acharya, Tatwa Prakash Pattsani

Software Defect Prediction using Adaptive Neuro Fuzzy Inference System

M Satya Srinivas, G Pradeepini, A Yesubabu

Abstract: Software Defect Prediction is a major challenge in the software development process, aimed at
reducing the cost of software implementation. In this paper, software defects are predicted using an
Adaptive Neuro Fuzzy Inference System (ANFIS). Predicting defect-prone modules greatly reduces
software development cost in the software industry. Most researchers have applied various data mining
techniques such as AdaBoost, neural networks, Random Forest and support vector machines to software
defect prediction datasets downloaded from NASA repositories. These datasets are imbalanced in nature.
Using ANFIS, an initial Fuzzy Inference System (FIS) was derived using the subtractive clustering
method, and the FIS was then trained using a hybrid learning rule. The performance of the classifier is
measured in terms of AuC values for these imbalanced datasets. We compared the results of ANFIS with
cost sensitive neural networks. The Receiver Operating Characteristic (ROC) curves are generated and
presented in the Results section. The AuC values of ANFIS are found satisfactory compared to cost
sensitive neural networks.

Keywords: Adaptive Neuro Fuzzy Inference System, Receiver Operating Characteristics,
Software Defect Prediction, Subtractive Clustering, Hybrid Learning

1. Introduction

In the process of software development, predicting software defects plays a major role. Predicting a software defect in advance reduces the cost of software development and improves the quality of the software product. There are various approaches to software defect prediction.

Paper Info: This paper applies the Adaptive Neuro Fuzzy Inference System to Software Defect Prediction.

Author Info: M Satya Srinivas, Research Scholar, K L University, has 7 years of teaching experience. G Pradeepini, Professor, CSE Dept., K L University, has 15 years of experience and has published a number of papers in reputed national and international journals. Dr. Yesubabu, Prof. & HOD, CSE Dept., Sir CRR College of Engineering, has 15 years of experience; he has guided a number of scholars and published various papers in international journals.

1. Simple metric and defect estimation model: According to Akiyama, the number of defects depends on a simple software metric, lines of code (LOC). He derived the equation: number of defects (N) = 4.86 + 0.018 * LOC. But LOC is not enough to capture software complexity.

2. Complexity metrics and fitting models: In 1976, McCabe introduced the cyclomatic metric for determining software complexity. The cyclomatic complexity of a program is V(G) = E - V + 2, where E is the number of edges and V is the number of vertices in its control-flow graph. In 1977, Halstead introduced the Halstead complexity measures, which reflect the implementation of algorithms in different languages. He observed that the number of defects depends on effort, which in turn depends on difficulty and volume: Defects (D) = E^(2/3) / 3000. The limitation of this model is that it merely fits the known data and is not validated for new entries.
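To make these two early models concrete, the short sketch below evaluates both formulas; the LOC and effort values are invented purely for illustration and are not data from this paper.

```python
# Illustrative comparison of two early defect-estimation models.
# The input values are made-up examples, not data from this paper.

def akiyama_defects(loc: float) -> float:
    """Akiyama's model: N = 4.86 + 0.018 * LOC."""
    return 4.86 + 0.018 * loc

def halstead_defects(effort: float) -> float:
    """Halstead's model: D = E^(2/3) / 3000."""
    return effort ** (2.0 / 3.0) / 3000.0

if __name__ == "__main__":
    for loc in (100, 1000, 10000):        # hypothetical module sizes
        print(f"LOC={loc:6d} -> Akiyama N = {akiyama_defects(loc):.2f}")
    for effort in (1e5, 1e6, 1e7):        # hypothetical Halstead effort values
        print(f"E={effort:>10.0f} -> Halstead D = {halstead_defects(effort):.2f}")
```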
3. Regression model: Shen et al.'s empirical study showed that a linear regression model can be validated on actual new modules. He found the mean magnitude of relative error (MRE) between the actual and predicted numbers of defects to be 0.48. Munson et al. applied discriminant analysis using logistic regression with Halstead and cyclomatic complexity metrics and obtained an accuracy of 92%.

4. Just-in-time prediction model: A large scale empirical study of just-in-time quality assurance reports 68% accuracy and 64% recall on 11 open source and commercial projects. The limitation of the JIT model is that practical validation is difficult.

5. Practical models: Chidamber & Kemerer introduced the object-oriented CK metrics for software defect prediction. The metrics are weighted methods per class (WMC), depth of inheritance tree (DIT), number of children (NOC), coupling between objects (CBO) and response for a class (RFC).

6. History metrics prediction models: History metrics do not extract particular program characteristics such as the developer social network, component network and anti-patterns. They are not applicable to new projects or projects lacking historical data.

7. Cross-project defect prediction: These models are applicable for new projects lacking historical data [5].

The remainder of this paper is organized as follows. Section 2 discusses related work. Section 3 discusses the methodology. In Section 4 we discuss results and comparisons.

2. Related Work

Faruk Arar et al. applied an Artificial Neural Network (ANN) for constructing a model for software defect prediction, using the Artificial Bee Colony (ABC) algorithm for optimizing the connection weights in the ANN. The model was applied to five publicly available datasets from the NASA repository [1]. Jun Zheng et al. studied three cost-sensitive boosting algorithms for software defect prediction; the performances of the three algorithms are evaluated using four datasets from NASA projects [2]. A comparison of soft computing algorithms for software defect prediction was done by Erturk [3], who applied an Adaptive Neuro Fuzzy Inference System for software fault prediction. Rodriguez used datasets from the PROMISE repository and applied feature selection [6] and genetic algorithms for predicting defective modules [8], [5]. Guo suggested the use of Random Forest for predicting faulty software modules [9]. Decision tree learners are used to predict the defect densities in SDP datasets [12].

3. Adaptive Neuro Fuzzy Inference System (ANFIS)

3.1 Generating the Initial FIS

The problem with a fuzzy inference system is the identification of rules. Rules are generated either by using grid partitioning or subtractive clustering methods.

3.1.1 Grid Partitioning

Grid partitioning divides each input variable into sub-intervals and forms a rule for each possible interval of the input variables. For variables with continuous values, this method generates a huge collection of rules, which overfits the data. Hence this method is not suitable for datasets consisting of continuous variables.

3.1.2 Subtractive Clustering

Subtractive clustering generates fine-tuned clusters for each input variable. A rule is generated for each cluster of the input variables. The subtractive clustering algorithm
considers each data point as a candidate cluster centre. First, a density measure is calculated for all data points. For xi it is defined as

Di = Σ (j = 1..n) exp( -||xi - xj||² / (ra/2)² )

A data point with many neighbouring points has a high density measure; ra is the radius within which the neighbouring data points must lie. The data point with the highest density value is selected as the first cluster centre xc1. A radius rb then defines the neighbourhood in which the density measure of every point is reduced:

Di = Di - Dc1 · exp( -||xi - xc1||² / (rb/2)² )

After the reduction is done, the data point with the highest density measure is selected as the second cluster centre xc2. Next we use cluster xc2 for density reduction. This procedure is repeated until Dck < ε·D1, for a prescribed small fraction ε.
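The procedure above can be written compactly. The following is a minimal NumPy sketch under the assumption that the data are normalized; the values of ra, rb and the stopping fraction eps here are chosen only for illustration (the MATLAB run reported later in this paper used range of influence 0.3, squash factor 1.25, accept ratio 0.5 and reject ratio 0.15).

```python
import numpy as np

def subtractive_clustering(X, ra=0.5, rb=0.75, eps=0.15):
    """Return cluster centres chosen by the density-based procedure above.

    X   : (n, d) array of data points (assumed normalized to [0, 1])
    ra  : neighbourhood radius used in the density measure
    rb  : radius used when reducing density around a chosen centre
    eps : stopping fraction; stop when the best density < eps * first density
    """
    # Pairwise squared distances between all points.
    diff = X[:, None, :] - X[None, :, :]
    sqdist = np.sum(diff ** 2, axis=-1)

    # Density measure: D_i = sum_j exp(-||x_i - x_j||^2 / (ra/2)^2)
    D = np.exp(-sqdist / (ra / 2.0) ** 2).sum(axis=1)

    centres = []
    first_density = None
    while True:
        c = int(np.argmax(D))
        if first_density is None:
            first_density = D[c]          # density of the first centre, D_1
        elif D[c] < eps * first_density:  # stopping criterion D_ck < eps * D_1
            break
        centres.append(X[c])
        # Density reduction: D_i -= D_c * exp(-||x_i - x_c||^2 / (rb/2)^2)
        D = D - D[c] * np.exp(-sqdist[c] / (rb / 2.0) ** 2)
    return np.array(centres)

# Example with random two-dimensional data (illustrative only).
rng = np.random.default_rng(0)
X = rng.random((200, 2))
print(subtractive_clustering(X).shape)
```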
3.2 Training the FIS

There are various methods proposed to construct a model for software defect prediction. In this paper we propose the Adaptive Neuro Fuzzy Inference System for constructing the classifier.

3.2.1 Adaptive Neuro Fuzzy Inference System (ANFIS)

ANFIS is a five-layered architecture which derives a Sugeno-type fuzzy inference system using a hybrid learning rule. Figure 1 shows the architecture of ANFIS.

Fig 1: Adaptive Neuro Fuzzy Inference System

Layer 1 is an adaptive layer that computes the membership values of the input variables; in this paper we use the Gaussian membership function.
Layer 2 is a fixed layer that computes the firing strength of each rule using the product operator.
Layer 3 is a fixed layer that computes the normalized weight of each firing rule.
Layer 4 is an adaptive layer that uses neural networks for best fitting of the consequent parameters (p, q, r) of the node function f(x, y) = p*x + q*y + r.
Layer 5 is a fixed node that computes the overall sum of the input signals.
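To make the five-layer computation concrete, here is a small sketch of a forward pass through a two-rule, first-order Sugeno model of the kind ANFIS tunes; the membership centres, widths and consequent coefficients are arbitrary illustrative values, not trained parameters.

```python
import numpy as np

def gaussmf(x, c, sigma):
    """Gaussian membership function used in Layer 1."""
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def sugeno_forward(x, y, rules):
    """Forward pass through the five ANFIS layers for inputs (x, y)."""
    w, f = [], []
    for (cx, sx), (cy, sy), (p, q, r) in rules:
        mu_x = gaussmf(x, cx, sx)          # Layer 1: membership values
        mu_y = gaussmf(y, cy, sy)
        w.append(mu_x * mu_y)              # Layer 2: firing strength (product)
        f.append(p * x + q * y + r)        # Layer 4 consequent f = p*x + q*y + r
    w = np.array(w)
    wn = w / w.sum()                       # Layer 3: normalized weights
    return float(np.dot(wn, f))            # Layers 4-5: weighted sum of outputs

# Two illustrative rules: ((cx, sx), (cy, sy), (p, q, r)).
rules = [((0.0, 1.0), (0.0, 1.0), (1.0, 1.0, 0.0)),
         ((2.0, 1.0), (2.0, 1.0), (-1.0, 2.0, 0.5))]
print(sugeno_forward(1.0, 1.5, rules))
```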
4. Results & Discussion
Knowledge Extraction based on Evolutionary Learning (KEEL) is an open source tool for conducting experiments on different datasets to evaluate the performance of various learning algorithms. We conducted experiments on software defect prediction using 4 datasets downloaded from the NASA data repository to identify defect-prone modules. Neural networks are one of the methodologies for best fitting non-linear relationships between attributes. Cost sensitivity is applied to neural networks to improve the performance of the classifier.

In the KEEL experimentation section, Cost Sensitive Neural Networks are implemented in Java. For experimentation, we imported the 4 SDP datasets in the Data Management section. In the experimentation section, the dataset and the Cost Sensitive Neural Network algorithm are imported, and the results are shown using imbalance-check methods. The algorithm is repeated with different parameter values. In the first iteration, the number of layers is fixed at two with a total of 15 neurons. In the next iterations the number of neurons is increased to 30 and then to 60. Finally the algorithm is tested with 3 layers and 90 neurons. These results are presented in Table 1.

We implemented the Adaptive Neuro Fuzzy Inference System for software defect prediction using the MATLAB Neuro-Fuzzy toolbox, which implements ANFIS. In the Data section, the training data was loaded. In the FIS section, the initial Sugeno fuzzy inference system was derived using the subtractive clustering method, which takes four parameters: range of influence (0.3), squash factor (1.25), accept ratio (0.5) and reject ratio (0.15). In the training section, the FIS was trained using the ANFIS algorithm. In the testing section, the FIS was tested with unseen data. The Receiver Operating Characteristic (ROC) curve was plotted as true positive rate against false positive rate. AuC values are determined for each software defect prediction dataset and the results are compared with cost sensitive neural networks.

The performance of the classifier is measured in terms of Area under the ROC Curve (AuC) values for these imbalanced datasets. In Table 1 we compare the results of the cost sensitive neural networks (with different parameter values) with ANFIS.
Table I: AuC values for Software Defect Prediction
Dataset\Algorithm   CS NN (l:2, n:15)   CS NN (l:2, n:30)   CS NN (l:2, n:60)   CS NN (l:3, n:90)   ANFIS
kc1 0.5 0.5 0.5 0.5 0.8482
kc2 0.64958 0.66451 0.660324 0.534307 0.8998
cm1 0.5 0.5 0.5 0.5 0.8904
pc1 0.589767 0.548134 0.637584 0.561159 0.8416

For all datasets, the performance of ANFIS is found satisfactory. The ROC curves are generated by plotting the false positive rate against the true positive rate. Figures 2-5 illustrate these ROC curves on the various SDP datasets.

Fig 2: ROC Curve, cm1 dataset    Fig 3: ROC Curve, kc1 dataset
Fig 4: ROC Curve, kc2 dataset    Fig 5: ROC Curve, pc1 dataset
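The AuC figures in Table 1 can be computed for any classifier that emits a defect-proneness score. A minimal sketch using scikit-learn's ROC utilities follows, with placeholder labels and scores standing in for an actual SDP dataset.

```python
from sklearn.metrics import roc_curve, auc

# Placeholder ground-truth labels (1 = defective module) and classifier
# scores; in the paper's setting these would come from a NASA SDP dataset.
y_true  = [0, 0, 1, 0, 1, 1, 0, 1, 0, 0]
y_score = [0.1, 0.3, 0.8, 0.2, 0.6, 0.9, 0.4, 0.7, 0.2, 0.1]

fpr, tpr, _ = roc_curve(y_true, y_score)   # points of the ROC curve
print("AuC =", auc(fpr, tpr))              # area under the ROC curve
```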
5. Conclusion

There are various methods for predicting defect-prone modules using data mining, such as decision trees, Random Forests, Support Vector Machines and Neural Networks. In this paper, we proposed the Adaptive Neuro Fuzzy Inference System (ANFIS) for identifying defect-prone modules. The ANFIS method generates a Sugeno fuzzy model. Initially the FIS was generated by the subtractive clustering method, and it was then trained by ANFIS. We applied cost sensitive neural networks on the SDP datasets with different parameters and compared the results with ANFIS. The performance is measured in terms of AuC values, and the results with ANFIS are found satisfactory.

References:

[1] Omer Faruk Arar, Kurat Ayan. Software defect prediction using cost-sensitive neural network. Applied Soft Computing, April 2015.
[2] Jun Zheng et al. Cost-sensitive boosting neural networks for software defect prediction. Expert Systems with Applications. 2010 June; 37(6), 4537-4543.
[3] Ezgi Erturk, Ebru Akcapinar Sezer. A comparison of some soft computing methods for software fault prediction. Expert Systems with Applications. 2014.
[4] J. Nam, S. J. Pan, and S. Kim. Transfer defect learning. In Proceedings of the 2013 International Conference on Software Engineering, ICSE '13, 382-391.
[5] Song Q, Jia Z, Shepperd M, Ying S, Lin J. A general software defect proneness prediction framework. IEEE Transactions on Software Engineering. 2011; 37, 356-370.
[6] S. Shivaji, E. Whitehead, R. Akella, and S. Kim. Reducing features to improve code change-based bug prediction. IEEE Transactions on Software Engineering. 2013; 39(4), 552-569.
[7] K. O. Elish, M. O. Elish. Predicting defect-prone software modules using support vector machines. Journal of Systems and Software. 2008; 649-660.
[8] Rodriguez D, Herraiz I, Harrison R. On software engineering repositories and their open problems. In: First International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE). 2012.
[9] Guo L, et al. Robust prediction of fault-proneness by Random Forests. In: 15th International Symposium on Software Reliability Engineering, ISSRE, IEEE; 2004.
[10] Chen N, Hoi SCH, Xiao X. Software process evaluation: A machine learning framework with application to defect management process. Empirical Software Engineering. 2013; 19, 531-564.
[11] http://mdp.ivv.nasa.gov. Software Defect Prediction datasets. Date accessed: 15/11/2015.
[12] Knab P, Pinzger M, Bernstein A. Predicting defect densities in source code files with decision tree learners. Proc. of MSR, New York, ACM Press. 2006; 119-125.
[13] Shilpee Chaoli, Gil Tenne and Sanjay Bhatia. Analysing Software Metrics for Accurate Dynamic Defect Prediction Models. Indian Journal of Science and Technology. 2015 Feb; 8(S4), 96-100.
[14] C Balakrishna Moorthy, Ankur Agrawal and M K Deshmuh. Artificial Intelligence Techniques for Wind Power Prediction: A Case Study. Indian Journal of Science and Technology. 2015 Oct; 8(25), 1-10.
A Distributed Triangular Scalar Cluster Premier Selection Scheme
for Enhanced Event Coverage and Redundant Data Minimization
in Wireless Multimedia Sensor Networks
[1] Sushree Bibhuprada B. Priyadarshini [2] Suvasini Panigrahi [3] Amiya Bhusan Bagjadab
[1][2][3] Veer Surendra Sai University of Technology

Abstract: This research reports a novel algorithm inspired by the clustering paradigm for providing
improved event coverage while actuating a reduced number of cameras. The main objective of the
current proposal is to reduce the amount of redundant data transmitted due to overlapping of the fields
of view of cameras, while enhancing the coverage of the occurring event area. The basic framework of
the algorithm is divided into three phases. Initially, the monitored region is divided into a number of
compartments. Afterwards, in each compartment three scalar cluster premiers are selected effectively.
Subsequently, whenever an event takes place, these scalar cluster premiers report its occurrence to
their corresponding cameras, and the cameras collectively decide their order of actuation. The least
camera activation, enhanced coverage ratio, reduced event-loss ratio, improved field of view
utilization, minimized redundancy ratio and decreased energy expenditure for camera activation
achieved in the investigation validate the efficacy of the proposed approach.

Index Terms: Primitive cluster premier, Secondary cluster premier, Tertiary cluster premier,
Coverage ratio, Redundancy ratio, Active Camera Count.

I. INTRODUCTION

In this modern era of present-day technology, sensors are used in almost all spheres of life. Their prevalent viability and adoration is essentially due to their exclusive advantages and versatility to be deployed in any herculean environment. Sensors find a lot of applications like habitat tracking, environmental monitoring, industrial diagnosis, agricultural control, disaster relief, seismic activity monitoring, battlefield monitoring, etc. In most cases, sensors are deployed for ensnaring the event information in inaccessible remote areas. They are scattered across huge regions of interest so as to detect and monitor the occurring event information pertaining to the monitored region. Hence, covering the event region intellectually and capturing the event information efficiently have been a predominant problem of consideration.

Wireless Sensor Networks (WSNs) are networks which consist of autonomous sensors for monitoring physical as well as environmental conditions. Wireless Multimedia Sensor Networks (WMSNs) are extensions of WSNs where scalar sensors are employed along with cameras. Scalars are those sensors which can capture textual information, while cameras can capture both audio and video information. Further, it is well known that battery constrained sensors are deployed in the regions to be monitored. However, as the cameras consume a higher amount of energy than the scalars, they are normally kept in the turned off state and undergo activation only when scalars present within their DOFs apprise them regarding the occurrence of any event.

The camera sensors possess two primitive parameters, namely Field of View (FOV) and Depth of Field (DOF). FOV is the angle at which a camera can trap the accurate image of an object, and DOF is the distance within which a camera can take the accurate image of an object [1]. A camera can either be a directional camera or an omni-directional camera based on the FOV angle. If a camera takes the image of an object along a particular direction, it is said to be a directional camera. On the contrary, if it takes the image of an object uniformly along all directions, then it is said to be an omni-directional camera. Further, a scene may be shot from several camera angles simultaneously. The preference for employing omni-directional cameras is that they can provide more panoramic photographs of occurring event information, thereby capturing the concerned landmarks and habitats more appropriately and effectually over a prolonged period of time than directional cameras, which own fixed orientations. Basically, redundant data transmission takes place due to
overlapping of the FOVs of cameras. As we know, the greater the value of the DOF of a camera, the more area will be covered by the concerned camera. However, an excessive increase in DOF values leads to increased overlapping superimposed zones among the FOVs of cameras.

Several approaches have been devised till now for minimization of redundant data. Distributed Collaborative Camera Actuation based on Scalar Count (DCA-SC) is a recently proposed approach for minimizing the amount of redundant data transmitted [1]. In this approach, the cameras collaboratively decide which among them are to be activated based on the descending order of their scalar count (SC) values. The SC value of a camera represents the number of event detecting scalars present within the FOV of the concerned camera sensor. Similarly, another scheme, namely Distributed Collaborative Camera Actuation based on Sensing-Region Management (DCCA-SM), actuates the cameras based on the amount of residual energy contained by them [2]. However, this scheme suffers from a major drawback in that the number of cameras actuated cannot be significantly minimized for reducing the amount of redundant data transmission. Such transmissions of redundant data lead to unnecessary energy as well as power expenditure. Therefore, our goal is to actuate only the minimum number of cameras in such a manner that the amount of redundant data transmission is minimized, while providing improved coverage of the event region to be monitored.

DCA-SC [1] and DCCA-SM [2], as discussed earlier, are two approaches that attempt to cover the monitored region with less camera actuation. However, redundancy in data transmission is still present due to overlapping of the FOVs of actuated cameras. Besides, complete elimination of redundant data transmission is not achievable, since event information loss would occur if the redundancy causing cameras were kept in the turned off state. Hence, our objective is to develop an optimal algorithm that activates a significantly reduced number of cameras in such a way that it minimizes the amount of redundant data transmission while providing enhanced event coverage as compared to DCA-SC [1] and DCCA-SM [2].

In the current research, we have devised a novel approach called Distributed Triangular Scalar Cluster Premier Selection (DT-SCPS) that divides the entire monitored region into a number of compartments and selects three scalar premiers, namely the primitive cluster premier, secondary cluster premier and tertiary cluster premier, effectively in each of the compartments. The scalar premiers are selected intellectually in such a manner that the scalar having the lowest mean distance among all the scalars in a compartment is chosen as the primitive cluster premier. The secondary cluster premier is the farthermost scalar present in the 60° counter-clockwise direction along the baseline from the primitive cluster premier in the corresponding compartment. Similarly, the tertiary cluster premier is the scalar whose average mean distance from both the other cluster premiers is the smallest. The selection of all the scalar premiers is realized in such a manner that the cameras actuated by them ensnare information regarding any kind of event while covering more distinct portions of the monitored region, thus minimizing the amount of overlapping among the FOVs of cameras, which forms the ultimate objective of this research proposal.
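A rough sketch of how the three premiers could be chosen inside one compartment under these rules is given below; the tolerance used to resolve the 60° counter-clockwise bearing and the tie-breaking order are our own illustrative assumptions, not the authors' implementation.

```python
import math

def select_premiers(scalars):
    """Pick (primitive, secondary, tertiary) premiers from one compartment.

    scalars: list of (x, y) scalar positions; assumes at least three scalars.
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def mean_dist(p, points):
        others = [q for q in points if q is not p]
        return sum(dist(p, q) for q in others) / len(others)

    # Primitive premier: scalar with the lowest mean distance to all others.
    primitive = min(scalars, key=lambda p: mean_dist(p, scalars))

    # Secondary premier: farthest scalar whose bearing from the primitive is
    # closest to 60 degrees counter-clockwise from the baseline (x-axis).
    def bearing(p):
        return math.degrees(math.atan2(p[1] - primitive[1], p[0] - primitive[0]))
    candidates = [p for p in scalars if p is not primitive]
    secondary = max(candidates,
                    key=lambda p: (-abs(bearing(p) - 60.0), dist(primitive, p)))

    # Tertiary premier: scalar with the smallest average distance to both.
    rest = [p for p in candidates if p is not secondary]
    tertiary = min(rest, key=lambda p: (dist(p, primitive) + dist(p, secondary)) / 2)
    return primitive, secondary, tertiary

print(select_premiers([(1, 1), (4, 5), (2, 3), (6, 2), (3, 6)]))
```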
The rest of the paper is organized as follows: Section II discusses the related work done in the field. Section III elaborates the proposed approach. Section IV details the simulation framework and result discussions. Finally, in Section V, we conclude the paper.

II. RELATED WORK

A significant amount of research work has been carried out for providing better coverage of the monitored region. The DCA-SC [1] and DCCA-SM [2] approaches, as discussed in Section I, consider the activation of camera sensors while minimizing the amount of redundant data transmitted. Similarly, the idea of the cover set used in [3] helps in monitoring all the desired targets. The algorithm presented in this work divides the nodes into cover sets and generates the maximum number of cover sets. Besides, the approach presented in [4] concentrates on the notion of directional coverage, where each of the individual targets is associated with differentiated priorities. Moreover, the paper discusses the issue of priority-based target coverage and chooses a minimum subset of directional sensors that can monitor all the targets, while propitiating their prescribed priorities.

The analysis of the coverage process induced on a one-dimensional path by a sensor network is modeled as a two-dimensional Boolean model [5]. Furthermore, path coverage measures such as breach, support and length to first sense, and sensing continuity measures such as holes as well as clumps, are also characterized in the same work. In another approach [6], priority is given to redundant data elimination, where a local elimination algorithm
that removes redundant messages locally in each state of the automaton is proposed. Similarly, a method is proffered in [7] for removing redundant messages in automatically distributed parallel programs. This algorithm uses the program control flow, i.e., it handles gotos; the control flow is a finite deterministic automaton with a DAG of actions in every state. Additionally, a data similarity based redundant data elimination technique has been described in [8]. In this paper, an algorithm is depicted to measure the similarity between data collected towards the base station such that an aggregator sensor sends the minimum amount of information to the base station.

III. PROPOSED APPROACH

We have devised a novel approach called the Distributed Triangular Scalar Cluster Premier Selection (DT-SCPS) scheme for actuating fewer cameras that cover distinct portions of the event region effectively, while providing improved event coverage along with lowered redundancy in data transmission.

A. Relevant Definitions and Terms

Some of the relevant definitions and terms used in our proposed approach are discussed as follows:

Definition 1: Coverage Ratio can be defined as the portion of the area of an occurring event that is covered by all the actuated cameras with respect to its total area [1]. The greater the value of the coverage ratio, the more effectively the event region is covered. Mathematically,

Coverage ratio = pec/te (1)

Where, pec: portion of event area covered by all the activated cameras, and te: total area of the occurring event.

Definition 2: Event-loss Ratio is the ratio of the portion of the area of the event which is not covered by the activated cameras to the total area of the occurring event. The lower the value of the event-loss ratio, the greater the coverage of the occurring event region. Mathematically,

Event-loss ratio = penc/te (2)

Where, penc: portion of event area not covered by all the activated cameras, and te: total area of the occurring event.

Definition 3: Redundancy Ratio is defined as the ratio of the total overlapping area of the FOVs of cameras belonging to the occurring event region to the total unique portion of the event area that is covered by the cameras. Mathematically,

Redundancy ratio = pecof/tupec (3)

Where, pecof: portion of event area covered by overlapping FOVs of cameras belonging to the occurring event region, and tupec: total unique portion of the event area that is covered by all the activated cameras. As the redundancy ratio increases, the amount of overlapping among FOVs also increases. Hence, a reduced value of the redundancy ratio is preferable for attaining minimized energy expenditure.

Definition 4: Field of View Utilization can be defined as the ratio of the portion of the area of an occurring event that is covered by all actuated cameras to the total area of the FOVs of all the actuated cameras [1]. A higher value of Field of View Utilization ensures that more redundancy can be eliminated. Mathematically,

Field of view Utilization = pec/tfac (4)

Where, pec: portion of event area covered by all the activated cameras, and tfac: total area of the FOVs of the actuated cameras.
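Since Definitions 1-4 are simple area ratios, they translate directly into code. In the sketch below, the area arguments are assumed to be pre-computed from the simulation geometry, and the sample values are illustrative only.

```python
def coverage_ratio(pec: float, te: float) -> float:
    """Eq. (1): event area covered by all activated cameras / total event area."""
    return pec / te

def event_loss_ratio(penc: float, te: float) -> float:
    """Eq. (2): event area not covered / total event area."""
    return penc / te

def redundancy_ratio(pecof: float, tupec: float) -> float:
    """Eq. (3): overlapping FOV area within the event region /
    total unique event area covered by the cameras."""
    return pecof / tupec

def fov_utilization(pec: float, tfac: float) -> float:
    """Eq. (4): covered event area / total FOV area of actuated cameras."""
    return pec / tfac

# Illustrative area values (in m^2), not simulation results:
print(coverage_ratio(pec=800.0, te=1000.0),
      event_loss_ratio(penc=200.0, te=1000.0),
      redundancy_ratio(pecof=150.0, tupec=800.0),
      fov_utilization(pec=800.0, tfac=1200.0))
```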
(4)
More is the value of coverage ratio, more
effectively the event region is covered.
Mathematically, Where, pec: portion of event area covered by all
the activated cameras and tfac: total area of
Coverage ratio = pec/te FOVs of actuated cameras.
(1)
Some of the important terms devised in our
Where, pec: portion of event area covered by all proposed algorithm are as follows:
the activated cameras and te: total area of
occurring event (i) Compartmental Cluster Count (CCC): It is
the total number of event reporting scalar
Definition 2: Event-loss Ratio is the ratio of premiers present within the camera sensors
portion of area of event which is not covered by DOF which are present at the same
activated cameras to the total area of occurring compartment of the concerned camera sensors
event. location.

Less is the value of event-loss ratio; greater (ii) Non-Compartmental Cluster Count
will be the coverage of the occurring event (NCCC): It is the total number of event reporting
region. Mathematically, scalar premiers present within the camera
sensors DOF which are present at different
Event-loss ratio = penc/te (2) compartment (s) from the camera sensors
location.

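To make Definitions 1-4 concrete, the following Python sketch (ours, not part of the paper's implementation) computes the four metrics from pre-measured areas; treating the uniquely covered portion tupec as pec minus pecof is our assumption:

# Illustrative sketch: the four DT-SCPS evaluation metrics, computed from
# pre-measured areas (all in square metres).
def dtscps_metrics(te, pec, pecof, tfac):
    """te: total event area; pec: event area covered by the activated cameras;
    pecof: event area covered by overlapping FOVs; tfac: total FOV area."""
    penc = te - pec          # uncovered portion of the event area
    tupec = pec - pecof      # uniquely covered portion (our assumption)
    return {
        "coverage_ratio": pec / te,         # Eq. (1)
        "event_loss_ratio": penc / te,      # Eq. (2)
        "redundancy_ratio": pecof / tupec,  # Eq. (3)
        "fov_utilization": pec / tfac,      # Eq. (4)
    }

print(dtscps_metrics(te=400.0, pec=360.0, pecof=40.0, tfac=600.0))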
(iii) Total Cluster Count (TCC): the sum of all the compartmental as well as non-compartmental scalar premiers belonging to a particular camera sensor.

(iv) Active Camera Count (ACC): the total number of cameras to be activated to cover the prevailing event zone.

B. Distributed Triangular Scalar Cluster Premier Selection (DT-SCPS) Method

The entire framework of the proposed DT-SCPS algorithm runs through the following three phases:

(a) Phase 1: Initialization and Scalar Premier Selection

Initially, all the scalar sensors and camera sensors are randomly deployed. Scalars and cameras broadcast the My Scalar Information Message (MSIM) and the My Camera Information Message (MCIM) respectively; these messages contain the ids and location information of the scalar and camera sensors and are broadcast by the concerned sensors to the remaining sensors. Several data structures are maintained by the sensors for storing information regarding their id, location (X, Y) and occurring-event information, used for deciding the order of camera activation; they are listed as follows:

(i) Waiting List (WL): WL retains the ids of all the cameras in ascending order.

(ii) Current Activation List (CAL): CAL contains the ids of all the cameras which are to be activated, according to the prescribed order of actuation, after the occurrence of an event. At the beginning, the CAL is initialized to 0.

(iii) Ordering List (OL): OL is retained by all the cameras and contains the ids of only those cameras which cover the event region. Initially, OL is initialized to 0. After an event occurs, OL contains the corresponding camera ids in descending order of their TCC values.

(iv) Current Basic Cluster Premier List (CBCPL): the CBCPL contains the ids of the event-detecting scalar premiers present within the DOF of a camera sensor that has been activated, while considering the actuation of any camera sensor. The CBCPL is kept during the running of the network and is initialized to 0 at the beginning.

(v) Event Detecting Cluster Premier (EDCP): a table maintained by each camera sensor which contains the ids of the event-reporting scalar premiers present within the FOV as well as the DOF of the concerned camera. Initially, EDCP is also initialized to 0.

After receipt of the MCIM and MSIM, the sensors estimate the Euclidean distances between each other. The Euclidean distance between any two sensors Si(Xi, Yi) and Sj(Xj, Yj) can be represented mathematically as:

Dist(Si, Sj) = √[(Xj − Xi)² + (Yj − Yi)²] (5)

where Dist(Si, Sj) represents the distance between sensor Si and sensor Sj.

The entire region to be monitored is assumed to be square in shape (during implementation, we have considered a 500 × 500 m² region). Further, the concerned region is divided into square-shaped compartments in such a manner that the length of each square's side (D) is equal to one tenth of the length of the area to be monitored (L):

D = 2 · DOF = L / 10 (6)

where D: side length of each compartment, and L: length of the monitored area (500 m).

The side length of each compartment is chosen as (2 · DOF) so as to minimize the amount of overlap among the FOVs of the actuated cameras, while selecting scalar premiers intelligently in each compartment for effective results. Since we have used omni-directional cameras, the diameter along which a camera captures the image of any object is (2 · DOF). Hence, the side length of each compartment is taken as twice the DOF value, so that a reduced number of cameras is activated while ensnaring a larger amount of unique area. In our context, the DOF value is taken as 25 m during implementation; hence the (2 · DOF) value is 50 m, which is equal to one tenth of the length of the monitored region.
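A minimal rendering of Eq. (5) and the compartment division of Eq. (6) (our sketch, using the stated values L = 500 m and DOF = 25 m):

import math

L, DOF = 500.0, 25.0
D = 2 * DOF                                  # Eq. (6): D = 2*DOF = L/10 = 50 m

def dist(si, sj):
    # Eq. (5): Euclidean distance between sensors si and sj
    return math.hypot(sj[0] - si[0], sj[1] - si[1])

def compartment(s):
    # Index of the square compartment containing sensor s (10 x 10 grid)
    return (int(s[0] // D), int(s[1] // D))

print(dist((0, 0), (30, 40)))                # -> 50.0
print(compartment((120.0, 430.0)))           # -> (2, 8)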
In each of the square compartments, three scalar cluster premiers are selected, as discussed in Section I and portrayed in Fig. 1. Scalar Cluster Premiers or Scalar Premiers (SPs) are the scalars belonging to any of the compartments of the monitored event region. The SP acts as the chief representative of its neighboring scalars, which represent a cluster belonging to that particular compartment. A scalar premier called the Primitive Cluster Premier (PCP) is selected in each compartment of the monitored region in such a way that it has the lowest mean distance among all the scalars pertaining to that particular compartment. The line joining the central point of the concerned compartment with the coordinate position of the PCP is chosen as the Base line. Subsequently, the Secondary Cluster Premier (SCP) is selected such that it is present at the farthest distance from the PCP at an angle of 60° in the counterclockwise direction from the Base line. Afterwards, a Tertiary Cluster Premier (TCP) is selected, which is the scalar premier whose average mean distance from both the PCP and the SCP is the smallest among all the scalars belonging to the concerned compartment. In this manner, three scalar cluster premiers are selected in each compartment. The set of sensors, including the three selected scalar cluster premiers, which belong to the same compartment are regarded as the Compartmental Members (CMs).

Fig. 1: Selection of scalar premiers in a compartment in the proposed DT-SCPS approach
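The selection just described can be summarized in a short sketch (ours; distances are Euclidean per Eq. (5), and the 10° angular tolerance used to pick scalars lying on the 60° ray is our assumption):

import math

def select_premiers(scalars, centre, tol=math.radians(10)):
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    # PCP: lowest mean distance to all scalars in the compartment
    pcp = min(scalars, key=lambda s: sum(dist(s, o) for o in scalars))
    # Base line: direction from the compartment centre to the PCP
    base = math.atan2(pcp[1] - centre[1], pcp[0] - centre[0])
    target = base + math.radians(60)         # 60 degrees counterclockwise
    def on_ray(s):
        ang = math.atan2(s[1] - pcp[1], s[0] - pcp[0])
        return abs((ang - target + math.pi) % (2 * math.pi) - math.pi) <= tol
    cands = ([s for s in scalars if s != pcp and on_ray(s)]
             or [s for s in scalars if s != pcp])   # fallback: no scalar on ray
    scp = max(cands, key=lambda s: dist(s, pcp))    # farthest along the ray
    # TCP: smallest average distance to both PCP and SCP
    rest = [s for s in scalars if s not in (pcp, scp)]
    tcp = min(rest, key=lambda s: (dist(s, pcp) + dist(s, scp)) / 2)
    return pcp, scp, tcp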
(b) Phase 2: Event Occurrence and Addressing

At the beginning of this phase, an event takes place and the scalar premiers (i.e., the PCPs, SCPs and TCPs) present in the event region detect the event on behalf of their scalar sensing neighbors. Two sensors i and j are said to be Sensing Neighbors if their Euclidean distance Dist(i, j) < 2RS [11], where RS is the sensing range of the sensors.
Based on the scalar premiers from which a camera receives the event information, the cameras are categorized into two types: Layer Apprised Camera (LAC) and Non-Layer Apprised Camera (NLAC). An LAC is a camera that is informed about the event by scalar premier(s) belonging to the same layer (i.e., compartment). An NLAC, by contrast, is a camera that is informed about the event by scalar premier(s) belonging to a layer other than that of the concerned camera sensor.

When an event takes place, each scalar premier reports the occurring event to its corresponding camera by sending an Event Detect (ED) message. ED is a message sent by a scalar premier to a camera sensor when the scalar premier detects an event; it consists of the id and location information of the concerned scalar premier and the occurring-event information. The corresponding camera is the camera within whose FOV the scalar premier lies. Subsequently, the cameras calculate their Compartmental Cluster Count (CCC) and Non-Compartmental Cluster Count (NCCC) values, after which their respective Total Cluster Count (TCC) values are updated accordingly. Thus, the sum of all the compartmental and non-compartmental cluster members is now available at each of the cameras.

(c) Phase 3: Camera Collaboration and Actuation

In this phase, each camera broadcasts a message called the Scalar Cluster Premier Count Message (SCPCM), which contains its TCC value and id. Each camera now knows every other camera's TCC value and updates its OL accordingly, with the condition that only the ids of cameras having positive TCC values are added to the Ordering List. A camera from which no SCPCM is received has its id removed from the Waiting List, and the concerned camera is kept turned off, since it is not going to capture the occurring-event information.

The camera that comes first in the Ordering List is activated first. Subsequently, the activated camera broadcasts a UCPM message to the rest of the cameras. This message contains the
ids of the event-detecting scalar premiers present within the DOFs of the activated cameras. The ids of the scalar premiers maintained in the UCPM are then added to the CBCPL. Afterwards, the camera which comes next in the Ordering List, i.e., the camera having the next highest TCC value, compares the ids of the scalar premiers present in the UCPM with the ids of the scalar premiers present in its EDCP table. If the scalar-sensor ids of both cameras match completely, then the camera is not activated. In case a mismatch is noticed, the concerned camera having the next highest TCC value undergoes activation; its id is then immediately removed from the Waiting List as well as the Ordering List and added to the Current Activation List (CAL), which contains the ids of the activated cameras. This matching and mismatching process of SP ids continues until the Ordering List becomes empty. At this point, the number of cameras present in the CAL gives the total number of activated cameras, i.e., the Active Camera Count (ACC) value, in the proposed DT-SCPS approach.
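The matching process can be summarized as follows (our illustrative sketch; skipping a camera when its EDCP ids are a subset of the ids already covered is our reading of the complete-match condition):

# Illustrative sketch of the Phase-3 activation loop. Each camera is a dict
# with an id, its TCC value and its EDCP table (a set of scalar-premier ids).
def activate_cameras(cameras):
    ordering = sorted((c for c in cameras if c["tcc"] > 0),
                      key=lambda c: c["tcc"], reverse=True)  # Ordering List
    cal, cbcpl = [], set()      # Current Activation List; covered premier ids
    for cam in ordering:
        # Activate only if this camera reports at least one premier id that is
        # not already covered by previously activated cameras (UCPM contents).
        if not cam["edcp"] <= cbcpl:
            cal.append(cam["id"])
            cbcpl |= cam["edcp"]
    return cal                  # len(cal) is the Active Camera Count (ACC)

cams = [{"id": 1, "tcc": 5, "edcp": {1, 2, 3}},
        {"id": 2, "tcc": 4, "edcp": {2, 3}},
        {"id": 3, "tcc": 3, "edcp": {4}}]
print(activate_cameras(cams))   # -> [1, 3]; camera 2 adds nothing new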
Fig. 2: Effect of varying number of cameras (noc) on (a) number of cameras activated (noca) and (b) energy consumption for camera activation (ecca)

IV. SIMULATION AND PERFORMANCE EVALUATION

In this section, we evaluate the performance of the proposed DT-SCPS approach using a customized simulator developed in C++. The performance evaluation of our proposed system has been carried out under the following assumptions: (i) all the sensors are randomly deployed; (ii) the sensing ranges of scalars and the FOVs of cameras are considered to be circular; (iii) the sensors are assumed to have fixed positions; (iv) all the sensors are assumed to be time synchronized; (v) all the messages are assumed to be broadcast sequentially. The DOF value is taken as 25 m while varying the number of camera sensors, and the sensing range of scalars is taken as 10 m throughout the implementation. We have varied the number of cameras and observed the effect on the number of cameras activated for DCA-SC [1], DCCA-SM [2] and our proposed approach DT-SCPS. The comparative performance assessment is done on the basis of the following performance metrics: (a) number of cameras activated, (b) energy consumption for camera activation, (c) coverage ratio, (d) event-loss ratio, (e) redundancy ratio, and (f) field of view utilization.

We varied the number of camera sensors (noc) and observed its effect on the number of cameras activated (noca), as shown in Fig. 2(a). It is evident from the figure that with an increase in noc, the noca rises gradually in all the cases. Further, the noca is found to be the minimum for the proposed approach; hence, the amount of energy consumed for camera activation (ecca) is also the least in our case, as shown in Fig. 2(b). Fig. 3(a) portrays the effect of varying noc on the coverage ratio (cr) for all the approaches.
It is noticed that with an increase in noc, the value of cr rises in all cases and is found to be maximal for the proposed DT-SCPS, affirming more distinct event-area coverage than the other approaches. The effect of varying noc on the event-loss ratio (elr) is represented in Fig. 3(b). Since the cr is the maximum for DT-SCPS, information-loss minimization is achieved by our approach. Further, it is observed that with an increase in noc the redundancy ratio (rr) rises in all the approaches, as shown in Fig. 4, and it is the minimum for DT-SCPS. In addition, Fig. 5 shows the effect of varying noc on the field of view utilization (fovu), which is found to be the maximum in our case, thereby ensuring less redundant data transmission in the proposed approach.

Fig. 3: Effect of varying number of cameras (noc) on (a) coverage ratio (cr) and (b) event-loss ratio (elr)

Fig. 4: Effect of varying number of cameras (noc) on redundancy ratio (rr)

Fig. 5: Effect of varying number of cameras (noc) on field of view utilization (fovu)

V. CONCLUSION

This research paper presents a novel algorithm called Distributed Triangular Scalar Cluster Premier Selection (DT-SCPS) that segregates the whole geographic region under consideration into several compartments and chooses three scalar premiers effectually in each compartment, in such a manner that the cameras actuated by them provide enhanced event-area coverage along with reduced redundant data transmission while ensnaring information regarding the occurring event. Experiments were carried out to evaluate the efficacy of the proposed DT-SCPS system through a comparative analysis with two other parallel methods, namely DCA-SC and DCCA-SM. The experimentation varied the number of cameras and observed the impact on several important performance metrics. The investigation results demonstrate the supremacy of DT-SCPS over the other approaches with regard to reduced camera activation, enhanced coverage ratio, least event-loss ratio, minimized redundancy ratio, improved field of view utilization, and lowered energy expenditure for camera activation.

REFERENCES

[1] Newell, A., Akkaya, K.: Distributed collaborative camera actuation for redundant data elimination in wireless multimedia sensor networks, Ad Hoc Networks, Elsevier, vol. 9, no. 4, pp. 514-527, 2011.
[2] Luo, W., Lu, Q., Xiao, J.: Distributed Collaborative Camera Actuation Scheme Based on Sensing-Region Management for Wireless Multimedia Sensor Networks, International Journal of Distributed Sensor Networks, Hindawi Publishing Corporation, 2012, Article ID 486163, 14 pages.
[3] Zorbas, D., Glynos, D., Kotzanikolaou, P., Douligeris, C.: Solving coverage problems in wireless sensor networks using cover sets, Ad Hoc Networks, Elsevier, vol. 8, no. 4, pp. 400-415, June 2010.
[4] Wang, J., Niu, C., Shen, R.: Priority-based target coverage in directional sensor networks using a genetic algorithm, Computers and Mathematics with Applications, Elsevier, vol. 57, no. 11-12, pp. 1915-1922, June 2009.
[5] Ram, S.S., Manjunath, D., Iyer, S.K., Yogeshwaran, D.: On the path coverage properties of random sensor networks, IEEE Transactions on Mobile Computing, vol. 6, no. 5, pp. 494-506, May 2007.
[6] Cai, Y., Lou, W., Li, M., Li, X.Y.: Target-Oriented Scheduling in Directional Sensor Networks, Proc. 26th IEEE International Conference on Computer
Communication, IEEE INFOCOM 2007, Barcelona, 6-12 May 2007, pp. 1550-1558.
[7] Girault, A.: Elimination of redundant messages with a two-pass static analysis algorithm, Parallel Computing, Elsevier, vol. 28, no. 3, pp. 433-453, 2002.
[8] Ghaddar, A., Razafindralambo, T., Tawbi, S., Simplot-Ryl, I.: Algorithm for data similarity measurements to reduce data redundancy in wireless sensor networks, 2010 IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM), ISBN: 978-1-4244-7264-2, Montreal, QC, Canada, pp. 1-6, 14-17 June 2010.
A Novel Approach for Phishing Website Detection using Rule Mining and Machine Learning Technique

[1] Binal Masot, [2] Riddhi Kotak, [3] Mittal Joiser
[1] M.E. Student, [2] Assistant Professor, [3] Assistant Professor

Abstract: In the last few years, phishing has been a major problem of the web because the internet has become a crucial part of our daily life activities, like reading a newspaper, online shopping, online payment, etc. Hence internet users may be exposed to typical types of web attacks, which may induce loss of finances, personal information, brand reputation and customer trust in online transactions. Therefore phishing detection is necessary; there is no conclusive solution to detect phishing. In this paper we present two core parts: 1) a detailed investigation of the phishing circumstance, and 2) a proposed spearhead framework to detect phishing attacks. Our proposed framework works on a combined algorithm of rule mining and machine learning: first the rule-mining algorithm is applied, and then the machine-learning algorithm is applied to its results, so that better accuracy can be obtained.

Index Terms: Data mining, feature extraction, legitimate, machine learning, phishing

I. INTRODUCTION

Nowadays the most profitable fraud is identity theft, which means stealing users' personal information. The word phishing is derived from "fishing" + "phreaking"; fishing means using bait to lure the target. The word phishing was first used in 1996 over the internet by a group of hackers who stole America Online (AOL) accounts by tricking unaware users into disclosing their passwords [1]. The main aim of a phishing attack is to steal private, delicate information such as usernames, passwords, credit card details, confidential information, bank information, employment details, financial records, electricity bills and so on. In the last few years phishing has spread quickly, posing a real threat to universal security. Website phishing refers to the form of web threat that indirectly obtains information about a victim, such as personal data and credential information. The phisher creates a replica of a legitimate website so that the user cannot identify it directly. The different techniques of phishing include sending emails with fake-site URL hyperlinks, instant messages, websites and SMS.

This paper includes an overview of phishing attacks, the set of features used for the detection of phishing, and the performance metrics used to find accuracy. We also present a proposed solution that can detect phishing attacks.

II. NARRATIVE

The history of phishing starts from 1996, and day by day the rate of attack increases. Table I shows the growth of phishing from 1996 to the present according to the RSA online fraud reports [2, 7, 8, 9].

Table I Evolution of Phishing during 1996-2016

Year | Occurrence
1996 | Phishing word first used
1997 | Declared a new threat called Phishing
1998 | Starting medium of attackers was messages and newsgroups
1999 | Use of the email system for the phishing attack
2000 | Phishers used keylogger-type attacks for getting login details
2001 | Used URLs to direct users to a fake site
2002 | Used screen-logger attacks
2003 | Used IM and IRC
2004 | Evolvement of pharming
2005 | First use of the term spear phishing
2006 | First phishing attack over VoIP
2007 | Phishing scams grew to more than $3 billion
2008 | Increased 39.8% over the previous year
2009 | SHS blocked phishing attacks impersonating 1079 different organizations
2010 | Facebook attracted more phishing attacks compared to Google and IRS
2011 | Web Hacking Incident Database (WHID) fraud
2012 | Identified 6 million unique malware samples
2013 | 69 countries scammed in the Red October operation
2014 | Use of IoT; 7,50,000 malicious emails sent
2015 | Spear phishing reached
2016 | Unsolicited emails containing malicious attachments
III. PERIOD OF EXISTENCE

Every attack exists for some period of time. A phishing attack follows a sequence of steps, i.e., a life cycle, to attack the user. The following stages are involved in the phishing life cycle, as shown in Fig. 1.

Step 1 Analysis and Environment Setup: This is the first, or initialization, step of phishing. In this step the attackers analyze the organization and the type of network it uses, and then set up the environment, e.g., make a replica of a legitimate website, which may redirect the victim to some fraudulent web page.

Step 2 Phishing: After a successful setup, the next step is to send the fraudulent mail or a link to a spoofed website, e.g., asking the user to urgently update some sensitive information by clicking on a malicious link. Another example is a link with a phishy URL instead of the legitimate one, e.g., www.faceb00k.com.

Step 3 Break-in: As soon as the victim opens the fraudulent link, malware is installed on the system, which allows the attacker to intrude into the system and change its configuration or access rights.

Step 4 Data collection: Once the attackers get access to the victim's system, the required data and account details are extracted. Phishers use rootkits to hide their malware.

Step 5 Break-out: After getting the required information, the phisher removes all the links and websites. It is also observed that phishers track the degree of success of their attack for refining future attacks.

Fig 1 Phishing Life Cycle

IV. LITERATURE REVIEW

B. B. Gupta et al. [1] present a survey on fighting against phishing attacks; they describe the various challenges and available solutions. Jeeva et al. [3] propose an approach based on association rule mining to detect phishing URLs. The approach works in two phases: in the first phase the URL is searched, and in the second phase the features are extracted. The results show that the proposed method achieved 93% overall accuracy.

Ramesh et al. [5] proposed an anti-phishing technique using target domain identification, in which domains are grouped from hyperlinks having a direct or indirect association with the given suspicious webpage. The results show that the proposed method achieved 99.65% accuracy on the google.com search engine, 99.6% on the aol.com search engine, 99.55% on the hotbot.com search engine and 99.45% on the bing.com search engine.

Mahmoud et al. [11] proposed a phishing-detection technique to reduce the false positive ratio. The main aim of the paper is to extract the domain name from the victim URL and compare the page rank of this extracted domain name with that of the actual domain name; if they are not the same, the domain name is reported as phishing.

Shrestha et al. [10] proposed a multi-label feature classification algorithm to classify whether a website is phishing or legitimate. Text-based features are used in the implementation, together with visual features extracted from the screenshot of a phishing website and text from its HTML source code. This technique is 30 times faster than the existing state-of-the-art system on the phishing website classification problem.
V. PERFORMANCE EVALUATION METRICS

The main aim of most classifiers is to perform binary classification, i.e., phishing or legitimate. There are four possible outcomes for measuring the performance: True Positive, True Negative, False Positive and False Negative.

Assume that NH denotes the total number of ham emails and NP denotes the total number of phishing emails. Let (nhH) denote ham emails classified as ham, (npH) phishing emails classified as ham, (nhP) ham emails classified as phishing, and (npP) phishing emails classified as phishing. The evaluation metrics used in this case are [15, 16]:

1) True Positive (TP): the ratio of the number of phishing websites identified correctly:

TP = npP / NP

2) True Negative (TN): the ratio of the number of ham websites identified correctly:

TN = nhH / NH

3) False Positive (FP): the ratio of the number of ham websites classified as phishing:

FP = nhP / NH

4) False Negative (FN): the ratio denoting the number of phishing websites classified as ham:

FN = npH / NP

5) Precision (P): measures the rate of phishing websites identified correctly among all websites detected as phishing:

P = npP / (npP + nhP)

6) Recall (R): measures the rate of phishing websites identified correctly among all existing phishing websites:

R = npP / (npP + npH)

7) F1 Score: the harmonic mean of Precision and Recall:

F1 = 2PR / (P + R)

8) Accuracy (ACC): measures the overall proportion of correctly identified websites:

ACC = (npP + nhH) / (NP + NH)

VI. FEATURES USED FOR IDENTIFICATION OF PHISHING WEBSITE

Features help the algorithm give an accurate result. Toolan and Carthy [16] studied the utility of about 40 such features. We have categorized the URL features used for the detection of phishing websites as follows:

IP address
In general, a legitimate site has a domain name. The presence of an IP address in the URL instead of the domain name of the website indicates that someone may be trying to access your personal information. An IP-address URL looks like http://91.121.10.211/~chems/websce/verify. Sometimes the IP address is converted into hexadecimal, as in http://0x58.0xCC.0xCA.0x62.
Rule:
If (IP address exists in URL) then phishing
Else non-phishing

Length of URL
The URL of a website consists of three elements: network protocol, host name and path. For a given URL, the total length of the URL is extracted. If the length of the URL is greater than 40 characters, the site is classified as phishing, otherwise as legitimate, e.g., http://facebook.com.bugs3.com/login/Secured_Relogin/index1.html.
Rule:
If (length of URL) > 40 characters then phishing
Else non-phishing

Number of dots in URL
This feature verifies the presence of dots in the host name of the URL. Phishing sites usually put extra dots in the URL to make users believe that they are on a legitimate page, e.g., http://www.Facebook.pcriot.com/login.php.
Rule:
If (number of dots) > 4 then phishing
Else non-phishing

Suspicious characters in URL
@, _ and -- are suspicious characters; if a suspicious character is present in the URL, the website is classified as phishing. The @ symbol leads the browser to ignore everything before it and redirects the user to the link typed after the @ symbol, e.g., http://faceebook-com.bugs3.com/login/Secured_Re-login/index1.html.
Rule:
If (URL has suspicious characters) then phishing
Else non-phishing
Number of slashes in URL
Additional slashes in a URL are a technique to make a mimic URL look legitimate. If the URL contains 5 or more slashes, the site is classified as phishing, e.g., http://faceebook-com.bugs3.com/login/Secured_Re-login/index1.html.
Rule:
If (slashes in URL) >= 5 then phishing
Else non-phishing

WHOIS lookup
WHOIS is a protocol used to fetch the customer details of a registered website from the database. Legitimate websites are always stored in the WHOIS database.
Rule:
If (not in WHOIS database) then phishing
Else non-phishing

Length of host name in URL
A URL string consists of three elements: network protocol, host name and path. For a given URL, the length of the host name is extracted. If the length of the host name is greater than 25 characters, the site is classified as phishing, otherwise as legitimate.
Rule:
If (length of host name) > 25 characters then phishing
Else non-phishing

Age of domain
The age of the domain can be extracted from the WHOIS database; a PHP script was created to connect to it. If the domain age is less than one year, the site is classified as phishing; else, if the domain age is more than one and less than two years, it is classified as suspicious; else it is legitimate.
Rule:
If (age of domain) < 1 year then phishing
Else if (age of domain) < 2 years then suspicious
Else non-phishing

Unicode in URL
A URL contains a unique number for every character, which attackers exploit with look-alike characters, e.g., http://www.paypa1.com, where the 1 represents the letter l.
Rule:
If (Unicode look-alike in URL) then phishing
Else non-phishing
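Taken together, the above rules form the if-else layer of the system described in the next section. A minimal Python sketch (ours; the thresholds are the ones stated above, while the WHOIS presence, domain age and Unicode look-alike inputs are assumed to be supplied by the caller):

from urllib.parse import urlparse

def rule_classify(url, in_whois=True, domain_age_years=3.0, unicode_trick=False):
    host = urlparse(url).netloc
    if host.replace(".", "").isdigit():          # IP address instead of domain
        return "phishing"
    if len(url) > 40:                            # length-of-URL rule
        return "phishing"
    if host.count(".") > 4:                      # number-of-dots rule
        return "phishing"
    if "@" in url or "_" in url or "--" in url:  # suspicious-character rule
        return "phishing"
    if url.count("/") >= 5:                      # number-of-slashes rule
        return "phishing"
    if not in_whois:                             # WHOIS-lookup rule
        return "phishing"
    if len(host) > 25:                           # host-name-length rule
        return "phishing"
    if unicode_trick:                            # Unicode look-alike rule
        return "phishing"
    if domain_age_years < 1:                     # age-of-domain rule
        return "phishing"
    if domain_age_years < 2:
        return "suspicious"                      # passed on to the ML classifier
    return "legitimate"

print(rule_classify("http://www.example.com/login"))   # -> legitimate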
VII. PROPOSED WORK

The sources of phishing attacks are mostly email, websites and malware. The links (URLs) provided in phishing emails draw the user into entering a phishing website. In website-based phishing, the website is a replica of a trusted website that tricks users into revealing sensitive information. There are several techniques to detect phishing; the applied techniques contain a mixture of features, such as content-based, lexical-based, body-based and so on. Our proposed system uses only URL-based features. The benefit of using URL features is that content-based or body-based approaches must classify the whole source code of the webpage, which is time consuming.

Fig 2 Proposed framework

In the proposed system, the dataset is taken from different data sources and is a mix of phishing and legitimate URLs. Phishing data are collected from the PhishTank API data source, and non-phishing data are collected from the Alexa database.

Our system works on a combination of rule mining [3] and machine learning [4] algorithms. First, if-else rule mining is used to classify the URLs into three classes: phishing, legitimate and suspicious. Then the suspicious URLs are passed to the machine learning algorithm to classify each suspicious URL as phishing or legitimate. Overall, we thus classify all the URLs into two classes: phishing and legitimate.

VIII. CONCLUSION

This research presents the details of phishing attacks. For phishing detection we analyzed URL features using if-else rules, hybridized with a machine learning technique to solve the suspicious-URL problem. The analyzed features are highly sensitive for phishing URL detection, so our proposed work easily finds phishing websites, and when a phishing URL is found it is automatically put in a blacklist for prevention.

REFERENCES

[1] B. B. Gupta, Aakanksha Tewari, Ankit Kumar Jain, Dharma P. Agrawal, Fighting against phishing attacks: state of the art and future challenges, Neural Computing and Applications, Springer, pp. 1-26, 2016.
[2] The Phishing Guide: Understanding & Preventing Phishing Attacks, Gunter Ollmann, Director of Security Strategy, IBM Internet Security Systems, 2007.
[3] Jeeva, Rajsingh, Intelligent phishing URL detection using association rule mining, Human-Centric Computing and Information Sciences, Springer, pp. 1-19, 2016.
[4] Huajun Huang, Liang Qian, Yaojun Wang, A SVM-based technique to detect phishing URLs, Information Technology Journal, 2012, Vol. 11(7), pp. 921-925.
[5] Ramesh Gowtham, K. Sampath Sree Kumar, Ilango Krishnamurthi, An efficacious method for detecting phishing webpages through target domain identification, Decision Support Systems, Elsevier, vol. 61, pp. 12-22, 2014.
[6] Dhamija R, Tygar JD, Hearst MA (2006) Why phishing works. In: Proceedings of the 2006 Conference on Human Factors in Computing Systems (CHI), ACM, Montreal, Quebec, Canada, pp 581-590.
[7] Anti-Phishing Working Group (APWG) (2014) Phishing activity trends report - first quarter 2014. http://antiphishing.org/reports/apwgtrendsreportq12014.pdf. Accessed Sept 2014.
[8] Anti-Phishing Working Group (APWG) (2014) Phishing activity trends report - fourth quarter 2013. http://antiphishing.org/reports/apwgtrendsreportq42013.pdf. Accessed Sept 2014.
[9] Anti-Phishing Working Group (APWG) (2014) Phishing activity trends report - second quarter 2013. http://antiphishing.org/reports/apwgtrendsreportq22013.pdf. Accessed Sept 2014.
[10] Niju Shrestha, Rajan Kumar Kharel, Jason Britt, Ragib Hasan, High performance classification of phishing URLs using a multi-modal approach with MapReduce, 2015 IEEE World Congress on Services, pp. 206-212.
[11] Mahmoud Khonji, Andrew Jones, Youssef Iraqi, A novel phishing classification based on URL features, 2011 IEEE GCC Conference and Exhibition (GCC), Dubai, United Arab Emirates, pp. 19-22.
[12] Abdelhamid N, Ayesh A, Thabtah F (2014) Phishing detection based associative classification data mining, ScienceDirect, pp. 5948-5959.
[13] Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases, ACM SIGMOD, pp. 207-216.
[14] Aburrous M, Hossain MA, Dahal K, Thabtah F (2010) Predicting phishing websites using classification mining techniques with experimental case studies. In: Seventh International Conference on Information Technology, IEEE, Las Vegas, Nevada, USA, 2010, pp 176-181.
[15] Husna H, Phithakkitnukoon S, Palla S, Dantu R (2008) Behavior analysis of spam botnets. In: Communication Systems Software and Middleware and Workshops (COMSWARE 2008), 3rd International Conference, Bangalore, India, 2008, pp 246-253.
[16] Toolan F, Carthy J (2009) Phishing detection using classifier ensembles. In: eCrime Researchers Summit, IEEE, Tacoma, WA, USA, 2009, pp 1-9.
Partial Shape Feature Fusion Using PSO-ACO Hybrid Method for Content
Based Image Retrieval
Kirti Jain1, Dr. Sarita Singh Bhadauria2
1 (Computer Science & Engg Dept., LNCT Bhopal, India)
2 (Electronics Dept., M.I.T.S. Gwalior, India)

Abstract: This paper proposes a novel method of partial feature fusion using a PSO-ACO hybrid method for content-based image retrieval. Partial feature fusion is a combination of two or more partial feature extractors; for the combination, a geometrical invariant function and other functions based on derivatives of transforms are used. The PSO-ACO hybrid is used for the feature fusion process. The feature fusion process acts in two modes, a local mode and a global mode: the local mode uses the ACO algorithm and the global mode uses the PSO algorithm. The local mode of feature selection sets the fitness constraints for the selection of features from the two different feature-extractor values of the feature fusion. The global mode of feature selection iterates the process of finding the most common dominant features equivalent to the input image and proceeds with the feature fusion. The feature fusion process is incorporated with a similarity measure and enhances the capability of content-based image retrieval. For the validation and performance evaluation of the proposed method, MATLAB software and the Corel image dataset are used. The values of precision and recall are enhanced compared with individual partial-feature-based content-based image retrieval.

Keywords: CBIR, Fusion, Partial Feature, PSO, ACO, Fourier Descriptor

INTRODUCTION

The current decade of multimedia data faces a problem of accurate search and retrieval. The diverse features of multimedia data require a proper feature extraction and selection process. The partial shape feature is the most dominant feature of digital multimedia data. This paper proposes a feature-fusion-based image retrieval technique; partial shape feature fusion is a new approach for content-based image retrieval [1,2,5]. The partial shape features use geometrical invariant functions and other boundary- and contour-based methods for the extraction process. The mapping and matching of partial shape features is very difficult due to the irregular behavior and shape of image objects. For better mapping and matching of partial features, a fusion technique is used. The feature fusion technique estimates the correct feature set for the mapping and matching purpose in content-based image retrieval. For the feature fusion process, the ACO-PSO hybrid swarm intelligence algorithm is used [22,23]. The ACO-PSO algorithm searches for the most common features of the query image and the database images and reduces the retrieval gap. The combination of ACO and PSO works as a local and global search-space technique [19,20,21]: in the local search space the ACO algorithm works, and in the global search space particle swarm optimization is used. The fitness constraints of ACO set the condition of feature selection for the fusion process. The feature fusion uses two well-known feature descriptors: one is the Fourier feature descriptor and the other is the partial feature descriptor. The Fourier feature descriptor finds the shape features of the image, estimated in terms of scaling, rotation and transformation. The other feature descriptor, called the partial feature extractor, uses geometrical transform functions for the extraction of features; the geometrical invariant functions estimate the shape features in terms of the odd and even feature process of a matrix. The rest of the paper is organized as follows: Section II discusses the feature extraction process; Section III discusses the feature fusion process; Section IV presents experimental results; and Section V concludes and discusses future work.

II. Feature Extraction

Feature extraction is the primary stage of content-based image retrieval. Various feature descriptors have been proposed by various authors for the extraction of features, such as color feature descriptors, texture feature descriptors and shape feature descriptors. This paper uses a shape feature descriptor. The behavior of shape features is very complicated due to the irregular shape of boundaries and edges. For the extraction of shape features, geometrical transform functions such as the Fourier feature descriptor are used [27,28]. Within the Fourier feature descriptor, the Contour Fourier descriptor is used, which estimates all feature components, both positive and negative.
The other feature descriptor is called the partial feature descriptor. The partial feature descriptor uses geometrical functions such as sine, cosine and tangent for the estimation of features; the estimated features are calculated in terms of an even and odd process. Both feature descriptors are described in the flow-chart process.

Fourier descriptors (FD)

The Fourier descriptor finds the shape of the image. The Fourier transform function generates the boundary values of the image, and the image values are represented in terms of the lower frequency components [11]. The Contour Fourier technique takes the Fourier transform directly of the complex coordinate function of the object boundary. In this technique, the descriptors are taken from both the positive and negative frequency axes. The scaling of the descriptors is done by dividing the absolute values of the selected descriptors by the absolute value of the first non-zero component [13,14].

Figure 1: The process of the Fourier feature descriptor for the extraction of features

Partial features descriptor (PF)

1. Partial feature extraction process

Figures 1, 2, 3 and 4 show a rectangular image divided into two sections for each individual image.

Step 1: Estimate the component values Xc and Yc [31,32]: Xc1 and Yc1 from figure 1, Xc2 and Yc2 from figure 2, Xc3 and Yc3 from figure 3, and Xc4 and Yc4 from figure 4, where Xcj = Σ(i=1..n) xji and Ycj = Σ(i=1..n) yji over the boundary points of section j.

Step 2: After getting the values of Xc1, Yc1, Xc2, Yc2, Xc3, Yc3, Xc4 and Yc4, compute

H1² = Xc1² + Yc1², H2² = Xc2² + Yc2², H3² = Xc3² + Yc3², H4² = Xc4² + Yc4²

Step 3: From the values of H1, H2, H3 and H4, apply the sine, cosine and tangent functions to the rectangle boundary [31,32]:

Step 4: sin θj = Ycj / Hj, cos θj = Xcj / Hj, tan θj = Ycj / Xcj, for j = 1, 2, 3, 4

Step 5: After getting the sine, cosine and tangent values, form three consecutive matrices of shape features.

Step 6: All the feature values together create the partial features matrix.
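A compact rendering of Steps 1-6 (our illustrative sketch; the four sections stand in for the four sub-images, the boundary points are assumed given, and H is computed from Xc and Yc as in Step 2):

import numpy as np

def partial_features(boundary):
    # `boundary` is an (N, 2) array of object-boundary coordinates.
    sections = np.array_split(boundary, 4)
    feats = []
    for sec in sections:
        xc, yc = sec[:, 0].sum(), sec[:, 1].sum()   # Step 1: component values
        h = np.hypot(xc, yc)                        # Steps 2-3: H from Xc, Yc
        feats.append((yc / h, xc / h, yc / xc))     # Step 4: sin, cos, tan
    return np.array(feats)                          # Steps 5-6: feature matrix

rng = np.random.default_rng(0)
print(partial_features(rng.random((100, 2))))       # 4 x 3 feature matrix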

III. Feature Fusion

The feature fusion process uses a hybrid swarm intelligence algorithm, which is a combination of ant
colony optimization and particle swarm optimization. Both swarm algorithms are memory based and give better optimization results than other swarm-based algorithms. The combination of the swarm algorithms creates a dual search space for the fusion of features. The feature fusion process yields the most dominant features of two different feature extractors. For the extraction of features, two feature descriptors are used: one is the Fourier feature descriptor and the other is the partial feature descriptor. The processing of ACO and PSO is combined in terms of a local feature set and a global feature set: the processing of features is done by particle swarm optimization, and the verification of the constraint function is done by ant colony optimization. The ant colony optimization processing gives the value of Gbest and the final fused features for content-based image retrieval. This section discusses PSO, ACO and the fusion of ACO-PSO.

Ant colony optimization (ACO)

The ant colony optimization algorithm was proposed by Dorigo and colleagues. The algorithm is inspired by the behavior of biological ants and supports a dynamic population-based process [17]. The working principle of ant colony optimization is the theory of continuity and shortest-path estimation. Ants are insects that live together; since they are blind creatures, they find the shortest path from nest to food with the guidance of pheromone. Pheromone is the chemical material deposited by ants, which serves as the basic communication medium among ants, thereby guiding the determination of the next movement. Ants find the shortest path based on the intensity of the pheromone deposited on different paths. In general, the intensity of pheromone and the length of the path are used to simulate the ant system. Ant colony optimization is used here for the selection of local features during the fusion process.

Particle swarm optimization (PSO)

The particle swarm optimization algorithm is inspired by the concept of bird flocks. The property of a bird flock is to fly in the sky with constant, optimal velocity without dropping to the ground. This biological property is turned into an algorithm. Particle swarm optimization works in two phases, a local phase and a global phase: the local phase is called Pbest and the global phase is called Gbest. The value of the optimization in the local phase, Pbest, is assigned to the global best, Gbest [19]. The process of the algorithm proceeds in terms of a population and controlled iterations. The movement of each particle is coordinated by a velocity which has both magnitude and direction. Each particle's position at any instance of time is influenced by its best position and the position of the best particle in the problem space. The performance of a particle is measured by a fitness value, which is problem specific.

ACO-PSO Feature Fusion

The following parameters are used in the feature fusion method: x1, x2, ..., xn are the feature components of the extracted features of the two feature descriptors; W is the weight factor for the sum of features; τ is the pheromone value of the ants; v1 and v2 are the velocities of the particle agents; and c1 and c2 are the particle constants. The steps of the fusion process are given below.

Step 1. Define the feature set S1{x1, x2, ..., xn} with a random population of PSO.

a. Assign the particle velocities V1 = 0, V2 = 0 and W = 0.

b. Fitness constraint function for the selection of ants:

F(w) = Ffd(w) / Fpf(w), w ∈ (x1, x2, ..., xn) (1)

Here Ffd is the Fourier descriptor, Fpf is the partial feature descriptor and w is the set of feature components of the summed sets. The selected feature components set the value of the ants A = {a1, ..., an}. These ant values proceed to the estimation of the local best; the local best function is defined as

Pbest = { τ(t) · LI(t), if the fitness constraint of Eq. (1) is satisfied; 0, otherwise } (2)

Here τ is the pheromone value of the ants and LI is the least-interference value of the ants.

Step 2. The Pbest value is assigned to Gbest. Input the feature fusion state of the Gbest value.
1. Calculate the value of the relative feature set in the Gbest set: Rf = LSI / Wd, where LSI is the interference value of the ants and Wd is the sum value of the PSO space.

2. The PSO space creates the fusion state for the processing of features:

Fs(t) = { max(i=1..n) Pbest_i(t), if Pbest_i(t) > 0; 0, otherwise } (3)

3. Create the relative fusion-state difference value:

Rd = Σ(fd=1..n) Σ(pf=1..m) (xi − fs) (4)

4. If the value of Rd is zero, the feature fusion process is done.
5. Else, the fusion process goes back to step 2.
6. If the feature fusion state is empty, the fusion process is terminated.
7. Measure the similarity of the fusion-state features P1 and P2:
8. sim = |P1 − P2|²
9. If the value of sim ≈ 0,
10. the image is retrieved.
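The fusion loop of Steps 1-10 can be sketched as follows (our illustrative Python rendering under stated assumptions: the pheromone update, the LI term and the Eq. (2) threshold are simplified stand-ins, not the paper's exact formulation):

import numpy as np

def fuse_features(ffd, fpf, iters=50, tol=1e-6):
    # ffd, fpf: feature vectors from the Fourier and partial descriptors.
    tau = np.ones_like(ffd)                   # ant pheromone values
    gbest = np.zeros_like(ffd)                # global-best state (Step 2)
    fused = np.zeros_like(ffd)
    for _ in range(iters):
        fitness = ffd / (fpf + 1e-12)         # Eq. (1): F(w) = Ffd(w)/Fpf(w)
        li = np.abs(ffd - fpf)                # stand-in for the LI term
        pbest = np.where(fitness > 1.0, tau * li, 0.0)  # Eq. (2), our threshold
        gbest = np.maximum(gbest, pbest)      # Eq. (3): promote Pbest to Gbest
        fused = (ffd + fpf) / 2 * (gbest > 0)           # candidate fusion state
        rd = np.sum(ffd[:, None] - fused[None, :])      # Eq. (4)-style residual
        tau *= 0.9                            # pheromone evaporation (assumed)
        if abs(rd) < tol:                     # Step 4: residual ~ 0 -> done
            break
    return fused

def retrieve(p1, p2, eps=1e-3):
    return np.sum((p1 - p2) ** 2) < eps       # Steps 8-10: sim = |P1 - P2|^2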
Figure 2: Process block diagram of feature fusion

IV. Simulation and result analysis

For the validation and evaluation of the proposed algorithm, MATLAB software is used. The hardware configuration is a Core i5 processor with 4 GB RAM. The Corel image dataset contains 1000 images of different categories, such as animals, houses, rivers and many more objects. All images are in JPEG format with dimensions 256 × 384, in RGB mode. Two standard parameters, precision and recall, are used [31,30]:

Precision = number of relevant images retrieved / number of images retrieved (1)

Recall = number of relevant images retrieved / number of relevant images in database (2)

Figure 3: Query image (Beach, top) and retrieval results (bottom) using Fourier-descriptor-based image retrieval; precision = 0.43, recall = 0.11

Figure 4: Query image (Dinosaur, top) and retrieval results (bottom) using Fourier-descriptor-based image retrieval; precision = 0.39, recall = 0.15
Figure 5: Query image (Hill, top) and retrieval results (bottom) using Fourier Descriptor Hybrid based image retrieval; precision = 0.72, recall = 0.11

Figure 6: Query image (Bus, top) and retrieval results (bottom) using Fourier Hybrid-to-All based image retrieval; precision = 0.57, recall = 0.07

Figure 7: Query image (Horse, top) and retrieval results (bottom) using Partial Feature Hybrid based image retrieval; precision = 0.98, recall = 0.14

Image Category | Partial Feature Extraction Based Retrieval (Precision / Recall) | Fourier Descriptor Based Retrieval (Precision / Recall)
Beaches   | 0.5422 / 0.111 | 0.433 / 0.11
Dinosaurs | 0.4771 / 0.122 | 0.396 / 0.15
Hills     | 0.7417 / 0.129 | 0.734 / 0.09
Horses    | 0.9975 / 0.200 | 0.930 / 0.08
Buses     | 0.5967 / 0.140 | 0.318 / 0.15

Table 1: Experimental result analysis of Partial Feature Extraction Based Image Retrieval and Fourier Descriptor Based Image Retrieval for the performance parameters precision and recall, applied to the input images Beaches, Dinosaurs, Hills, Horses and Buses.
Image Category | Partial Feature Hybrid Extraction Based Retrieval (Precision / Recall) | Fourier Descriptor Hybrid Based Retrieval (Precision / Recall)
Beaches   | 0.6581 / 0.204 | 0.422 / 0.14
Dinosaurs | 0.4853 / 0.134 | 0.385 / 0.14
Hills     | 0.7358 / 0.182 | 0.724 / 0.11
Horses    | 0.9886 / 0.148 | 0.914 / 0.09
Buses     | 0.6057 / 0.189 | 0.307 / 0.15

Table 2: Experimental result analysis of Partial Feature Hybrid Extraction Based Image Retrieval and Fourier Descriptor Hybrid Based Image Retrieval for the performance parameters precision and recall, applied to the input images Beaches, Dinosaurs, Hills, Horses and Buses.

Image Category | Fourier Descriptor Based Retrieval (Precision / Recall) | Partial Feature Extraction Based Retrieval (Precision / Recall) | Fourier Hybrid-to-All Based Retrieval (Precision / Recall)
Beaches   | 0.568 / 0.101 | 0.622 / 0.102 | 0.856 / 0.101
Dinosaurs | 0.377 / 0.102 | 0.452 / 0.142 | 0.548 / 0.142
Hills     | 0.448 / 0.111 | 0.547 / 0.089 | 0.611 / 0.089
Horses    | 0.610 / 0.117 | 0.655 / 0.195 | 0.817 / 0.089
Buses     | 0.474 / 0.102 | 0.498 / 0.137 | 0.570 / 0.070

Table 3: Experimental result analysis of Partial Feature Extraction Based Image Retrieval, Fourier Descriptor Based Image Retrieval and Fourier Hybrid-to-All Based Image Retrieval for the performance parameters precision and recall, applied to the input images Beaches, Dinosaurs, Hills, Horses and Buses.

Figure 10: Comparison of recall and precision for Partial Feature Extraction Based Retrieval and Fourier Descriptor Based Retrieval

Figure 8: Comparison of recall and precision for Partial Feature Hybrid Extraction Based Retrieval and Fourier Descriptor Hybrid Based Retrieval
Figure 9: Comparison of recall and precision for Partial Feature Extraction Based Retrieval, Fourier Descriptor Based Retrieval and Fourier Descriptor-to-All Based Retrieval

Figure 11: Precision graph for Fourier Descriptor Based Retrieval (FD), Partial Feature Extraction Based Retrieval (PFE) and Fourier Hybrid-to-All Based Retrieval (FHA)

Figure 12: Recall graph for Fourier Descriptor Based Retrieval (FD), Partial Feature Extraction Based Retrieval (PFE) and Fourier Hybrid-to-All Based Retrieval (FHA)

V. Conclusion & future scope

This paper proposes a new approach for content-based image retrieval. The proposed approach uses two swarm-based optimization algorithms for the fusion of image features: particle swarm optimization takes the feature descriptors as input, and ACO processes the local feature-set optimization. The fusion state defines two constraint functions: one is the selection of particles for the ACO process, and the other is the ACO selection of features for fusion. For the extraction of features, two feature descriptors are used: one is the partial feature descriptor and the other is the Fourier contour descriptor. For the validation and estimation of the performance of the methods, MATLAB software and the Corel image database are used. The Corel image database consists of 1000 images, such as buses, horses, hills and many more image data. The performance evaluation uses precision and recall. The precision of the PSO-ACO fusion is 95-98%, whereas in terms of individual feature descriptors the precision of the partial feature descriptor is on average 85-90% and the precision of the Fourier descriptor is 80-85%. The feature fusion process is very complex and takes many iterations to generate the fused features. In future work, the number of iterations of the algorithm will be reduced, along with the time complexity in terms of the execution factor.
REFERENCES

[1] Qingyong Xu, Shunliang Jiang, Wei Huang, Famao and Shaoping, "Feature Fusion Based Image Retrieval Using Deep Learning", Journal of Information & Computational Science, 2015, pp. 2361-2373.
[2] Jing-Yan Wang and Zhen Zhu, "Image Retrieval System Based on Multi-Feature Fusion and Relevance Feedback", IEEE, 2010, pp. 2053-2058.
[3] Taher Niknam and Babak Amiri, "An efficient hybrid approach based on PSO, ACO and k-means for cluster analysis", Elsevier, 2009, pp. 183-197.
[4] Yan Meng, Olorundamilla Kazeem and Juan C. Muller, "A Hybrid ACO/PSO Control Algorithm for Distributed Swarm Robots", IEEE, 2007, pp. 1-8.
[5] Chi Zhang and Lei Huang, "Content-Based Image Retrieval Using Multiple Features", CIT, 2014, pp. 1-10.
[6] Juan Manuel Ramirez-Cortes, Pilar Gomez-Gil, Gabriel Sanchez-Perez and David Baez-Lopez, "A Feature Extraction Method Based on the Pattern Spectrum for Hand Shape Biometry", WCECS, 2008, pp. 1-4.
[7] Kulkarni K. Ramesh and Niket Amoda, "Efficient image retrieval using region based image retrieval", ICWAC, 2013, pp. 22-31.
[8] Zheng Liang, Shengjin Wang and Qi Tian, "Coupled binary embedding for large-scale image retrieval", IEEE Transactions on Image Processing, 2014, pp. 3368-3380.
[9] Huang Wei, Yan Gao and Kap Luk Chan, "A review of region-based image retrieval", Journal of Signal Processing Systems, 2010, pp. 143-161.
[10] M. Rao Babu, "A new feature set for content based image retrieval", Information Communication and Embedded Systems (ICICES), 2013.
[11] P. S. Hiremath and Jagadeesh Pujari, "Content based image retrieval using color, texture and shape features", IEEE, 2010, pp. 780-784.
[12] Xiangyang Wang, Yongjian Yu and Hongying Yang, "An effective image retrieval scheme using color, texture and shape features", Computer Standards & Interfaces, 2011, pp. 59-68.
[13] J. Eakins and M. Graham, "Content-based Image Retrieval", University of Northumbria at Newcastle, 2014, pp. 25-41.
[14] M. S. Lew, N. Sebe and C. Djeraba, "Content-based multimedia information retrieval: State of the art and challenges", ACM, 2010, pp. 1-19.
[15] G. Amayeh, G. Bebis, A. Erol and M. Nicolescu, "Peg-free hand shape verification using high order Zernike moments", IEEE, 2009, pp. 40-48.
[16] Taher Niknam, Elahe Taherian Fard, Narges Pourjafarian and Alireza Rousta, "An efficient hybrid algorithm based on modified imperialist competitive algorithm and K-means for data clustering", Engineering Applications of Artificial Intelligence, 2011, pp. 306-317.
[17] Bahman Bahmani Firouzi, Mokhtar Sha Sadeghi and Taher Niknam, "A new hybrid algorithm based on PSO, SA and K-means for cluster analysis", ICIC, 2010, pp. 3177-3192.
[18] Ganesh Krishnasamy, Anand J. Kulkarni and Raveendran Paramesran, "A hybrid approach for data clustering based on modified cohort intelligence and K-means", Expert Systems with Applications, 2014, pp. 6009-6016.
[19] Radha Thangaraj, Millie Pant, Ajith Abraham and Pascal Bouvry, "Particle swarm optimization: Hybridization perspectives and experimental illustrations", IEEE, 2011, pp. 1-19.
[20] Tahereh Hassanzadeh and Mohammad Reza Meybodi, "A New Hybrid Approach for Data Clustering using Firefly Algorithm and K-means", IEEE, 2010, pp. 1-5.
[21] Sun Xu, Zhang Bing, Yang Lina, Li Shanshan and Gao Lianru, "Hyperspectral image clustering using ant colony optimization (ACO) improved by K-means algorithm", IEEE, 2010, pp. 1-22.
[22] Sunita Sarkar, Arindam Roy and Bipul Shyam Purkayastha, "Application of Particle Swarm Optimization in Data Clustering: A Survey", International Journal of Computer Applications, 2013, pp. 38-46.
[23] Serkan Kiranyaz, Jenni Pulkkinen, Turker Ince and Moncef Gabbouj, "Multi-Dimensional Evolutionary Feature Synthesis for Content-Based Image Retrieval", IEEE, 2011, pp. 3645-3648.
[24] Anil Balaji, R. P. Maheshwari and R. Balasubramanian, "Content-Based Image Retrieval using colour feature and colour bit planes", IJSISE, 2014, pp. 44-57.
[25] Swapnalini Pattanaik and D. G. Bhalke, "Efficient Content based Image Retrieval System using Mpeg-7 Features", International Journal of Computer Applications, 2012, pp. 19-24.
[26] Elham Akbari Baniani and Abdolah Chalechale, "Hybrid PSO and Genetic Algorithm for Multilevel Maximum Entropy Criterion Threshold Selection", International Journal of Hybrid Information Technology, 2013, pp. 131-140.
[27] T. Kanimozhi and K. Latha, "An Adaptive Approach for Content Based Image Retrieval Using Gaussian Firefly Algorithm", ICIC, 2013, pp. 213-218.
Proceedings of IIRAJ International Conference (ICCI-SEM-2K17), GIFT, Bhubaneswar, India, 18th - 19th February 2017, ISBN: 978-93-86352-38-5
27
Gaussian Firefly Algorithm ICIC, 2013, Pp 213218, 2013.

[28] N. Singhai and S. K. Shandilya, ''A Survey On: Content Based Image Retrieval Systems'', International
Journal of Computer Applications, 2010 Pp 485-490.

[29] L. Ballerini, X. Li, R. B. Fisher, Be. Aldridge, J. Rees, "Content Based Image Retrieval of Skin Lesions by
Evolutionary Feature Synthesis," Evo Applications, 2010, Pp 312-319.

[30] S. Kiranyaz, T. Ince, A. Yildirim and M. Gabbouj, Fractional Particle Swarm Optimization in Multi-
Dimensional Search Space, IEEE, 2010, Pp 298-319.

[31] Kirti Jain, Dr. Sarita Singh and Dr. Gulab Singh Partial Feature Based Ensemble of Support Vector
Machine for Content based Image Retrieval, International Journal of Innovative Research in Computer and
Communication Engineering, 2013, Pp 622-625.

[32] KirtiJain andDr.Sarita Singh Bhadauria Enhanced Content Based Image Retrieval Using Feature
Selection Using Teacher Learning Based Optimization in International Journal of Computer Science and
Information Security (IJCSIS), Vol. 14, No. 11, November 2016

Proceedings of IIRAJ International Conference (ICCI-SEM-2K17), GIFT, Bhubaneswar, India, 18th - 19th February 2017, ISBN: 978-93-86352-38-5
28
Secure and Scalable Transformation of Medical Imaging Data in Cloud using Customized Hospital-based Management Systems

Nanda Gopal Reddy¹, Roheet Bhatnagar²
¹Research Scholar, CSE Department, Manipal University Jaipur, India
²Professor & HOD, CSE Department, Manipal University Jaipur, India

Abstract: As more advanced medical imaging modalities and innovative emerging technologies are used in patient care and medical research, the scope and volume of data, and the complexity of the associated analytics, are increasing. There is therefore a growing need for new concepts, technologies and imaging-informatics methods to aggregate, transfer, manipulate, analyze, manage and visualize medical data for prediction, diagnosis, treatment, rehabilitation and research. Medical image data contain a wealth of information that is often difficult to mine effectively, and one role of imaging informatics is to bridge the gaps between the scientific, diagnostic and therapeutic realms. This work focuses on methods for analyzing big data in medical imaging and informatics, emerging imaging and informatics technologies, new research and applications of imaging informatics, and the next generation of PACS, which will accommodate other imaging-rich clinical specialties while saving users time, effort and money.
Medical image processing is one of the most modern and important branches of image processing. Although many types of medical devices produce many variants of medical images, the obtained images still need considerable processing to reach a level that helps surgeons detect infections, distortions, tumors and cancers in different human body organs. The proposed framework introduces the design and implementation of two medical image processing proposals. The first deals with breast images obtained from mammography, detecting infections or distortions on each side using a proposed comparison procedure and negative transformation, and visualizing the distorted area from the obtained measurements. The second deals with brain images (obtained by MRI and NMI) and chest images (obtained by CT and PET), detecting tumors in the brain or cancer in the chest using a sequence of modified image processing procedures: noise removal, followed by a study of the nature of the image to select one of two proposed image fusion techniques, and finally visualization of the fused image, which displays the tumors and cancers much more clearly.

1. INTRODUCTION

In the modern era, health care information is imperative knowledge provided to users to maintain and strengthen their medical records. A systematic health-tracking information system that is extensive and accessible to present-day users is an essential requirement. To provide this, a new service originated as the Hospital Management System (HMS), which gives a wide range of users the opportunity to share, store, retrieve and maintain their health record information. Although it is an exciting utility for users, there are many confidentiality risks while exchanging health data with friends and family members.
In the recent past, many HMS services have been outsourced to third-party providers, such as Microsoft HealthVault, which are a pricey option for users, with the health data owners' information maintained in the cloud using recent technologies.
However, the sensitive information in a user's health record is exposed to a wide range of users through the HMS by semi-trusted servers, so the key concern is managing the confidentiality and security risks. Users are unable to trust a third-party server to forward their sensitive health data across the web. The scalability of the PHI makes this privacy risk a real threat to users; this ubiquitous access to the PHI is guarded by the information security techniques of encryption and decryption.
Key management must distribute keys to different users for accessing the PHI. Either multiple users may be authorized to encrypt, which causes chaos, or, on the other side, a central authority (CA) is employed to formulate the key management on behalf of the HMS.
In this paper, the vital focal point is to utilize the approach of ABE (attribute-based encryption), which allows users (single and multiple) to share the HMS securely on semi-trusted servers and provides highly extensible security for the key management of the HMS.

(a) Personal users are a small, confined user group generated by the owner, who precisely authorizes access privileges to these personal users and gives them the decryption keys needed to access the HMS, encrypting the HMS under its own individual data attributes. HMS sharing applications restricted to personal domains require minimal key generation to productively access the PHI.

Fig 1: Basic architecture of HMS with attributes (attribute categories: Personal Info, Medical history, Examination, Insurance info, Sensitive Info; example attributes: Name, DoB, age, sex, height, SSN, Conditions, HIV, Allergies, Medications/Prescriptions, Physical Exam, Lab tests, Profile, Pulse, heart rate, Blood tests, X-ray images)

(b) Public user domains give access privileges over distributed multiple attributes, with multiple users sharing the HMS through multi-authority ABE (MA-ABE) to enhance the security and confidentiality level, guarded with full privacy control over the HMS. A user group is given a set of multiple attributes to share the PHI, and access privileges are granted to a list of users to encrypt the HMS file without knowing the complete list of users. The main concerns are to assimilate ABE into a large-scale HMS file system, to distribute the keys to multiple users, and to manage dynamic policy updates, scalability and revocation in the HMS file system.

(c) The proposed paper analyzes the scalability and complexity issues of HMS file sharing in multiple domains, compared with various earlier file systems, to investigate several security issues; further examination is done by performing experimental observations and simulations with real-time HMS file applications. The implementation of MA-ABE in the public domain, and the identification of which types of multiple users are accessing the file, are clarified by the ABE fine-grained data access control and revocable ABE approaches described next.

2. ABE for Fine-grained Data Access Control

In recent times, ABE fine-grained data access control has been applied to the growing electronic healthcare record (EHR) systems to track the multiple user domains accessing the HMS file on the web. The data access rights are given by the owner, who specifies the encryption and decryption keys for the corresponding public users.

ABE provides data access to different public users through the abstraction of a single trusted authority (TA) for multiple domains, where the TA systematically defines the access privileges of different public users to encrypt the HMS file, ensuring a high level of confidentiality and security. For example, a patient's EHR (electronic health record) is given to multiple users in the public domain through the single trusted authority, which grants data access rights to public-domain users such as researchers, doctors and pharmacists, ensuring the security and integrity of the data. The HMS file is thus scalable to all public users, who can encrypt and decrypt the file according to the access privileges given by the TA. The file is investigated and examined from different perspectives by all the public users, and through the ABE approach the file is integrated and updated for further improvements.

2.1 Revocable ABE

The key challenge is to revoke the attribute encryption for a set of public users after they have accessed the HMS file for preliminary investigation. Many of the public users are

being provided a secure attribute encryption key by the single trusted authority (TA), and their access rights are declined periodically to ensure data integrity. Revoking the access rights of an individual user across multiple user domains is a crucial task for the trusted authority.
Revocable ABE ensures that the attribute key can be invalidated for the multiple user domains accessing the HMS file, ensuring confidentiality.
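As a rough illustration of the access model just described (the policy logic only, not the actual cryptographic construction of ABE), the sketch below checks a user's attribute set against a policy and a revocation list; all names and the policy encoding are hypothetical.

# Toy attribute-based access check with a TA-maintained revocation list.
# Models only the policy logic; real ABE enforces this cryptographically.
def satisfies(policy, attributes):
    """policy: nested tuples ('AND'|'OR', item, ...); items are attribute
    strings or sub-policies."""
    op, *terms = policy
    results = [(t in attributes) if isinstance(t, str) else satisfies(t, attributes)
               for t in terms]
    return all(results) if op == "AND" else any(results)

revoked = {"user2"}  # users whose attribute keys the TA has invalidated

def can_decrypt(user, attributes, policy):
    return user not in revoked and satisfies(policy, attributes)

# A physician at Hospital A may read the record; a revoked nurse may not.
policy = ("AND", "profession:physician", ("OR", "org:hospitalA", "org:hospitalB"))
print(can_decrypt("user1", {"profession:physician", "org:hospitalA"}, policy))  # True
print(can_decrypt("user2", {"profession:nurse", "org:hospitalB"}, policy))      # False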

3. FRAMEWORK FOR HOSPITAL- AND PATIENT-ENABLED, SECURE AND SCALABLE HMS SHARING

In this section, we describe our novel hospital- and patient-enabled secure data sharing framework for cloud-based HMS systems. The main notations are summarized in Table 1.
3.1 Problem Definition
We consider an HMS system with multiple HMS owners and HMS users. The owners are patients who have full control over their own HMS data, i.e., they can create, manage and delete it. There is a central server, belonging to the HMS service provider, that stores all the owners' HMSs. The users may come from various backgrounds; for example, a friend, a caregiver or a researcher. Users access the HMS documents through the server in order to read or write to someone's HMS, and a user can simultaneously have access to multiple owners' data. A typical HMS system uses standard data formats.
3.1.1 Security Model
In this paper, we consider the server to be semi-trusted, i.e., honest but curious, as in [4] and [5]. That means the server will try to find out as much secret information in the stored HMS files as possible, but it will honestly follow the protocol in general.
Fig 2: Framework representation of HMS
3.1.2 Requirements
The primary requirement for patient-centric HMS sharing is to ensure that the patient can control who is authorized to access his or her own HMS documents. For any electronic health record system, user-controlled read/write access and revocation are the two core security requirements. The security and performance requirements are listed below:
Data confidentiality. The system allows decryption of an HMS document only by users who have enough attributes to satisfy the access policy, or who have valid access privileges.
Write access control. The system should allow legitimate users to access the server, and prevent unauthorized contributors from gaining write access to owners' HMSs.
Dynamic policies. The system should be able to accommodate dynamic changes to the predefined data access policies of the health record.
Scalability, efficiency and usability. The HMS system should support users from both the personal domain and public domains. The system should be highly scalable in terms of the complexity of access key management, communication, computation and storage, as the number of users from the public domain may be large and unpredictable. The data owner should be able to manage users and access keys with minimal effort.

4. Overview of Our Framework

The key objective of our framework is to provide protected, patient-centric HMS

access and efficient key management. The system is segregated into multiple security domains (public domains (PUDs) and personal domains (PSDs)) based on the users' data access requirements. The PUDs contain professional users (doctors, nurses and medical researchers), tagged with independent sectors (healthcare, government or insurance). Data owners provide access rights to selected users (such as family members or close friends) through PSDs.

TABLE 1: Frequently used notations in this algorithm
UR, UD : universal attributes for roles and data
L(T), T : tree and leaf node set
ACk : attributes in the ciphertext
Auk : user u's attributes given by the kth AA
A, a : an attribute type, a specific attribute value
P : access policy for an HMS document
P' : a key-policy assigned to a user
MK, PK : master key and public key in ABE
SK : a user's secret key in ABE
rk(k)j : proxy re-key for attribute j and version k

In this section we examine the main design issues for sharing the HMS in the cloud using MA-ABE for the public domain. Key management must handle the multiple PUDs, which are given multiple attribute encryptions to access the HMS file; firm privacy-guaranteed policy rules are given by the data owners to the public users. Under these criteria there is an agreement on the key policy between the public users and the data owners. Further, the data owners can specify the threshold point at which only a few public users are given access privileges.

Table 2: Important key-policies for public users in healthcare domains
Attribute authority:  AMA                                 | ABMS                  | AHA
Attribute types:      A1: Profession, A2: License status  | A3: Medical specialty | A4: Organization
Au1 (user 1):         Physician, M.D.                     | Internal medicine     | Hospital A
Au2 (user 2):         Nurse, Nurse license                | Gerontology           | Hospital B
Au3 (user 3):         Pharmacist, Pharm. license          | General pharmacy      | Pharmacy C
Key policies:         1-out-of-n1, 1-out-of-n2            | 1-out-of-n3           | 1-out-of-n4

Basic usage of MA-ABE
There are different protocols to ensure that the public users accessing the HMS file follow rigid rules:
Rule 1: Basic encryption rule for the PUDs to follow, from the data owners.
Rule 2: Key-policy generation and key generation: the necessary rule for the PUDs to follow the agreement policy given by the data owners.
Rule 3: Correctness: to protect the correctness of the secret key generated between the PUDs and the data owners.
Rule 4: Completeness: to ensure that all the above rules and protocols are followed by the PUDs and that the secure secret key for accessing the HMS file is generated by the MA-ABE approach. All these key values are shown in Table 2.

By using the MA-ABE approach, multiple PUDs access the HMS file while following the key-policy generation rules given by the data owners, safeguarding the data integrity and confidentiality of the HMS file. The key-policy generation rules can be further extended to different users accessing the file across multiple domains, as long as their primary attribute is specific.

For multiple PUDs to access the HMS file efficiently while following the MA-ABE protocols, a random file encryption key (FEK) is generated for each ciphertext, and the length of the ciphertext grows linearly with the number of users. The data owners therefore impose this key generation policy during encryption; the MA-ABE scheme specifies it in the same way as a CP-ABE scheme.
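A minimal sketch of the per-file hybrid encryption step just described: a fresh random FEK per ciphertext, with the FEK itself handed to the MA-ABE layer for policy-based wrapping. This assumes Python's cryptography package; the ABE wrapping itself is only a stub here, since the paper does not give a concrete construction.

# Hybrid encryption: AES-GCM under a fresh random FEK, then (stubbed)
# MA-ABE wrapping of the FEK under a key policy.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def abe_wrap(fek: bytes, policy: str) -> bytes:
    # Stub: a real system would encrypt the FEK under the MA-ABE key
    # policy so that only users whose attribute keys satisfy it can
    # recover the FEK. This placeholder is NOT secure.
    return b"ABE[" + policy.encode() + b"]"

def encrypt_record(plaintext: bytes, policy: str):
    fek = AESGCM.generate_key(bit_length=128)   # fresh FEK per ciphertext
    nonce = os.urandom(12)
    ciphertext = AESGCM(fek).encrypt(nonce, plaintext, None)
    return nonce, ciphertext, abe_wrap(fek, policy)

nonce, ct, wrapped = encrypt_record(b"patient record", "1-out-of-n1 AND 1-out-of-n3")
print(len(ct), wrapped)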
There are two theories for the encrypters (the PUDs) accessing the file: the initial one works across the different authorities, where conjunctive rules apply; further, different attributes governed by DNF policies are supported.

5. Enhancing MA-ABE for User Revocation

The data owners can invalidate the access privileges of a public user dynamically and push the update to the authorized server. To increase efficiency and cut down the computational difficulty, a proxy re-encryption approach is implemented. Subsequently, the data owners can revoke and unrevoke the public users as their acquaintance changes.
Proportionately, a lazy revocation is imposed in situations where affected ciphertexts and user secret keys are being updated; to unrevoke a user, the re-key option is implemented, which updates all the information of the user from the last login to the current one.

5.1 Enforce Write Access Control
The data owners impose write access control on the public users to protect confidentiality and security by introducing the hash chain technique.
The hash chain technique is applied when there is a conflict between the data owner and the PUDs over access to the HMS file. It can grant or dismiss the grant policy for different users across different domains, and can deny all rights to access, update and modify the HMS file.
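A minimal sketch of the hash chain idea behind the write access control: the owner publishes the final value of a chain of hashes, releases earlier elements one at a time as one-time write tokens, and the server verifies a token by hashing it forward to the published value. The token semantics here are an illustrative assumption, not the paper's exact protocol.

# Minimal hash chain: the anchor chain[-1] is published; releasing an
# earlier element authorizes one write, verifiable by hashing forward.
import hashlib

def sha256(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def make_chain(seed: bytes, n: int):
    chain = [seed]
    for _ in range(n):
        chain.append(sha256(chain[-1]))
    return chain  # chain[-1] is the public anchor

chain = make_chain(b"owner-secret", 10)
anchor = chain[-1]

def verify_token(token: bytes, steps: int) -> bool:
    """Check that hashing `token` `steps` times reaches the anchor."""
    for _ in range(steps):
        token = sha256(token)
    return token == anchor

print(verify_token(chain[7], 3))   # True: 3 hashes from element 7 reach the anchor
print(verify_token(b"forged", 3))  # False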
5.2 Handle Dynamic Policy Changes
In this approach, the data owners can dynamically enumerate, modify and delete the HMS documents being accessed by different users in different domains. This can be effectively implemented by the proxy re-encryption, file encryption key (FEK) and policy update approaches, which are pricey options for the data owners.

5.3 Deal with Break-glass Access
This approach is enforced when the data owners are unavailable to grant access privileges to users in different domains. In such a case, a temporary authorization is issued to access the document, and the access can later be revoked from the users. These, then, are the different security measures protecting the data integrity and confidentiality of the HMS file being accessed by the public users; the scalability and efficiency indeed depend indirectly upon the storage and maintenance level of the documents and, certainly, upon computational cost uncertainty.

6. CONCLUSION

In this paper, we proposed a new framework for how an HMS system can share patients' data safely and securely by adding different attributes. We utilized ABE services to give maximum throughput. The framework addresses the unique challenges brought by multiple HMS operators and users; in it we greatly reduce the complexity of key management while enhancing the privacy guarantees compared with previous works.

REFERENCES:
[1] K. D. Mandl, P. Szolovits, and I. S. Kohane, "Public standards and patients' control: how to keep electronic medical records accessible but private," BMJ, vol. 322, no. 7281, p. 283, Feb. 2001.
[2] "At risk of exposure: in the push for electronic medical records, concern is growing about how well privacy can be safeguarded," 2006. [Online]. Available: http://articles.latimes.com/2006/jun/26/health/he-privacy26
[3] G. Russello, C. Dong, and N. Dulay, "Shared and searchable encrypted data for untrusted servers," Journal of Computer Security, 2010.
[4] S. Yu, C. Wang, K. Ren, and W. Lou, "Achieving secure, scalable, and fine-grained data access control in cloud computing," in IEEE INFOCOM '10, 2010.
[5] J. Benaloh, M. Chase, E. Horvitz, and K. Lauter, "Patient controlled encryption: ensuring privacy of electronic medical records," in CCSW '09, 2009, pp. 103-114.
[6] H. Lohr, A.-R. Sadeghi, and M. Winandy, "Securing the e-health cloud," in Proceedings of the 1st ACM International Health Informatics Symposium (IHI '10), 2010, pp. 220-229.
[7] S. Yu, M. Li, K. Ren, and W. Lou, "Securing personal health records in cloud computing: patient-centric and fine-grained data access control in multi-owner settings," in SecureComm '10, Sept. 2010, pp. 89-106.
[8] "Google, Microsoft say HIPAA stimulus rule doesn't apply to them," http://www.ihealthbeat.org/Articles/2009/4/8/.
[9] M. Li, S. Yu, N. Cao, and W. Lou, "Authorized private keyword search over encrypted personal health records in cloud computing," in ICDCS '11, Jun. 2011.
[10] "The health insurance portability and accountability act." [Online]. Available: http://www.cms.hhs.gov/HIPAAGenInfo/01 Overview.asp

A Comparative Analysis of Different Techniques for Triple Level Biometric Authentication for Humans

Rohit Srivastav¹, Dr. Prateek Srivastava²
¹PhD Scholar, Dept. of Computer Sc. & Engineering, ²Assistant Professor,
School of Engineering, Sir Padampat Singhania University, Udaipur, Rajasthan, India

Abstract: The biometric identification process is used for recognizing and identifying a person for various applications. The process can use a single biometric feature or a combination of biometric features. If identification is done using a single biometric feature (face, iris, fingerprint, palm, etc.), the system is called unimodal; if a combination of biometrics is used, it is called multimodal. A multimodal system removes various drawbacks of unimodal systems (noisy data, multiple vectors, etc.). The main goal of the proposed work is to design a framework that provides three-level authentication for a person. Earlier works in this field are explained by different statistical models based on different authentication schemes; they tried to estimate predictable output values from known historical data, and to authenticate with the help of transformations and analysis. In the proposed method, a mechanism is developed in which, if one biometric trait fails, the other biometric traits can be used for authentication.

Keywords: Principal Component Analysis, Face Recognition, Fingerprint Recognition, Minutiae Matching, Score Fusion, Palmprint Recognition

I. Introduction

A biometric system refers to a pattern recognition system that has the ability to acquire biometric data from an individual [1]. The requirement of enhanced security in biometric-based authentication of a person has led us to an interesting area. Biometric systems that are based on a single information source are called unimodal systems [2]. Unimodal biometrics have many implicit problems in their applications. The major difficulty with unimodal biometric technology is that it is not perfectly suited for all applications [3]. Even though these systems offer a reliable solution for secure verification and are commonly used in numerous commercial systems in practice, they suffer from the following limitations: sensed data noise, intra-class variation, inter-class similarities, spoof attacks and non-universality [4]. Hence, it is not possible to achieve the desired performance with a single biometric system. One of the methods to solve these problems encountered in single biometric systems is to make use of multi-modal biometric authentication systems. This model combines information from multiple modalities to dictate a decision [2].

This paper presents a review of multimodal biometrics, including a brief introduction and a discussion of various fusion techniques of multi-modal biometrics. A fusion technique is proposed based on the face, fingerprint and palmprint biometric traits. After the features are captured, preprocessing is done and features are extracted for feature-level fusion. Biometric fusion is classified into 5 categories:
(a) Sensor Level Fusion: This is also referred to as image-level or pixel-level fusion. This is possible only if the fused samples are taken using the same sensor. If multiple sensors are used, then the data from the different sources must be compatible. The raw data contain a lot of information, but at the same time they are corrupted by noise.

Fig 1: Sensor Level Fusion [3]

(b) Feature Level Fusion: In this fusion, the data from different sources are separately processed, features are extracted, and a joint feature vector is computed for matching against the stored template. The fusion can be easily accomplished if the features are extracted using the same algorithm; otherwise it becomes tedious.

Fig 2: Feature Level Fusion [3]

(c) Score Level Fusion: Score-level fusion is the combination of the matching scores from the outputs of the individual matchers. These matching scores indicate the closeness of the sample image to the identities in the database. The matching score is the richest information next to the feature vector, and it is easy to access these values and combine them.

Fig 3: Score Level Fusion [3]

(d) Rank Level Fusion: Rank-level fusion is based on ranking the output of the enrolled identities. The matched identities are sorted in descending order of the matching statistics. Ranks give clearer information about the decision-making process than just identifying the best match, but they reveal less information than the score level. Just like matching scores, the ranking outputs are comparable, so a normalization process is not required.

(e) Decision Level Fusion: Decision-level or abstract-level fusion is possible only when the output from the individual biometric matchers is available. The outputs from the different matchers are fused using the AND and OR rules. The output of the AND rule is a match only when the input test sample is matched with the stored templates at the output of each matcher, whereas the OR rule outputs a match decision even if only one of the matchers decides that the input test sample matches the stored templates.

Fig 4: Decision Level Fusion [7]
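A minimal sketch of the AND/OR decision-level rules described in (e); the per-matcher boolean decisions are assumed to be already computed.

# Decision-level fusion over per-matcher boolean decisions.
def fuse_and(decisions):  # match only if every matcher accepts
    return all(decisions)

def fuse_or(decisions):   # match if at least one matcher accepts
    return any(decisions)

# e.g. face matcher rejects, fingerprint and palmprint accept:
decisions = [False, True, True]
print(fuse_and(decisions))  # False
print(fuse_or(decisions))   # True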
Artificial Neural Networks (ANNs) have seen great usage in authentication systems and also bring automation to the system. The advantages of these neural network models are seen in increased approximation accuracy and cost reduction. Artificial Neural Networks play an important part in supporting the analysis of big data sets in various forms of authentication.

II. System Block Diagram

III. Proposed Methodology

A. Fingerprint Identification
Fingerprint identification consists of three main steps, namely preprocessing, feature extraction and comparison. The preprocessing is divided into two main steps: normalization of the fingerprint image, and location and framing of the central point of the fingerprint image. Normalization is used to eliminate the effects of noise and distortion introduced when capturing the image from the fingerprint sensor. The original image is normalized by its mean M and its variance VAR; the matrix G given by the normalization equation is the normalized grayscale image, and G(i, j) is its value at pixel (i, j), where M0 and VAR0 are the desired mean and variance values, respectively.
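The normalization equation itself did not survive extraction. The sketch below implements the standard mean-variance normalization rule common in the fingerprint literature (e.g. Hong et al.), which matches the M, VAR, M0, VAR0 notation above; this is an assumption, not necessarily the authors' exact formula.

# Assumed normalization rule:
# G(i,j) = M0 + sqrt(VAR0*(I(i,j)-M)^2/VAR) if I(i,j) > M, else M0 - sqrt(...)
import numpy as np

def normalize(image: np.ndarray, m0: float = 100.0, var0: float = 100.0) -> np.ndarray:
    m, var = image.mean(), image.var()
    dev = np.sqrt(var0 * (image - m) ** 2 / var)    # per-pixel deviation term
    return np.where(image > m, m0 + dev, m0 - dev)  # shift toward desired mean

img = np.random.randint(0, 256, (64, 64)).astype(float)
out = normalize(img)
print(round(out.mean(), 1), round(out.var(), 1))  # mean ~ m0=100, variance ~ var0=100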
Fig 5: A Fingerprint Image [4]

The following step locates and frames the central point of the fingerprint. The central point detection algorithm is summarized as follows:
- Estimate the orientation field.
- Calculate the field strength of the loop at each point in the orientation field, using the expanded field of the hidden orientation.
- Normalize the resistance loop field row in a range from 0 to 1.
- Perform a thresholding on the field loop to locate both the kernel and the center of the region.
In order to extract the relevant features of the fingerprint, a Gabor filter was applied to the framed part of the fingerprint along 8 different directions (0°, 22.5°, 45°, 67.5°, 90°, 112.5°, 135°, 157.5°). The results are complex values, which were encoded in order to obtain a binary vector of size 1024 representing the main features of the fingerprint image.

B. Fingerprint Verification (Gabor Filter Approach)
Gabor filters are used for identifying details in a fingerprint image. Matching of two fingerprint images is done on the basis of Euclidean distance. The matching of two images can be enhanced by combining score decisions based on different fingerprint features [21].

C. Feature Extraction from Face Image using Local Binary Patterns (LBP)
For extracting face features, the Local Binary Pattern (LBP) histogram approach is used; it captures local face features. As shown in Fig. 6, the LBP operator is centered at every pixel in the image: the center pixel acts as a threshold value for the surrounding pixels, which are binarized. Going in the clockwise direction, an 8-bit number is generated and placed at the center pixel location. In this way a new image, called the LBP image, is obtained.

Fig. 6: Face Print Identification using LBP Approach

The LBP image is then divided into blocks, and histograms of these blocks are calculated. These histograms are concatenated to form a single feature vector. If the binary number has a maximum of 2

transitions from 0 to 1 or 1 to 0, then it is called a uniform LBP. For face recognition, the LBP histogram features of two images are compared using the Chi-square distance metric:

chi2(X, Y) = sum over i = 1..N of (Xi - Yi)^2 / (Xi + Yi)

Here X and Y are the feature vectors and N is their dimension. A nearest neighbour classifier can be used to take the accept/reject decision.
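A compact sketch of the LBP pipeline and Chi-square comparison described above, using numpy; the block size and 8-neighbour ordering are illustrative assumptions.

# Basic (non-uniform) LBP image, block histograms, and Chi-square distance.
import numpy as np

def lbp_image(img: np.ndarray) -> np.ndarray:
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    center = img[1:-1, 1:-1]
    # 8 neighbours in clockwise order, each contributing one bit
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        out |= ((neigh >= center).astype(np.uint8) << bit)
    return out

def lbp_histogram(img: np.ndarray, block: int = 16) -> np.ndarray:
    lbp = lbp_image(img)
    hists = [np.bincount(lbp[y:y + block, x:x + block].ravel(), minlength=256)
             for y in range(0, lbp.shape[0], block)
             for x in range(0, lbp.shape[1], block)]
    return np.concatenate(hists).astype(float)  # concatenated feature vector

def chi_square(x: np.ndarray, y: np.ndarray) -> float:
    eps = 1e-10  # avoid division by zero on empty bins
    return float(np.sum((x - y) ** 2 / (x + y + eps)))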

D. Palmprint Identification (Left and Right Palmprint)
For palmprint identification, a combination of the left palmprint and right palmprint images is used. For the proposed methodology, a framework for fusing the left palmprint and right palmprint images is developed. For this framework to successfully identify a palmprint image, a fusion of three kinds of scores is required. Two scores can be generated using the left and right palmprint images, whereas for the third score a specific algorithm is proposed.

Fig 7: Procedure for palmprint fusion [7]

Correlation Between the Left and Right Palmprints
Left palmprint and right palmprint images are similar to each other. Fig. 8 shows the left palmprint images of four different subjects, together with the right palmprint images and the reverse right palmprint images. As depicted in the figure, it is inferred that the left palmprint and the reverse right palmprint of the same subject are similar in nature.

Fig 8: Palmprint images of four subjects.

Fig. 9 shows the principal-line images of the left palmprints and reverse right palmprints shown in Fig. 8.

Fig 9: Principal lines images.

From the figure [Fig. 9 (i)-(l)] we can see that the principal lines of the left and reverse right palmprint images of the same subject are almost similar in shape and position, whereas for different persons they are different [Fig. 9 (m)-(p)]. So, according to this result, it can be concluded that this feature of palmprint images can be deployed for palmprint verification.
In the proposed framework, first the left palmprint images and then the right palmprint images are used for score calculation for each sample class. After the matching score for each class is generated, the final fusion is performed to obtain the identification result.

Fig 10: Fusion at the matching score level [4]

After all three scores are obtained, the final matching score is generated; it depends on all three matching scores. After obtaining the first and second scores, the third kind of score is calculated by performing cross matching between the left and right palmprints. For the ith matcher, wi (i = 1, 2, 3) denotes the weight assigned to that matcher; the scores can be adjusted, and the weights can be viewed as the importance of the corresponding matchers.
In the proposed method, a strategy is introduced in which the cross-matching score is fed into the fusion methodology. When w3 = 0, the proposed method is equivalent to conventional score-level fusion. Thus the proposed method achieves a performance enhancement over conventional methods by tuning the weight coefficients.

Fusion Procedure for Biometric Features:
1) Score Normalization
Normalization is done to rescale the matching scores to a common range between 0 and 1. The scores are normalized by:

Nface = (MSface - minface) / (maxface - minface)   (3)
Nfinger1 = (MSfinger1 - minfinger1) / (maxfinger1 - minfinger1)   (4)
Nfinger2 = (MSfinger2 - minfinger2) / (maxfinger2 - minfinger2)   (5)
Npalm = (MSpalm - minpalm) / (maxpalm - minpalm)   (6)

where minface and maxface are the minimum and maximum scores for face recognition, minfinger1 and maxfinger1 are the corresponding values obtained from applying minutiae matching to the fingerprint image, minfinger2 and maxfinger2 are the corresponding values obtained from applying the Gabor filter to the fingerprint image, and minpalm and maxpalm are the corresponding values obtained from the palmprint image.

2) Fusion
The normalized values from the fingerprint, face and palmprint images are fused using the sum rule as:

MS = m*Nface + n*Nfinger1 + p*Nfinger2 + q*Npalm   (7)

where m, n, p and q are four weight values that are assigned using the feature vector. If the value of a matching score is less than the actual score, it can easily mislead, so the weight values are assigned linearly.
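A small sketch of the min-max normalization (3)-(6) followed by the weighted-sum fusion (7); the score ranges and weight values are illustrative.

# Min-max normalization of raw matcher scores, then weighted-sum fusion (eq. 7).
def normalize(score: float, lo: float, hi: float) -> float:
    return (score - lo) / (hi - lo)

def fuse(scores: dict, ranges: dict, weights: dict) -> float:
    return sum(weights[k] * normalize(scores[k], *ranges[k]) for k in scores)

scores  = {"face": 62.0, "finger1": 30.0, "finger2": 48.0, "palm": 75.0}
ranges  = {"face": (0, 100), "finger1": (0, 50), "finger2": (0, 60), "palm": (0, 100)}
weights = {"face": 0.3, "finger1": 0.2, "finger2": 0.2, "palm": 0.3}  # m, n, p, q
print(round(fuse(scores, ranges, weights), 3))  # 0.691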
Conclusion

As per the current proposed system, the accuracy rate of a multimodal biometric system is greater than that of a single biometric system. Experimentation shows that the accuracy of the system increases with the combination of multiple biometric features. The Genuine Acceptance Rate is also improved using multibiometric recognition and the neural network approach. The system can be extended to five biometric traits in the future.

Acknowledgment

I take immense pleasure in expressing my humble note of gratitude to my guide Dr. Prateek Srivastava, Assistant Professor, Department of Computer Science and Engineering, SPSU, Udaipur, Rajasthan, India, for his remarkable guidance and technical suggestions.

References
[1] K. Aizi, M. Ouslim, and A. Sabri, "Remote multimodal biometric identification based on the fusion of the iris and the fingerprint," IEEE Transactions on Information and Forensics, 12(6), 2015.
[2] T. Lee and D. Bong, "Face and palmprint multimodal biometric system based on bit-plane decomposition approach," in Proc. International Conference on Consumer Electronics-Taiwan, 114(21), 2016.
[3] R. Telgad, P. Deshmukh, and A. Siddiqui, "Combination approach to score level fusion for multimodal biometric system by using face and fingerprint," in Proc. International Conference on Recent Advances and Innovations in Engineering (ICRAIE-2014), May 09-11, 2014, Jaipur, India.
[4] D. Yong, S. Bhowmik, and F. Magnago, "An effective power quality classifier using wavelet transform and support vector machines," Expert Systems with Applications, 42(15), 2015, p. 6075.
[5] N. Radha and A. Kavitha, "Rank level fusion using fingerprint and iris biometric," Indian Journal of Computer Science and Engineering (IJCSE), ISSN 0976-5166, vol. 2, no. 6, Dec. 2011 - Jan. 2012.
[6] Jian-Gang Wang, Kar-Ann Toh, Eric Sung, and Wei-Yun Yau, "A feature-level fusion of appearance and passive depth information for face recognition," in Face Recognition, book edited by Kresimir Delac and Mislav Grgic, ISBN 978-3-902613-03-5, p. 558, I-Tech, Vienna, Austria, June 2007.
[7] Muhammad Imran Razzak, Muhammad Khurram Khan, Khaled Alghathbar, and Rubiyah Yusof, "Multimodal biometric recognition based on fusion of low resolution face and finger veins," International Journal of Innovative Computing, Information and Control, ISSN 1349-4198, vol. 7, no. 8, August 2011, pp. 4679-4689.
[8] Y. Xu, Q. Zhu, D. Zhang, and J. Y. Yang, "Combine crossing matching scores with conventional matching scores for bimodal biometrics and face and palmprint recognition experiments," Neurocomputing, vol. 74, no. 18, pp. 3946-3952, Nov. 2011.
[9] A. K. Jain and J. Feng, "Latent palmprint matching," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 6, pp. 1032-1047, Jun. 2009.
[10] A. K. Jain, A. Ross, and S. Prabhakar, "An introduction to biometric recognition," IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 1, pp. 4-20, Jan. 2004.
[11] Y. Xu, Q. Zhu, D. Zhang, and J. Y. Yang, "Combine crossing matching scores with conventional matching scores for bimodal biometrics and face and palmprint recognition experiments," Neurocomputing, vol. 74, no. 18, pp. 3946-3952, Nov. 2011.
[12] R. L. Telgad and Dr. P. D. Deshmukh, "Computer aided technique for finger print image enhancement and minutiae extraction," IJCA, vol. 75, no. 17, August 2013.
[13] Ajay Kumar and Sumit Shekhar, "Personal identification using multibiometrics rank-level fusion," IEEE Transactions on Systems, Man, and Cybernetics - Part C: Applications and Reviews.
[14] F. Besbes, H. Trichili, and B. Solaiman, "Multimodal biometric system based on fingerprint identification and iris recognition," in Proc. 3rd Int. IEEE Conf. Inf. Commun. Technol.: From Theory to Applications (ICTTA 2008), pp. 1-5. DOI: 10.1109/ICTTA.2008.4530129.
[15] A. Rattani, D. R. Kisku, M. Bicego, and M. Tistarelli, "Feature level fusion of face and finger biometric."
[16] N. Radha and A. Kavitha, "Rank level fusion using fingerprint and iris biometric," Indian Journal of Computer Science and Engineering (IJCSE), ISSN 0976-5166, vol. 2, no. 6, Dec. 2011 - Jan. 2012.
[17] Jian-Gang Wang, Kar-Ann Toh, Eric Sung, and Wei-Yun Yau, "A feature level fusion of appearance and passive depth information for face recognition," in Face Recognition.
[18] Muhammad Imran Razzak, Muhammad Khurram Khan, Khaled Alghathbar, and Rubiyah Yusof, "Multimodal biometric recognition based on fusion of low resolution face and finger veins," International Journal of Innovative Computing, Information and Control, ISSN 1349-4198, vol. 7, no. 8, August 2011, pp. 4679-4689.
[19] Anil K. Jain, Salil Prabhakar, Lin Hong, and Sharath Pankanti, "Filterbank-based fingerprint matching," IEEE Transactions on Image Processing, vol. 9, no. 5, May 2000.
[20] Jian Wang and Jian Cheng, "Face recognition based on fusion of Gabor and 2DPCA features," in International Symposium on Intelligent Signal Processing and Communication Systems, December 2010, pp. 1-4.
[21] P. J. B. Hancock, V. Bruce, and A. M. Burton, "Testing principal component representations for faces," Proc. of 4th Neural Computation and Psychology Workshop, 1997.
[22] Jonathon Shlens, "A tutorial on principal component analysis," Systems Neurobiology Laboratory, ver. 2, 2005.
[23] Zhujie and Y. L. Y., "Face recognition with eigenfaces," Proc. IEEE Intl. Conf. Industrial Technol., 1994, pp. 434-438.
[24] Sushama S. Patil, Gajendra Singh Chandel, and Ravindra Gupta, "Fingerprint image enhancement techniques and performance evaluation of the SDG and FFT fingerprint enhancement techniques," International Journal of Computer Technology and Electronics Engineering (IJCTEE), ISSN 2249-6343, vol. 2, issue 2, pp. 184-190.
Software Fault Detection using Fuzzy C-Means and Support Vector Machine

[1] Hirali Amrutiya, [2] Riddhi Kotak, [3] Mittal Joiser
[1] Student, MEFGI, [2] Assistant Professor, MEFGI, [3] Assistant Professor, MEFGI

Abstract: Organizations in fields such as finance, medicine, airlines and banking require very high-quality software; a failure in such systems causes high financial cost and affects people's lives, so it is important to develop fault-free software. Software fault detection is important for software quality, since only limited testing resources are available to assure it. The classification model is trained using a dataset. We propose a framework that combines a data pre-processing approach with a Support Vector Machine (SVM) classifier. In data pre-processing, relevance analysis is performed using feature ranking, and redundant features are removed using the Fuzzy C-Means clustering technique.

Index Terms: Data Mining, Fuzzy C-Means, Pre-processing, Software Defect Detection,
Support Vector Machine

I. INTRODUCTION

Software plays an important role in daily life and becomes more common day by day. Everyone uses software in daily life, so the quality of the software is most important for the end user. Organizations in fields such as finance, medicine, airlines and banking require very high-quality software; a failure in such systems causes high financial cost and affects people's lives. It is important for an organization or developer to develop quality software. Failures in software happen because of ambiguities in requirements, design, code and test cases. Fault identification in the testing phase of the software development life cycle does not cost much, but identification of failures at the maintenance stage costs more. A software defect detection model is used for identifying the faulty components; such a model uses the limited testing resources for prediction of the faults.
Software metrics are used for the detection of faults. Metrics are computed from the source code; examples are size metrics, object-oriented metrics and complexity metrics. Different data mining techniques are used, such as neural networks, genetic programming, the Naive Bayes approach, artificial immune systems, fuzzy logic and decision trees. A supervised learner used with or without pre-processing gives different results: pre-processing of the data improves the result of the learning algorithm.
The benefit of the software defect detection model is to identify faults before testing, improving software testing and software quality by allocating more resources to fault-prone modules. Public NASA datasets are used for software defect detection; the PROMISE repository includes several public NASA datasets [3]. The model uses historical data for training the classifier.
The main objective of this paper is to design a software defect detection system using Fuzzy C-Means and a Support Vector Machine (SVM). Data pre-processing is performed to identify relevant features. The classification of fault data is measured using accuracy, the ROC curve, the area under the ROC curve, recall and precision.

II. RELATED WORKS

The quality of a dataset is improved by data pre-processing, which incorporates feature selection and sampling to reduce instances. Feature selection is the method of identifying and removing irrelevant and duplicate features from a dataset in order to increase the performance of the classification model. [1]-[2] Wangshu Liu et al. used a two-stage data pre-processing approach for identifying faults in software, comparing results using Naive Bayes, C4.5 and IB1. [3] Rohit Mahajan et al. proposed a framework for software fault prediction using Bayesian Regularization (BR), compared with Levenberg-Marquardt (LM) and back propagation (BPA); Bayesian Regularization gives better performance. [4] Deepika Gupta, Vivek K. Goyal and Harish Mittal proposed estimating software quality with clustering techniques; their paper focuses on clustering very large datasets with very many attributes of different types, and effective results can be produced using Fuzzy C-Means clustering. [5] Arashdeep Kaur et al. proposed a model for software fault prediction, investigating Fuzzy C-Means and K-Means performance; Fuzzy C-Means performs better than K-Means for the requirement and combination

metric models. They also investigate whether metrics collected early in the life cycle can be used to predict fault-prone modules. [6] Wangshu Liu et al. use a clustering-based feature selection method for software fault prediction, with FF-relevance and FF-correlation measures and a heuristic approach for cluster formation. [7] Issam H. Laradji et al. use greedy forward feature selection and an Average Probability Ensemble learning model to classify data; this model contains seven algorithms, such as W-SVM and Random Forest. [8]

III. PROPOSED FRAMEWORK

Fig. 1 shows the proposed framework of the system. The framework consists of two stages: the first stage is relevance analysis and the second stage is redundancy control. A Support Vector Machine is used as the classification model.
A. Relevance Analysis
The relevance analysis stage identifies the relevant features in the dataset. For that, a feature ranking method is used: the correlation between a feature and the class is measured to find the relevancy. For measuring the correlation, information gain (IG) is used. Information gain measures the amount of information provided by the feature f about whether an instance is faulty or non-faulty. The formula used for measuring the information gain is: [3]

IG(f) = H(A) - H(A|B)   (1)

where H(A) is the entropy of the discrete random variable A (i.e., the class). If p(a) denotes the prior probability of a value a of A, then H(A) is computed by:

H(A) = - sum over a in A of p(a) log2 p(a)   (2)

H(A|B) is the conditional entropy, which quantifies the uncertainty of A given the observed variable B. If p(a|b) denotes the posterior probability of a for a value b, H(A|B) is computed by:

H(A|B) = - sum over b in B of p(b) * sum over a in A of p(a|b) log2 p(a|b)   (3)
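A small sketch of equations (1)-(3) for one discrete feature against the fault/non-fault class label; the toy data are illustrative.

# Information gain of a feature: IG(f) = H(class) - H(class | feature).
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(feature_values, labels):
    n = len(labels)
    cond = 0.0
    for v in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        cond += (len(subset) / n) * entropy(subset)  # sum of p(b) * H(A|B=b)
    return entropy(labels) - cond

# Toy data: high LOC correlates with faults here.
loc_high = [1, 1, 1, 0, 0, 0]
faulty   = [1, 1, 0, 0, 0, 0]
print(round(info_gain(loc_high, faulty), 3))  # 0.459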
B. Redundancy Control
The redundancy control stage is used to remove redundant features. To remove the redundant features, the Fuzzy C-Means algorithm is used. Fuzzy C-Means clustering iteratively moves the cluster centres to the right locations.
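A compact numpy sketch of the Fuzzy C-Means update loop (alternating membership and centre updates with fuzzifier m). To group correlated features, as this framework does, one would cluster the feature vectors rather than the instances; the parameter values below are illustrative.

# Fuzzy C-Means: alternate membership and centre updates until convergence.
import numpy as np

def fuzzy_c_means(x, c=2, m=2.0, iters=100, tol=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.random((len(x), c))
    u /= u.sum(axis=1, keepdims=True)  # random fuzzy memberships
    for _ in range(iters):
        um = u ** m
        centres = (um.T @ x) / um.sum(axis=0)[:, None]
        d = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2) + 1e-12
        p = 2.0 / (m - 1.0)
        new_u = 1.0 / (d ** p * (1.0 / d ** p).sum(axis=1, keepdims=True))
        if np.abs(new_u - u).max() < tol:
            break
        u = new_u
    return centres, u

pts = np.vstack([np.random.randn(20, 2), np.random.randn(20, 2) + 5])
centres, u = fuzzy_c_means(pts)
print(np.round(centres, 2))  # two centres, near (0,0) and (5,5)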
C. Classification Model
A Support Vector Machine (SVM) is used as the classifier for software defect detection. The Support Vector Machine is a supervised learning algorithm: SVM plots the data items in an n-dimensional space, with the value of each feature corresponding to a coordinate, and then finds the hyperplane that best separates the classes.
Matlab 2010 is used for the classification process, and the Waikato Environment for Knowledge Analysis (Weka) 3.17 is used for the relevance analysis.
To evaluate the proposed framework, we used real-world NASA and Eclipse software datasets, which are freely available online. The software datasets have software metrics as features. The different software metrics are described below.

Fig. 1. Framework of approach

IV. DATASET AND TOOLS
A. Size Metrics
1. LOC (Lines of Code): Measures the productivity of the software related to the length of the program. LOC counts the total length of the program, including comments, statements and blank space; the higher the LOC, the higher the bug density.
2. Number of Statements: Measures the number of statements in the program. Statements include branching statements such as if and switch; looping statements such as while, for and do-while; break and continue statements; try-catch and finally statements; method calls; and return.
3. Number of Comments: Measures the number of comments, counting both single-line and multi-line comments. A program consisting of 30%-75% comments is called a good program; if comments are less than 30%, the program is poorly explained.

B. Complexity Metrics
1. Halstead Complexity: Halstead metrics measure the operators and operands of the source code. Halstead metrics depend upon the actual implementation of the program, and their measures are computed from the operators and operands of the source code. The base measures are:
N1 = total number of operators
N2 = total number of operands
n1 = total number of distinct operators
n2 = total number of distinct operands
From these measures, several others can be calculated: program length, volume, difficulty and effort.
2. McCabe Complexity: This is a quantitative measure of the linearly independent paths through the source code. A control flow graph is generated from the program code. For a program control graph G, the cyclomatic number (CC) is given as:

CC = E - N + P   (4)

where E = number of edges, N = number of nodes, and P = number of connected parts of the graph.
P = number of connected parts in graphs
or Sensitivity or true positive rate. Defined as the
C.Object oriented Metrics probability of correctly classified faulty module.

This metrics are used for object oriented Recall= (5)
+
language. 2. False Positive Ratio(FPR): It is also call
1. WMC(weighted method per class):Number fall-out. Defined as the ratio of false positive to
of method per class is count in this metrics. non- fault module.
2. DIT (Depth of Inheritance Tree): depth of FPR=

(6)
the class hierarchy is measure. Depth of the +

hierarchy is more than it is more complex to 3. Precision or positive predictive ratio (PPV):
predict class behavior. precision Measure the exactness of the
3. NOC (Number of children): This metrics classifier. It represent the percentage of the
measures number of direct subclass of the tuples that classifier labelled as faulty is actually
class. faulty.

4. CBO (Coupling between Object classes): PPV=
+
Measures the number of other classes that class
has coupled. Coupling between classes occur
via return types, method call and inheritance.
5. RFC (Response for a Class): This metrics (7)
count number of method executed by the class
object.
TABLE I. CONFUSION MATRIX
V. PERFORMANCE EVALUATION
Actual Class
The main aim of most classifiers is to perform
Fault Non fault
binary classification, i.e., Faulty or Non- Faulty.
True False
The perform measure use are accuracy, Predicted Fault
positive positive
confusion metrics, Area under the ROC curve.
Non False True
A. Accuracy fault negative negative
Accuracy of the classifier means to correctly
predict the class label of new or unseen data.
Accuracy is percentages of testing set example 4. True Negative Rate (TNR): True negative
correctly classified into class. rate measures the percentage of negatives that

are correctly identified as such; it is also called specificity.

TNR = TN / (TN + FP)   (8)
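A small sketch computing the measures (5)-(8) from confusion matrix counts; the counts are illustrative.

# Evaluation measures (5)-(8) from confusion matrix counts.
def metrics(tp, fp, fn, tn):
    return {
        "recall":    tp / (tp + fn),   # eq. (5), true positive rate
        "fpr":       fp / (fp + tn),   # eq. (6), fall-out
        "precision": tp / (tp + fp),   # eq. (7), PPV
        "tnr":       tn / (tn + fp),   # eq. (8), specificity
        "accuracy":  (tp + tn) / (tp + fp + fn + tn),
    }

print(metrics(tp=40, fp=10, fn=20, tn=130))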

VI. CONCLUSION

In this paper, a software defect detection system is implemented using Fuzzy C-Means clustering and a Support Vector Machine. Pre-processing of the data improves the efficiency of the algorithm, so feature ranking is used as the data pre-processing step. Public NASA and Eclipse datasets are used for the system.

REFERENCES

[1] K. Gao, T. M. Khoshgoftaar, H. Wang, and N. Seliya, "Choosing software metrics for defect prediction: An investigation on feature selection techniques," Softw. Practice Exper., vol. 41, no. 5, pp. 579-606, 2011.
[2] S. Shivaji, E. J. Whitehead Jr., R. Akella, and S. Kim, "Reducing features to improve code change-based bug prediction," IEEE Trans. Softw. Eng., vol. 39, no. 4, pp. 552-569, 2013.
[3] J. Chen, S. Liu, W. Liu, X. Chen, Q. Gu, and D. Chen, "Empirical studies of a two-stage data preprocessing approach for software fault prediction," IEEE Trans. Reliability, vol. 65, pp. 38-53, 2016.
[4] Rohit Mahajan, Sunil Kumar Gupta, and Rajeev Kumar Bedi, "Design of software fault prediction model using BR technique," in Proc. Int. Conf. Information and Communication Technologies, Kochi, pp. 849-858, 2014.
[5] Deepika Gupta, Vivek K. Goyal, and Harish Mittal, "Estimating of software quality with clustering techniques," in Advanced Computing and Communication Technologies (ACCT), 2013 Third International Conference on, IEEE, 2013.
[6] Arashdeep Kaur, Amanpreet Singh Brar, and Parvinder S. Sandhu, "An empirical approach for software fault prediction," in Industrial and Information Systems (ICIIS), International Conference on, IEEE, pp. 261-265, 2010.
[7] W. Liu, X. Chen, S. Liu, D. Chen, Q. Gu, and J. Chen, "FECAR: A feature selection framework for software defect prediction," in Proc. Int. Computers, Software & Applications Conference, pp. 426-435, 2014.
[8] Issam H. Laradji, Mohammad Alshayeb, and Lahouari Ghouti, "Software defect prediction using ensemble learning on selected features," Information and Software Technology, vol. 58, pp. 388-402, 2015.
[9] Deepika Gupta, Vivek K. Goyal, and Harish Mittal, "Estimating of software quality with clustering techniques," in Advanced Computing and Communication Technologies (ACCT), Third International Conference on, IEEE, pp. 20-27, 2013.
[10] Ritika Sharma, Neha Budhija, and Bhupinder Singh, "Study of predicting fault prone software modules," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 2, no. 2, pp. 1-3, Feb. 2012.

Eco-friendly Polyester Dyeing with Croton Oblongifolius

[1] Trupti Sutar, [2] Ashwini Patil, [3] Prof. (Dr.) R. V. Adivarekar*
*Department of Fibres and Textile Processing Technology, Institute of Chemical Technology, Matunga (E), Mumbai-400019, India

Abstract- The natural plant source Croton Oblongifolius is mainly used by pharmacologists for its medicinal properties, such as treating allergic dermatitis, use as a decoction, relieving abdominal pains, anti-diarrheal action and use as a blood tonic. The present research work deals with the extract of the stem bark of this plant, which has a particularly reddish-brown colour, used to dye hydrophobic polyester, which is difficult to dye using natural dyes. Optimization of the dyeing parameters, viz. pH conditions, dye concentration, dyeing time and dyeing temperature, was studied. The dyed polyester fabric showed satisfactory results, including excellent sublimation fastness. This dyed polyester, together with some natural fibres, can be used in medical textiles. The findings show that the natural dye extracted from the stem bark of Croton oblongifolius has good potential in polyester dyeing and can be exploited further.

Keywords- Croton Oblongifolius, Polyester dyeing, Fastness properties.

I. INTRODUCTION

With growing awareness of environment-related issues, most industries prefer natural dyes over synthetic dyes for textile dyeing. In a comparison between natural and synthetic dyes, the synthesis of synthetic dyes produces undesirable, hazardous and toxic chemicals, whereas natural dyes are obtained from renewable sources, are biodegradable and are less toxic. The demand for polyester fibre in various industries is due to its durability and strength; it also resists wrinkles, shrinking, abrasion and mildew. Synthetic fibres suffer from disadvantages such as reduced wearing comfort, build-up of electrostatic charge, a tendency to pill, and difficulties in finishing; these disadvantages are largely associated with their hydrophobic nature [1]. Also, polyester can only be dyed with disperse dyes, which limits the variety of dyes for polyester dyeing. Dyeing of textiles with natural dyes has long been studied, especially for natural fibres like cotton and silk, but rarely for polyester fibres. The majority of natural dyes are well soluble in water, so they can readily be applied to hydrophilic fibres; thus there is little possibility of producing natural-dyed synthetic textiles to fulfil the demand for more eco-friendly textile products [2]. There are very few natural sources available which can be used as dyes for polyester fibre. In this research work, polyester dyeing is done with the plant source Croton oblongifolius (Euphorbiaceae). As there is very little scientific information on its application for dyeing, a conscious attempt has been made to use it in the colouration of synthetic fibres, which can then be used in medical textiles, as the dye is from a medicinal plant [3]. C. oblongifolius is a weed available all over the agricultural fields of India. The pharmaceutical approach with this plant is widespread due to its medicinal properties; traditionally, this plant is also used as a wound-healing drug.

II. EXPERIMENTAL

A. Material
Polyester (100%) was procured from a local manufacturer in Mumbai, India. The chemical used for extraction and dyeing, i.e. ethanol, was purchased from SDF, Mumbai, India.

B. Methods
i. Extraction of dye from C. oblongifolius
The stem bark sample was collected from the college campus and washed thoroughly with water to remove impurities. It was then dried at 40°C in an oven for 24 hrs. The sample was ground into powder with the help of a grinder. The extraction of the dye was carried out in a Soxhlet apparatus with ethanol for 3-4 hrs at 78°C. The mixture was evaporated to dryness in a rotary evaporator flask, and the final powder product was kept in a desiccator.

C. Optimization of Dyeing Conditions
Dyeing was carried out using the conventional polyester/disperse dyeing method. Optimization of the concentration of dye (1% to 17%), temperature (100°C to 130°C) and time (30 min to 60 min) was done using a material-to-liquor ratio of 1:50. Finally, the fabric samples were washed with cold water and hot water, squeezed and dried.

¹The authors would like to acknowledge the World Bank-sponsored TEQIP-II for providing funds for this study.

Proceedings of IIRAJ International Conference (ICCI-SEM-2K17), GIFT, Bhubaneswar, India, 18th - 19th February 2017, ISBN: 978-93-86352-38-5
44
D. Determination of colour strength value of dyed fabric-
The dyed samples were evaluated for depth of colour by the reflectance method using a 10° observer. A Rayscan Spectrascan 5100+ was used to measure the absorbance of the dyed samples in terms of the CIELAB colour space (L*, a* and b*). The K/S values were determined as follows:

    K/S = (1 - R)^2 / 2R    (1)

where R is the reflectance at complete opacity, K is the absorption coefficient and S is the scattering coefficient. In general, the higher the K/S value, the greater the depth of colour on the fabric. L* corresponds to brightness (100 = white, 0 = black), a* to the red-green coordinate (+ve = red, -ve = green) and b* to the yellow-blue coordinate (+ve = yellow, -ve = blue) [4], [5]. The reproducibility of the results was also checked and found to be satisfactory in all cases, showing a standard deviation of the order of 0.02.
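To make Eq. (1) concrete, consider a hypothetical reflectance reading (not a measurement from this study): a dyed sample with R = 0.20 gives

    K/S = (1 - 0.20)^2 / (2 x 0.20) = 0.64 / 0.40 = 1.6,

while a paler sample with R = 0.50 gives K/S = 0.25, consistent with a higher K/S indicating a deeper shade.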
E. Fastness Testing-
Colourfastness to washing was assessed as per ISO 105-C02:1989; colourfastness to light was assessed on a Q-Sun Xenon Arc Light Fastness Tester as per AATCC 117-2004; and colourfastness to heat was assessed as per ISO 105-P01:1993 [6].

2.6 Evaluation of UV Protection Factor-
The UPF (Ultraviolet Protection Factor) of the standard polyester fabric and of the C. oblongifolius-dyed polyester fabric was measured using the AS/NZS 4399:1996 method on a SHIMADZU UV-2600 instrument [7].

III. RESULTS & DISCUSSION
A. Optimization of dyeing parameters-
i. Effect of dye concentration-
The effect of dye concentration on the fabric was studied. As seen from Fig. 1, as the dye concentration increases from 1% to 17%, the K/S value also increases; 15% was taken as the optimal dye concentration for polyester dyeing. With increasing concentration of dye in the dyebath, solubility increases up to a certain concentration and then decreases; adsorption of the dyestuff on the fibre surface depends on the solubility of the dye in the dye bath and in the fibre [8].

Fig. 1. K/S values of the fabric dyed with different concentrations of dye (y-axis: K/S, 0 to 1.5; x-axis: dye concentration, 1% to 17%)

ii. Effect of pH-
The extracted dye is acidic in nature, and polyester dyeing requires an acidic pH (between 4.5 and 5.5), so no additional chemicals are required to maintain the pH. At this pH, dye exhaustion is satisfactory [8].

iii. Effect of temperature-
The dyeing was conducted at different temperatures, i.e. 100°C, 110°C and 130°C. As shown in Fig. 3, the K/S value increases with increasing dyeing temperature and reaches a maximum at 130°C. Heating increases the energy of the dye molecules in the dye liquor and accelerates the dyeing of textile fibres. At 130°C the molecular chains of the polyester vibrate vigorously and the polyester passes from the glassy to the rubbery state, making it accessible to the dye and allowing the dye to penetrate, so that the dye molecules occupy places in the amorphous regions of the fibre, where they are held by hydrogen bonds and Van der Waals forces [8].

Fig. 3. K/S values of the fabric dyed at different temperatures (y-axis: K/S, 0 to 1.5; x-axis: 100°C, 110°C, 130°C)

iv. Effect of dyeing time-
As shown in Fig. 4, the colour strength increases as the time increases from 30 min to 60 min. Time is a key factor in textile dyeing; from the graph it is clear that 45 min is the optimum dyeing time for polyester, as at that duration the colour strength is highest and remains constant thereafter.

Fig. 4. K/S values of the fabric dyed for different times (y-axis: K/S, 0 to 2.5; x-axis: time, 30 to 60 min)

F. Fastness properties-
The colourfastness values of the fabrics dyed with C. oblongifolius dye are given in Tables I, II and III. Sublimation and washing fastness results were assessed with respect to the grey scale, and light fastness results were assessed with respect to the blue wool scale. The sublimation fastness results are excellent at 120°C, 180°C and 210°C. Similarly, the wash fastness and light fastness of the dyed polyester fabrics give acceptable results.

Table I Sublimation fastness properties of dyed fabric

Sample                          120°C   180°C   210°C
At acidic pH                      5       5       5
Temperature (°C)    100°C         4       4       4
                    110°C         4       3       4
                    130°C         5       5       5
Time (minutes)      30 min        4       4       4
                    45 min        5       5       5
                    60 min        5       5       5
Dye concentration   15%           5       5       5
(%)                 17%           5       5       5

Table II Light fastness properties of dyed fabric

Sample                          Light fastness
At acidic pH                          6
Temperature (°C)    100°C             5
                    110°C             6
                    130°C             6
Time (minutes)      30 min            5
                    45 min            6
                    60 min            6
Dye concentration   15%               6
(%)                 17%               6

Table III Washing fastness properties of dyed fabric

                                Colour           Colour staining
Sample                          change     AC    C     N     P     A     W
At acidic pH                      5        4     4     4     4     4     4
Temperature (°C)    100°C         4       3-4   3-4   3-4   3-4   3-4   3-4
                    110°C        4-5      3-4    4    3-4   3-4    4    3-4
                    130°C        4-5       4    4-5    4    4-5   4-5   4-5
Time (minutes)      30 min        4       3-4    4    3-4   3-4   3-4   3-4
                    45 min       4-5      4-5   4-5   4-5   4-5   4-5   4-5
                    60 min        4       4-5    4    4-5   4-5   4-5    4
Dye concentration   15%           5       4-5   4-5   4-5   4-5   4-5   4-5
(%)                 17%          4-5      4-5   4-5   4-5   4-5   4-5   4-5
*AC = acetate, C = cotton, N = nylon, P = polyester, A = acrylic, W = wool
G. UPF (Ultraviolet Protection Factor) testing-
As shown in Table IV, in comparison with the control polyester sample the
dyed polyester fabric gives a very good UPF rating. Various parameters affect the UV protection factor, e.g. the nature of the fibre, dyeing, finishing, moisture, etc. The polyester fabric is thin and transparent, so UVR (ultraviolet radiation) can easily be transmitted to the skin. The dyed polyester shows a higher UPF than the undyed polyester fabric due to the presence of the natural colour pigment. As the time, concentration and temperature of polyester dyeing increase, the UPF of the dyed fabric also increases, due to the higher amount of dye in the fibre [9].
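The paper reports instrument-computed UPF values only; for the reader's reference (this standard definition is not reproduced in the paper), the in-vitro UPF underlying the AS/NZS 4399 approach is the erythemally weighted ratio

    UPF = sum[ E(λ) S(λ) Δλ ] / sum[ E(λ) S(λ) T(λ) Δλ ],

where E(λ) is the CIE erythemal action spectrum, S(λ) the solar spectral irradiance, T(λ) the measured spectral transmittance of the fabric, and the sums run over 290-400 nm. A lower transmittance, e.g. from a higher amount of dye in the fibre, therefore raises the UPF directly.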
Table IV UV Protection Factor testing of dyed fabric

Sample                          UPF, undyed fabric   UPF, dyed fabric
At acidic pH                           -                  33.59
Temperature (°C)    100°C              -                  12.71
                    110°C              -                  19.59
                    130°C              -                  33.92
Time (minutes)      30 min           12.65                21.91
                    45 min             -                  33.90
                    60 min             -                  32.23
Dye concentration   15%                -                  33.86
(%)                 17%                -                  32.78

IV. CONCLUSION
The main focus of the study is the extraction of the dye and the dyeing of polyester. Natural dyeing of polyester fabric is rare, but C. oblongifolius showed good dyeability on polyester. Dyeing can be carried out at acidic pH and without a mordant, so no additional chemicals were required. The dyed fabric gave better sublimation fastness. This natural dye appears to have great potential for the dyeing of polyester, being environment-friendly and possessing medicinal properties.
REFERENCES
[1] M. P. Gashti, J. Willoughby, and P. Agrawal, "Surface and bulk modification of synthetic textiles to improve dyeability," in Textile Dyeing, P. J. Hauser, Ed., chapter 13, InTech, Rijeka, Croatia, 2011.
[2] Vorabodee Sriumaoum, Jantip Suesat, and Potjanart Suwanruji, "Dyeing and Spectroscopic Properties of Natural Dyes on Poly(Lactic Acid) and Poly(Ethylene Terephthalate) Fabrics," International Journal of Bioscience, Biochemistry and Bioinformatics, Vol. 2, No. 3, May 2012.
[3] Mandal L. and Bose S., "Pharmacognostic Standardization and Quantitative Estimation of some Isolated Phytoconstituents from Roxb.," Journal of PharmaSciTech, 1(1):10-15, 2011.
[4] http://textilelearner.blogspot.in/2011/07/color-fastness-test-washing-fastness_1059.html
[5] Priti B. Tayade and Ravindra V. Adivarekar, "Dyeing of Silk Fabric with Cuminum Cyminum L as a Source of Natural Dye," International Journal of ChemTech Research, Vol. 5, No. 2, pp. 699-706, April-June 2013.
[6] H. P. Gies, C. R. Roy, and G. Holmes, Radiat. Prot. Dosim., 91, 247 (2000).
[7] IS 975-1988, Method for determination of colour fastness of textile material to sublimation.
[8] http://textilelearner.blogspot.in/2012/01/dyeing-mechanism-of-disperse-dye-dyeing.html
[9] D. Saravanan, "UV Protection Textile Materials," AUTEX Research Journal, Vol. 7, No. 1, March 2007.
Dengue disease prediction using Weka data mining tool
Kashish Ara Shakil, Samiya Khan, Shadma Anis, and Mansaf Alam
Abstract: Dengue is a life-threatening disease prevalent in several developed as well as developing countries like India. In this paper we discuss various data mining algorithms that have been utilized for dengue disease prediction. Data mining is a well-known technique used by health organizations for classification of diseases such as dengue, diabetes and cancer in bioinformatics research. In the proposed approach we have used WEKA with 10-fold cross validation to evaluate the data and compare results. We first classify the dengue data set and then compare the different data mining techniques in WEKA through the Explorer, Knowledge Flow and Experimenter interfaces. Furthermore, in order to validate our approach we have used a dengue dataset with 108 instances (of which WEKA used 99 rows) and 18 attributes to determine the prediction of the disease and its accuracy, using different classification algorithms to find out which performs best. The main objective of this paper is to classify the data, assist users in extracting useful information from it, and easily identify a suitable algorithm for an accurate predictive model. From the findings of this paper it can be concluded that Naïve Bayes and J48 are the best-performing algorithms for classification accuracy, because they achieved maximum accuracy (100%, with 99 correctly classified instances) and maximum ROC area (1), had the least mean absolute error, and took the minimum time to build the model, as shown by the Explorer and Knowledge Flow results.

Index Terms: Classification, Dengue disease prediction, Data mining, Weka
I. INTRODUCTION
DENGUE fever is a disease caused by the dengue virus; also known as break-bone fever, it is transmitted by the Aedes mosquito. Dengue infection is divided into four grades: DHF I, DHF II, DHF III and DHF IV. It causes life-threatening dengue hemorrhagic fever, whose symptoms include bleeding, low levels of blood platelets, low blood pressure, a metallic taste in the mouth, headache, muscle and joint pain, and rashes [11], [22].

There is no specific medicine or antibiotic available to treat it. Dengue fever occurs in the form of cycles, and this cycle is present inside the body of an infected person for two weeks or less. It causes abdominal pain, hemorrhage (bleeding), circulatory collapse and dengue hemorrhagic fever.

The following is the cycle by which dengue is transmitted: the Aedes mosquito carries the virus in its saliva; when it bites a healthy person, the virus enters the person's body and mixes with the body fluids. The moment the white blood cells encounter the single-stranded RNA dengue virus, it starts reproducing inside the white blood cells and thus initiates the dengue virus cycle. In case of severe infection, the duration of the virus cycle is prolonged, affecting the liver and bone marrow and leading to less blood circulation in the blood vessels; the blood pressure becomes so low that it cannot supply sufficient blood to all the organs of the body. The bone marrow also does not function properly due to this infection, leading to a reduced number of platelets, which are necessary for effective blood clotting, and an increased risk of bleeding [11].

Bioinformatics research uses the WEKA tool for solving many data mining problems. WEKA stands for Waikato Environment for Knowledge Analysis; it was developed at the University of Waikato in New Zealand, implemented in 1997, written in the Java language, and made freely available at http://www.waikato.ac.nz/ml/weka. There are several different levels at which WEKA can be used. WEKA contains modules for data classification and accuracy estimation to predict diseases [13], and it has been used in bioinformatics for the diagnosis and analysis of dengue disease datasets. WEKA has 49 tools for preprocessing, 76 algorithms for classification and regression, 8 algorithms for clustering, and 3 algorithms for finding association rules. WEKA's algorithms are suitable for generating accurate predictive models by extracting useful information from a dengue data set [4].

Apart from WEKA, researchers are now moving towards cloud computing for disease predictions [23], [24], [25], [28]. It also offers
facilities such as clustering and analysis of huge datasets [26], [27], [29], [30]. The main focus of this paper is dengue disease prediction using the WEKA data mining tool and its usage for classification in the field of medical bioinformatics. We first classify the dataset and then determine which algorithm performs best for diagnosis and prediction of dengue disease. From the findings of the experiments conducted, it was revealed that Naïve Bayes and J48 are the best algorithms. The posterior probability of a hypothesis can be estimated using Bayesian reasoning over some given knowledge or data [31].
symptoms in patients and then identifying sick weka tool. They have also compared the
patients from a lot of sick and healthy ones. outputs obtained from Nave Bayes and FT tree
Thus, the prime objective of this paper is algorithms and concluded that Naive Bayes
analysis of data from a dengue dataset algorithm plays a key role in predicting liver
classification technique to predict class diseases[1].
accurately in each case in data. The major Solanki A.V. has used weka as a data mining
contributions of this paper are: technique for classification of sickle cell disease
(1) To extract useful classified accuracy for prevalent in Gujarat. They have compared J48
prediction of dengue diseases. and Random tree algorithms and have given a
(2) Comparison of different data mining predictive model for classification with respect
algorithms on dengue dataset. to a persons age of different blood group
(3) Identify the best performance algorithm types. From there experimentation it can be
for prediction of diseases. inferred that Random tree is better algorithm as
In this paper we have used dengue dataset it produces more depth decisions respect to
for classification method. The steps followed J48 for sickle cell diseases[2].
include collection of the dataset for determining Joshi et al. has done diagnosis and
the accuracy, classification and then prognosis of breast cancer using classification
comparison of results. The dataset has been rules. By comparing classification rules such
used to classify the following dengue attributes as Bayes Net, Logistic, Multilayer Perceptron,
based on P.I.D, date of dengue fever, days, SGD,Simple Logistic, SMO, AdaBoostM1,
current temperature, WBC, joint muscles, Attribute Selected, Classification via
metallic taste in mouth, appetite, abdomen Regression,
pain, Nausea, diarrhea and hemoglobin. FilteredClassifier,MulticlassClassifier and
Several classification algorithms have been J48,They have inferred that LMT Classifier
used in this paper in order to analyze the gives more accurate diagnosis i.e. 76 %
performance of applied algorithm on the given healthy and 24 % sick patients[3].
dataset but the thrust in this paper is on David S.K. et al. have used classification
accuracy measure. Accuracy measures techniques for leukemia disease prediction. K-
analyze the errors through measures like root Nearest Neighbor, Bayesian Network, Random
mean square error, relative absolute error and tree, J48 tree compared on the basis of
correctly classified instances. accuracy, learning time and error rate.
Though data mining has several different According to them Bayesian algorithm has
algorithms to analyze data but analysis using better classification accuracy amongst
all the methods is not feasible therefore in this others[4].
paper we have performed the analysis using Vijayarani S. and Sudha S. have compared
Nave Bays, J48 tree, SMO function, REP Tree the analysis of classification function
and Random Tree algorithms by using techniques for heart disease prediction.
Explorer, Experimenter and knowledge flow Classification was done using algorithms such
interface of weka tool. The remainder of this as Logistic, Multilayer Perception and
paper is organized as follows. Section 2 Sequential Minimal Optimization algorithms for
presents related work done using data mining predicting heart disease. In this classification
tools such as weka and CRFSuiteto predict comparison logistic algorithm trained out to be
diseases; Section 3 describes the methodology best classifier for heart disease having more

Proceedings of IIRAJ International Conference (ICCI-SEM-2K17), GIFT, Bhubaneswar, India, 18th - 19th February 2017, ISBN: 978-93-86352-38-5
49
accuracy and the least error rate [5].

Kumar M.N. used alternating decision trees for early diagnosis of dengue fever. The ADTree correctly classifies 84% of cases, as compared to J48, which can classify only 78% of cases correctly [6].

Durairaj M. and Ranjani V. have compared different data mining applications in the healthcare sector. Algorithms such as Naïve Bayes, J48, KNN and C4.5 were used for classification in order to diagnose diseases like heart disease, cancer, AIDS, brain cancer, diabetes, kidney dialysis, dengue, IVF and hepatitis C. Their comparative study revealed high accuracy through data mining techniques, i.e. 97.77% for cancer prediction and around 70% for IVF treatment [7].

Sugandhi C. et al. analyzed a population database of cataract patients with the WEKA tool. In this study, WEKA has been used to classify the results and for comparison purposes. They concluded that Random Tree gives 84% classification accuracy, which means better performance compared to the other algorithms evaluated for classification accuracy: Naïve Bayes, SMO, J48 and REP Tree. Thus, according to their study, Random Tree is the best-performing classification algorithm for cataract patient disease [8].

Yasodha P. and Kannan M. performed an analysis of a population database of diabetic patients using the WEKA tool. They classified the data and then compared the outputs using the Bayes Network, REP Tree, J48 and Random Tree algorithms. The results conclude that these algorithms help to determine and identify the stage of a disease like diabetes from the patient's daily glucose rate and insulin dosages, thereby predicting and advising patients on their next insulin dosage [9].

Bin Othman M.F. and Yau T.M.S. have compared different classification techniques using WEKA for breast cancer. In this study they used different algorithms, simulating the results and training of each. They simulated the errors using the Bayes Network, Radial Basis Function, Decision Tree with pruning, and Single Conjunctive Rule Learner algorithms. From their work it can be concluded that Bayes Network performs best for breast cancer data: its time taken to build the model is 0.19 seconds, its accuracy is 89.7%, and its error is the least at 0.2140, compared to the other algorithms used [10].

Mihaila C. and Ananiadou S. have compared two data mining tools, i.e. WEKA and CRFSuite, on the basis of lexical, syntactic and semantic features with various parameters, to compare the impact of the features on each algorithm. The experiments were carried out in CRFSuite using the Conditional Random Field algorithm, and in WEKA using algorithms like Support Vector Machines and Random Forests, to identify discourse causality triggers in the biomedical domain. The classification tasks were evaluated on the basis of statistics such as F-score, precision and recall. As per their results, CRF is the best-performing classifier, achieving an F-score of 79.35% by combining three features, as compared to the other classifiers [21].

Thitiprayoonwongse D. et al. have analyzed dengue infection using data mining decision trees. In their paper, two datasets were used from two different hospitals, Srinagarindra Hospital and Songklanagarind Hospital, each having more than 400 attributes. Four classification experiments were conducted. The first and second experiments obtained accuracies of 97.6% and 96.6%; the third experiment extracted useful knowledge. Another objective of that paper was to detect the day of abatement of fever, also referred to as day0. In the fourth experiment the day0 accuracy was very low compared to the other three experiments; physicians nevertheless need day0 for each patient in order to treat them [22].

III. METHODOLOGY
In order to carry out the experimentation and implementation, WEKA was used as the data mining tool. WEKA (Waikato Environment for Knowledge Analysis) is a data mining tool written in Java, developed at Waikato. WEKA is a very good data mining tool for users to classify accuracy on the basis of datasets by applying different algorithmic approaches and comparing them in the field of bioinformatics. Explorer, Experimenter and Knowledge Flow are the WEKA interfaces that have been used by us. In this paper we have used these data mining techniques to predict the survivability of dengue disease through classification with different algorithms and comparison of their accuracy [12], [13].

The interface of the WEKA data mining tool has four applications:
(1) Explorer: The Explorer interface has several panels (Preprocess, Classify, Cluster, Associate, Select attributes and Visualize), but in this paper our main focus is on the
Classification panel [12].
(2) Experimenter: This interface provides the facility for systematic comparison of different algorithms on the basis of given datasets. Each algorithm runs 10 times and then the accuracy is reported [12].
(3) Knowledge Flow: It is an alternative to the Explorer interface. The only difference is that here the user selects WEKA components from a toolbar and connects them to make a layout for running the algorithms [12].
(4) Simple CLI: Simple CLI means command line interface. The user performs operations through a command line by giving instructions to the operating system. This interface is less popular compared to the other three.

A. Classification
In data mining tools, classification deals with identifying the problem by observing the characteristics of diseases amongst patients, and diagnosing or predicting which algorithm shows the best performance on the basis of WEKA's statistical output [20].

Table 1 shows the WEKA data mining techniques that have been used in this paper, along with other prerequisites like the data set format, for the different algorithms.

TABLE 1
WEKA DATA MINING TECHNIQUE BY USING DIFFERENT ALGORITHMS

Software   Dataset   Data Mining Technique    Algorithms                          Operating System   Dataset File Format   Purpose
WEKA       Dengue    Explorer, Experimenter   Naïve Bayes, J48, SMO, REP, Random  Windows 7          CSV                   Classification

Three techniques have been adopted in this paper. The first technique uses the Explorer interface and depends on algorithms like Naïve Bayes, SMO, J48, REP Tree and Random Tree, used to represent, utilize and learn the statistical knowledge; significant results have been achieved.

The second technique uses the Experimenter interface. This study allows one to design experiments for running algorithms such as Naïve Bayes, J48, REP Tree and Random Tree on datasets, run them in the Experimenter, and analyze the results. The test option is configured to use 10-fold cross validation. This interface provides provision for running all the algorithms together, and thus a comparative result was obtained.

The third technique uses Knowledge Flow. In this study we classified the accuracy of the different algorithms Naïve Bayes, SMO, J48, REP Tree and Random Tree on the data sets and compared the results to know which algorithm shows the best performance. In order to predict dengue disease survivability, the user can select WEKA components from the toolbar, place them in a layout-like manner and connect the different components together, forming a knowledge-flow web for preprocessing and analyzing data.

All the algorithms used by us were applied to a dengue data set, explained in detail in Section 4. In order to obtain better accuracy, 10-fold cross validation was performed. For each classification we selected training and testing samples randomly from the base set to train the model and then test it, in order to estimate the classification and accuracy measures for each classifier. The key measures used by us are:
1) Correctly Classified Accuracy: the percentage of test instances that are correctly classified.
2) Incorrectly Classified Accuracy: the percentage of test instances that are incorrectly classified.
3) Mean Absolute Error: the error measure used to analyze an algorithm's classification accuracy.
4) Time: how much time is required to build the model used to predict the disease.
5) ROC Area: the Receiver Operating Characteristic [19] provides a performance guide for the accuracy of a diagnostic test: excellent (0.90-1), good (0.80-0.90), fair (0.70-0.80), poor (0.60-0.70), fail (0.50-0.60).
IV. DATASETS USED
A dataset is a collection of data, or a single statistical data table, in which every attribute represents a variable and each instance has its own description. For prediction of dengue disease we used a dengue data set for prediction and classification with the algorithms, in order to compare their accuracy using WEKA's three interfaces: Explorer, Experimenter and Knowledge Flow.

Figure 1 shows a description of the dengue dataset. The dataset used by us contains 18 attributes and 108 instances for dengue disease classification and accuracy. We have applied different algorithms using the WEKA data mining tool for our analysis purposes [14]. Table 2
describes the attributes of the data set, which are also presented in Figure 1. The file format of the dataset used is Comma Separated Values (CSV). Each attribute records the presence or absence of dengue symptoms, the number of days, the date, the WBC count, the platelet count, and pain and taste information for patients in different cities, together with how many days they suffered.

Fig. 1. Screenshot view of the dengue dataset

Table 2
DESCRIPTION OF DATASET ATTRIBUTES

Attributes            Description
P.I.D                 Patient ID
Date of fever         Month
Residence             City
Days                  No. of days
Current Temperature   Fever
WBC                   No. of WBC
Severe Headache       Yes or No
Pain Behind Eyes      Yes or No
Joint / Muscle pain   Yes or No
Metallic Taste        Yes or No
Appetite              Yes or No
Abdominal pain        Yes or No
Nausea/Vomiting       Yes or No
Diarrhea              Yes or No
Hemoglobin            Hemoglobin range
Hematocrit            Hematocrit range
Platelets             No. of platelets
Dengue                Yes or No
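For illustration, the first line of such a CSV file would carry the attribute names, and each following line one patient record; the values below are invented placeholders, not rows from the authors' dataset:

    PID,DateOfFever,Residence,Days,CurrentTemperature,WBC,SevereHeadache,PainBehindEyes,JointMusclePain,MetallicTaste,Appetite,AbdominalPain,NauseaVomiting,Diarrhea,Hemoglobin,Hematocrit,Platelets,Dengue
    1,July,Delhi,4,102,3.9,Yes,Yes,Yes,No,No,Yes,No,No,13.1,41,95000,Yes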
A. Explorer Interface
The Explorer first preprocesses the data and then filters it. Users can then load the data file in CSV (Comma Separated Values) format and analyze the classification accuracy results by selecting the following algorithms, using 10-fold cross validation: Naïve Bayes, J48, SMO, REP Tree and Random Tree. The Explorer interface is opened with the dengue dataset loaded from the CSV file, along with its graphical view [12], [15].

1) Naïve Bayes
Naïve Bayes is an algorithm that works as a probabilistic classifier, treating all the attributes contained in the data sample individually and then classifying the data. Running Naïve Bayes, we analyze the classifier output with its many statistics, using 10-fold cross validation to make a prediction for each instance of the dataset [16]. After running the algorithm we achieved a classification accuracy of 100% with 99 correctly classified instances; the error rate, i.e. the mean absolute error, is 0.0011; the time taken to build the model is 0 seconds; and the ROC area is 1. These outputs are obtained after the algorithm is run.

Fig. 2. Screenshot view for the Naïve Bayes algorithm

TABLE 3
NAÏVE BAYES ALGORITHM ACCURACY

Algorithm     Time to Build Model (s)   Correctly Classified (%)   Incorrectly Classified (%)   Mean Absolute Error   ROC Area
Naïve Bayes   0                         100% (99)                  0% (0)                       0.0011                1

The output obtained by scoring the Naïve Bayes algorithm's accuracy is given in Table 3 on the basis of time, accuracy, error and ROC.

2) J48 Tree
The J48 tree has been used in this paper to decide the target value based on the various attributes of the dataset, to build a machine learning model and classify its accuracy. We have used J48 on our dengue disease dataset. After running this algorithm we analyzed the outputs obtained from the classifier; the output gave several statistics based on 10-fold cross validation to make a prediction for each instance of the dataset.
Figure 3 shows the classification accuracy achieved from this algorithm: 100% correctly classified accuracy for a batch of 99 instances; the mean absolute error obtained is 0; the time taken to build the model is 0 seconds; and the ROC area is 0.958.

Fig. 3. Screenshot view of the J48 algorithm

The scoring of the J48 algorithm's accuracy is given in Table 4 on the basis of time, accuracy, error and ROC.

TABLE 4
J48 ALGORITHM ACCURACY

Algorithm   Time to Build Model (s)   Correctly Classified (%)   Incorrectly Classified (%)   Mean Absolute Error   ROC Area
J48         0                         100% (99)                  0% (0)                       0                     0.958

3) SMO
SMO is one of the methods used for classification. In this paper we have used this algorithm to split the data on the basis of the dataset. Running this algorithm, we analyzed the classifier output with different statistics, using 10-fold cross validation to make a prediction for each instance of the dataset. Figure 4 shows the classification accuracy of 100%; the error rate, i.e. the mean absolute error, is 0; the time taken to build the model is 0 seconds; and the ROC area is 0.875. These values are obtained after running the algorithm.

Fig. 4. Screenshot view of the SMO algorithm

The scoring of the SMO algorithm's accuracy is given in Table 5 on the basis of time, accuracy, error and ROC.

TABLE 5
SMO ALGORITHM ACCURACY

Algorithm   Time to Build Model (s)   Correctly Classified (%)   Incorrectly Classified (%)   Mean Absolute Error   ROC Area
SMO         0                         100% (99)                  0% (0)                       0                     0.875

4) REP Tree
The REP Tree has been used in this paper to build a decision tree that reduces errors by sorting the values of numeric attributes and splitting the instances into pieces, in order to classify the accuracy. Running the algorithm, we analyze the classifier output with its statistics, using 10-fold cross validation to make a prediction for each instance of the dataset.

Figure 5 shows the classification accuracy achieved: 74.7475% correctly classified accuracy for 74 instances and 25.2525% incorrectly classified accuracy for 25 instances; the error rate, i.e. the mean absolute error, is 0.3655; the time taken to build the model is 0.02 seconds; and the ROC area is 0.544. These values are reported in the output.
Fig. 5. Screenshot view of the REP Tree algorithm

The scoring of the REP Tree algorithm's accuracy from Figure 5 is elaborated further in Table 6 on the basis of time, accuracy, error and ROC.

TABLE 6
REP TREE ALGORITHM ACCURACY

Algorithm   Time to Build Model (s)   Correctly Classified (%)   Incorrectly Classified (%)   Mean Absolute Error   ROC Area
REP Tree    0.02                      74.7475% (74)              25.2525% (25)                0.3655                0.544

5) Random Tree
The Random Tree has been used in this paper to randomly choose k attributes at each node, allowing the estimation of class probabilities. Running the algorithm, we analyze the classifier output with its statistics, using 10-fold cross validation to make a prediction for each instance of the dataset.

From Figure 6, a classification accuracy of 87.8788% is obtained, i.e. correctly classified accuracy for 87 instances, with 12.1212% incorrectly classified accuracy for 12 instances; the error rate, i.e. the mean absolute error, is 0.1853; the time taken to build the model is 0 seconds; and the ROC area is 0.876. These values are reported in the output.

Fig. 6. Screenshot view of the Random Tree algorithm

The scoring output obtained with the Random Tree algorithm is given in Table 7 on the basis of time, accuracy, error and ROC.

TABLE 7
RANDOM TREE ALGORITHM ACCURACY

Algorithm     Time to Build Model (s)   Correctly Classified (%)   Incorrectly Classified (%)   Mean Absolute Error   ROC Area
Random Tree   0                         87.8788% (87)              12.1212% (12)                0.1853                0.876

B. Experimenter Interface
The Experimenter interface has been used in this paper to analyze data by experimenting with algorithms such as Naïve Bayes, J48, REP Tree and Random Tree, classifying the data using train and test sets [17]. We have run four different algorithms on the dengue dataset and analyzed their accuracy.
1) Naïve Bayes: one of the fastest algorithms; it works on the probability of every attribute contained in the data sample individually and then classifies them accurately.
2) J48 Tree: we used the J48 tree to decide the target value based on the various attributes of the dataset, to predict the algorithm's accuracy.
3) REP Tree: we used this WEKA classifier tree algorithm to analyze accuracy on the dengue dataset.
4) Random Tree: we used the Random Tree classifier algorithm to analyze classification on our dataset.

Figure 7 presents the experiment test of all four algorithms; each algorithm is run 10 times
and the accuracy is reported; "v" stands for the best accuracy prediction and "*" for the worst. That is, it marks the best and worst scoring accuracy amongst the four different algorithms listed below:
Naïve Bayes
J48 Tree
REP Tree
Random Tree

Fig. 7. Screenshot view of the Experimenter algorithm accuracy

The scoring accuracy of Naïve Bayes, J48, REP Tree and Random Tree is given in Table 8.

TABLE 8
EXPERIMENTER ALGORITHMS ACCURACY

Algorithm     Best Accuracy Prediction (v)   Worst Accuracy Prediction (*)
Naïve Bayes   100%                           -
J48 Tree      99.70%                         -
REP Tree      -                              75.70%
Random Tree   -                              91.03%

C. Knowledge Flow Interface
Knowledge Flow is an alternative to the Explorer [18]. The user lays out the processing steps by selecting WEKA components from a toolbar and connecting them together to form a knowledge flow. For the purpose of our experimentation we connected together a CSV loader, a class assigner, cross validation, then an algorithm such as SMO or REP Tree, followed by a classifier performance evaluator, and finally we view the output using a text viewer.
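Assuming the standard component names in WEKA's Knowledge Flow palette (the paper does not spell them out), the layout described above corresponds to a chain of the form:

    CSVLoader -> ClassAssigner -> CrossValidationFoldMaker -> classifier (NaiveBayes, J48, SMO, REPTree or RandomTree) -> ClassifierPerformanceEvaluator -> TextViewer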
1) Naïve Bayes
Naïve Bayes is the algorithm that works on the probability of every attribute contained in the data sample individually and then classifies the data; we used this algorithm in the Experimenter interface as well. In Figure 8 the classification accuracy achieved is 100% correctly classified accuracy for 99 instances; the error rate, i.e. the mean absolute error, is 0.0011; the time taken to build the model is 0 seconds; and the ROC area is 1. These values are reported in the output.

Fig. 8. Screenshot view of the Naïve Bayes algorithm

Figure 8's scoring of the Naïve Bayes algorithm's accuracy is given in Table 9 on the basis of time, accuracy, error and ROC:

TABLE 9
NAÏVE BAYES ALGORITHM ACCURACY

Algorithm     Time to Build Model (s)   Correctly Classified (%)   Incorrectly Classified (%)   Mean Absolute Error   ROC Area
Naïve Bayes   0                         100% (99)                  0% (0)                       0.0011                1

2) J48 Tree
The J48 tree decides the target value based on the various attributes of the dataset, building a machine learning model and classifying its accuracy on the dengue disease dataset. In Figure 9 the classification accuracy achieved is 100% correctly classified accuracy out of 99 instances; the error rate, i.e. the mean absolute error, is 0; the time taken to build the model is 0 seconds; and the ROC area is 1. These values are reported in the output.
Fig. 9. Screenshot view of the J48 tree algorithm

A detailed analysis of the J48 scoring is given by Table 10 on the basis of time, accuracy, error and ROC.

TABLE 10
J48 TREE ALGORITHM ACCURACY

Algorithm   Time to Build Model (s)   Correctly Classified (%)   Incorrectly Classified (%)   Mean Absolute Error   ROC Area
J48         0                         100% (99)                  0% (0)                       0                     1

3) SMO
The SMO algorithm has also been used by us in the Knowledge Flow interface for classification. It splits the data on the basis of the dataset, and then the output is analyzed. From Figure 10 we can deduce that the classification accuracy achieved is 100% correctly classified accuracy out of 99 instances; the error rate, i.e. the mean absolute error, is 0; the time taken to build the model is 0 seconds; and the ROC area is 0.875.

Fig. 10. Screenshot view of the SMO algorithm

The scoring of the SMO algorithm's accuracy is shown by Table 11 on the basis of time, accuracy, error and ROC.

TABLE 11
SMO ALGORITHM ACCURACY

Algorithm   Time to Build Model (s)   Correctly Classified (%)   Incorrectly Classified (%)   Mean Absolute Error   ROC Area
SMO         0                         100% (99)                  0% (0)                       0                     0.875

4) REP Tree
The REP Tree has been used in this paper to build a decision tree and thereby reduce errors, by sorting the values of numeric attributes and splitting the instances into pieces to classify the accuracy. Figure 11 shows that the classification accuracy achieved is 74.7475% correctly classified accuracy and 25.2525% incorrectly classified accuracy; the error rate, i.e. the mean absolute error, is 0.3655; the time taken to build the model is 0.02 seconds; and the ROC area is 0.544.

Fig. 11. Screenshot view of the REP Tree algorithm

Table 12 shows the analysis of the scoring of the REP Tree algorithm's accuracy on the basis of time, accuracy, error and ROC.

TABLE 12
REP TREE ALGORITHM ACCURACY

Algorithm   Time to Build Model (s)   Correctly Classified (%)   Incorrectly Classified (%)   Mean Absolute Error   ROC Area
REP Tree    0.02                      74.7475% (74)              25.2525% (25)                0.3655                0.544

5) Random Tree
The Random Tree randomly chooses attributes at each node to allow the estimation of class probabilities. From Figure 12 we can observe that the classification accuracy achieved is 87.8788% correctly classified accuracy and 12.1212% incorrectly classified accuracy; the error rate, i.e. the mean absolute error, is 0.1853; the time taken to build the model is 0 seconds; and the ROC area is 0.876. These values are reported in the output.

V. RESULTS AND DISCUSSION
Explorer, Experimenter and Knowledge Flow are the data mining techniques that have been used by us, with the different algorithms Naïve Bayes, J48, SMO, Random Tree and REP Tree. Through these techniques we obtained results on the basis of the time taken to build
the model, correctly classified instances, error and ROC area. The algorithm scoring accuracy is shown in Table 14.

Figure 12's scoring of the Random Tree algorithm's accuracy is illustrated further in Table 13 on the basis of time, accuracy, error and ROC.

Fig. 12. Screenshot view of the Random Tree algorithm

TABLE 13
RANDOM TREE ALGORITHM ACCURACY

Algorithm     Time to Build Model (s)   Correctly Classified (%)   Incorrectly Classified (%)   Mean Absolute Error   ROC Area
Random Tree   0                         87.8788% (87)              12.1212% (12)                0.1853                0.876

In the Explorer results, Naïve Bayes and J48 classified 100% of instances correctly, with minimum mean absolute error (Naïve Bayes = 0.0011, J48 = 0), maximum ROC (Naïve Bayes ROC = 1, J48 ROC area = 0.958), and a model building time of 0 seconds. So from the Explorer interface data mining technique we can deduce that Naïve Bayes and J48 have the maximum accuracy and the least error, take less time to build the model, and have the maximum ROC.

TABLE 14
EXPLORER RESULT

Algorithm     Time to Build Model (s)   Correctly Classified (%)   Incorrectly Classified (%)   Mean Absolute Error   ROC Area
Naïve Bayes   0                         100% (99)                  0% (0)                       0.0011                1
J48           0                         100% (99)                  0% (0)                       0                     0.958
SMO           0                         100% (99)                  0% (0)                       0                     0.875
REP Tree      0.02                      74.74% (74)                25.25% (25)                  0.3655                0.544
Random Tree   0                         87.87% (87)                12.12% (12)                  0.1853                0.876

In Table 15, Naïve Bayes and J48 again classified 100% of instances correctly, with minimum mean absolute error (Naïve Bayes = 0.0011, J48 = 0), maximum ROC (ROC area = 1 for both), and a model building time of 0 seconds. So from the Knowledge Flow interface data mining technique, Naïve Bayes and J48 have the maximum accuracy, the least error, the least time taken to build the model, and the maximum ROC. Explorer and Knowledge Flow achieved the same classification accuracy scores, with only an approximate change in the ROC values of Naïve Bayes and J48 compared to the others, because Knowledge Flow is an alternative method to the Explorer.

TABLE 15
KNOWLEDGE FLOW RESULT

Algorithm     Time to Build Model (s)   Correctly Classified (%)   Incorrectly Classified (%)   Mean Absolute Error   ROC Area
Naïve Bayes   0                         100% (99)                  0% (0)                       0.0011                1
J48           0                         100% (99)                  0% (0)                       0                     1
SMO           0                         100% (99)                  0% (0)                       0                     0.875
REP Tree      0.02                      74.74% (74)                25.25% (25)                  0.3655                0.544
Random Tree   0                         87.87% (87)                12.12% (12)                  0.1853                0.876

In Table 16, the scoring accuracy of Naïve Bayes and J48 is high, i.e. the best prediction (v), as compared to REP Tree and Random Tree, which have lower algorithm accuracy, called the worst prediction (*).

TABLE 16
EXPERIMENTER RESULT

Algorithm     Best Accuracy Prediction (v)   Worst Accuracy Prediction (*)
Naïve Bayes   100%                           -
J48 Tree      99.70%                         -
REP Tree      -                              75.70%
Random Tree   -                              91.03%

Finally, from these three data mining techniques it is observed that Naïve Bayes and J48 give the best classifier performance for predicting the survivability of dengue disease among patients using WEKA, because they classify more accurately, have the maximum ROC area and the least mean absolute error, and take the minimum time to build the model. The accuracy of a test depends on how well the dataset separates the cases with and without the disease; an accuracy measured by ROC area = 1
shows a perfect and excellent test, as the patient will get an effective diagnosis in a timely and accurate manner.

VI. CONCLUSION AND FUTURE WORK
The main aim of this paper is to predict dengue disease using the WEKA data mining tool. WEKA has four interfaces; out of these four we have used three: Explorer, Experimenter and Knowledge Flow. Each interface has its own classifier algorithms. We used five algorithms, i.e. Naïve Bayes, J48, SMO, REP Tree and Random Tree, for our experimentation. These algorithms were implemented using the WEKA data mining techniques to analyze the algorithm accuracy obtained after running them, and the outputs were then compared on the basis of the accuracy achieved.

In Explorer and Knowledge Flow there are several scoring algorithms for accuracy, but for our experimentation we used only five. The outputs obtained from Explorer and Knowledge Flow are approximately the same, because Knowledge Flow is an alternative method to the Explorer; it is just a different way of carrying out the experimentation. These algorithms' classifier accuracies are compared to each other on the basis of correctly classified instances, time taken to build the model, mean absolute error and ROC area.

Through the Explorer and Knowledge Flow techniques it was inferred that Naïve Bayes and J48 are the best-performing classifier algorithms, as they achieved an accuracy of 100%, took less time to build, showed the maximum ROC area (= 1), and had the least absolute error. Maximum ROC area means excellent prediction performance compared to the other algorithms.

The Experimenter result showed that the scoring accuracy of Naïve Bayes is 100% and of J48 is 99.70%, compared to REP Tree and Random Tree; so we can conclude that in the Experimenter interface, too, Naïve Bayes and J48 are the best classifier algorithms for predicting dengue disease survivability on the basis of the symptoms given in the dataset.

The applications of WEKA can be extended further in the medical field for diagnosis of different diseases like cancer and many others. It can also help in solving problems in clinical research using different applications of WEKA. Another advantage of using WEKA for disease prediction is that it can easily diagnose a disease even when the number of patients for whom the prediction has to be done is huge, or in the case of very large data sets spanning lakhs of patients. Even though WEKA is a powerful data mining tool for classification, clustering, association rule mining and visualization of results in medical health care to predict disease among patients, other tools such as Matlab can be used to further classify different data sets. The proposed approach has been used with a dengue data set, but we plan to extend it in future for the prediction of other diseases such as cancer.

REFERENCES
[1] Dhamodharan S., "Liver Disease Prediction Using Bayesian Classification," Special Issue, 4th National Conference on Advance Computing, Application Technologies, May 2014.
[2] Solanki A.V., "Data Mining Techniques using WEKA Classification for Sickle Cell Disease," International Journal of Computer Science and Information Technology, 5(4): 5857-5860, 2014.
[3] Joshi J., Rinal D., Patel J., "Diagnosis and Prognosis of Breast Cancer Using Classification Rules," International Journal of Engineering Research and General Science, 2(6): 315-323, October 2014.
[4] David S. K., Saeb A. T., Al Rubeaan K., "Comparative Analysis of Data Mining Tools and Classification Techniques using WEKA in Medical Bioinformatics," Computer Engineering and Intelligent Systems, 4(13): 28-38, 2013.
[5] Vijayarani S., Sudha S., "Comparative Analysis of Classification Function Techniques for Heart Disease Prediction," International Journal of Innovative Research in Computer and Communication Engineering, 1(3): 735-741, 2013.
[6] Kumar M. N., "Alternating Decision Trees for early diagnosis of dengue fever," arXiv preprint arXiv:1305.7331, 2013.
[7] Durairaj M., Ranjani V., "Data mining applications in healthcare sector: a study," Int. J. Sci. Technol. Res. (IJSTR), 2(10), 2013.
[8] Sugandhi C., Yasodha P., Kannan M., "Analysis of a Population of Cataract Patient Database in WEKA Tool," International Journal of Scientific and Engineering Research, 2(10), October 2011.
[9] Yasodha P., Kannan M., "Analysis of Population of Diabetic Patient Database in WEKA Tool," International Journal of Science and Engineering Research, 2(5), May 2011.
[10] Bin Othman M. F., Yau T. M. S., "Comparison of different classification techniques using WEKA for breast cancer," in 3rd Kuala Lumpur International Conference on Biomedical Engineering 2006, Springer Berlin Heidelberg, 520-523, January 2007.
[11] Wikipedia, http://en.m.wikipedia.org/wiki/Dengue_fever, accessed in January 2015.
[12] Wikipedia, http://en.m.wikipedia.org/wiki/weka (machine learning), accessed in January 2015.
[13] Waikato, http://www.cs.waikato.ac.nz/ml/weka, accessed in January 2015.
[14] Wikipedia, http://en.m.wikipedia.org/wiki/Data_set, accessed in January 2015.
[15] Kirkby R., Frank E., "WEKA Explorer User Guide for version 3-4-3," November 2004.
[16] Wikipedia, http://en.m.wikipedia.org/wiki/Naive_Bayes_classifier, accessed in January 2015.
[17] Hall M., Reutemann P., "WEKA Knowledge Flow Tutorial for version 3-5-8," July 2008.
[18] Scuse D., Reutemann P., "WEKA Experimenter Tutorial for version 3-5-5," January 2007.
[19] http://gim.unmc.edu/dxtests/roc3.htm, accessed in February 2015.
[20] Wikipedia, http://en.m.wikipedia.org/wiki/Statistical_Classification, accessed in January 2015.
[21] Mihaila C., Ananiadou S., "Recognising Discourse Causality Triggers in the Biomedical Domain," Journal of Bioinformatics and Computational Biology, 11(06), October 2013.
[22] Thitiprayoonwongse D., Suriyaphol P., Soonthornphisaj N., "Data mining of dengue infection using decision tree," Entropy, 2: 2, 2012.
Texture Analysis Methods: A Brief Review
[1]Soumya Ranjan Nayak, [1]Rajalaxmi Padhy, [2]Jibitesh Mishra
[1]Department of Information Technology, College of Engineering and Technology
[2]Department of Computer Science and Application, College of Engineering and Technology
Abstract: Texture analysis is an emerging research topic in the field of image processing that finds significant application in different areas, including feature extraction, texture classification and much more. This paper discusses and reviews the various approaches to texture analysis based on the available literature and the research work carried out or supervised by different researchers so far. The review is organized around the various methods and techniques used for texture analysis: all the existing texture analysis algorithms are carefully studied, and the categorized methods, texture models, feature extraction techniques and texture classifications are outlined.

Index Terms: Texture, Classification, Feature Extraction, Pattern
I. INTRODUCTION
The analysis of texture is a most useful area of research in the domains of image processing and computer vision, because texture has no strict definition so far; different researchers have used different definitions based on their particular area of application, as described in [1]. Image texture is represented by means of the variation of pixels in terms of repeated patterns [1-2]. A texture space in the digital domain is represented by means of an irregular dissemination of pixel points in spatial coordinates, as described in [3]. Generally, textures comprise composite patterns made by grouping sub-patterns with different features like brightness, colour and size; therefore, more generally, texture can be described in terms of similarity grouping, as defined in [4]. These local sub-patterns of an image describe the properties that determine the content of the texture, and these properties are categorized in terms of roughness, smoothness, fineness, randomness, lightness, uniformity, phase, density, granularity, regularity, linearity, frequency, directionality, coarseness and many more, as described by [5]. It is difficult for the human eye to judge different kinds of texture images; therefore, a successful vision system is required to understand the texture of the world surrounding it [1]. Hence, texture analysis and texture research have become popular as a way to understand, model and process texture using computer technology.

There are three major kinds of features through which a human being interprets pictorial information: spectral, textural and contextual. Spectral information in an image can be defined as the average tonal variation in the various bands. Textural information reflects the spatial distribution of tonal variation within a band, and contextual information is derived from the blocks of picture data surrounding the area being analyzed. Texture is concerned only with the spatial distribution of grey tones. Textures are categorized into many types, such as fine, smooth, irregular, rippled, rough, etc. Texture contains major information about the structural arrangement of surfaces and their association, and from this information we can derive different features of the texture [6]. Just as images are built from pixels, a texture comprises commonly related pixels and bundles of pixels. Such a bundle (group) of pixels is treated as a texture primitive, or element of texture, called a texel. Texels are arranged in an accurate manner to create a valid object: for example, if the intensity variation in a texture image appears to be perfectly periodic, it is said to be a periodic pattern rather than a texture; if the image contains a completely random pattern, we can say it is a noise pattern rather than a texture; but if the image pattern has both regularity and randomness, then it would probably be called a texture. Four important issues are taken into consideration in any texture analysis [7], i.e.:
Feature extraction: computation of characteristics and a vector of mathematical parameters representing the properties of the texture.
Texture classification: determining to which of the predefined classes a given texture belongs.
Texture segmentation: partitioning an image into disjoint regions, each containing a homogeneous texture.
Shape identification: reconstructing three-dimensional surfaces from texture information.
II. TEXTURE ANALYSIS

Analysis of texture is an emerging research area in the field of imaging science for describing the characteristics of texture features. Texture analysis also provides mathematical representations or models of the spatial variation within the image plane for the purpose of information extraction. Texture is a local construct describing the spatial association of spatially varying spectral values, repeated over a region of larger spatial scale; hence the observation of texture is a function of spatial and radiometric scales. According to different researchers [1, 7-9], different texture analysis techniques have evolved; the approaches fall into the categories of statistical, structural, model based, geometrical, signal processing, and transform models. However, it is not practical to provide an exhaustive survey of all texture measures here.

A. Statistical Approach
Statistical approaches analyze the spatial spreading of pixel values by estimating local features at every point in the image and deriving statistical parameters from the spatial spreading of those features [10]. This approach is appropriate if the texture primitive sizes are comparable with the pixel sizes.

B. Structural Approach
Structural methods describe a texture by means of texture primitives and the spatial organization of those primitives [2][5]. These approaches give a favorable symbolic explanation of the texture; however, this feature is more useful for synthesis than for analysis tasks [7].

C. Model Based Approach
Model based methods assume an underlying texture process and construct a parametric generative model for it. The parameters of the model are first estimated and then used for image analysis. This type of analysis can be achieved using fractal and stochastic models [11-16]. The fractal model is also helpful for modeling several natural textures and can be used for texture analysis and discrimination [12, 17-19]; however, it is not suitable for describing local image structures.

D. Geometrical Approach
Geometrical models consider a texture as being composed of texture primitives, and are typically based on the geometric properties of those primitives. The primitives may be extracted by edge detection with a Laplacian-of-Gaussian or difference-of-Gaussian filter. Once the texture elements are identified in the image, the analysis either computes statistical properties of the texture elements or tries to extract the placement rule that describes the texture. The structure and organization of the primitives can also be presented using Voronoi tessellations [1][20].

E. Signal Processing Approach
In this technique the image is usually filtered with a pool of filters, such as Laws' masks and the local linear transforms presented by Unser and Eden [21]; these and other masks designed for edge detection are used to capture frequency content at differing scales [22-27].

F. Transform Model
In this model the analysis is made in terms of Fourier transforms, as proposed by Rosenfeld [4], Gabor transforms [22, 28] and wavelet transforms [29-31]; an image is represented in a space whose coordinate system is closely related to the distinctive characteristics of a texture, such as size or frequency. The wavelet transform is widely used for texture segmentation, but the problem with this method is that it is not translation invariant [32-33].

This review is confined mainly to statistical approaches to texture analysis and feature extraction techniques.
III. TEXTURE CLASSIFICATION

Texture can be represented by means of the spatial spreading of pixels over a selected region of interest of a particular image plane, and it is considered to be a repeating pattern of local variation of pixel intensities. Texture classification assigns an unknown texture sample to one of a set of known texture categories based on distinguishing features. Texture analysis can be addressed in terms of four issues, namely texture classification, segmentation, synthesis, and shape from texture; in this paper, we confine our discussion to classification.
Classification of texture is mainly organized into two steps: the learning step and the recognition step. In the learning step, we build a model based on the training data and training classes present in texture images; the training data comprise known texture features and known properties of texture such as spatial structure, roughness, energy, contrast, histogram and many more. In the recognition step, we extract the features of unknown texture content by means of some texture analysis method, compare these unknown features with the training images by means of a classification algorithm, and nominate the extracted feature to the group with the best match.
There are two main types of texture classification: supervised and unsupervised. In supervised classification, the classifier is trained with the features of known classes.
In unsupervised classification, the classifier recognizes different classes based on input feature similarity, and no prior classifier training takes place. Texture classification methods can be divided into three categories: pixel based, local feature based and region based [34].

Fig. 1 Phases of Texture Classification.

Texture classification mainly consists of three steps: image acquisition, feature extraction and classification. In the first phase, the texture image is acquired and pre-processed; various pre-processing techniques are applied, such as image enhancement, noise removal and color space transformation. Distinctive features are extracted in the next phase; feature extraction methods fall into either the spatial or the spectral domain. The decision about which category the texture belongs to is taken in the classification phase, based on classification algorithms such as support vector machines (SVM) or nearest neighbor built on the extracted features. If the classes have not been defined, the classification is treated as unsupervised; if the classes have already been defined by training textures, the classification is treated as supervised.
Based on intrinsic characteristics and applications, feature extraction algorithms are used to trace important feature regions in texture images. A region can be represented as either a global or a local neighborhood and distinguished by different parameters such as shape, size, intensity, statistical properties and many more. Local feature extraction techniques are classified into two broad categories, namely intensity based and structure based feature extraction. Intensity based methods consider local intensity patterns to locate regions; structure based methods identify structures of the image such as lines, edges, circles, corners, ellipses and many more.
A wide variety of techniques has been proposed for feature extraction, such as statistical, geometrical, model-based, signal processing and pixel based methods.

A. Statistical Method
Statistical models generally investigate the spatial arrangement of pixel values by calculating local features at every intensity position in the image plane and finding a set of statistics from the distribution of the local features [10]. In this technique a texture is described by a collection of statistics of selected features. Statistical methods are further classified based on the number of pixels used to identify a local feature: first order (single intensity value), second order (two intensity values) and higher order (three or more intensity values) statistics [10]. First order statistics are basically used to estimate the mean and variance of single intensity values, disregarding pixel neighborhood relationships, whereas second and higher order statistics calculate properties between two or more intensity values occurring at precise locations. Second and higher order statistics are the most popular, because first order statistics do not provide accurate features: being based only on the gray level histogram, they provide no information about the relative positions of intensities with respect to each other. These methods generally estimate different properties based on three major approaches, namely first order, second order and spectral. Statistical approaches include Fourier transforms, fractals, co-occurrence matrices, spatial autocorrelation, etc.

First order statistical texture analysis: this type of analysis can be done using the image histogram, i.e., the probability of pixel occurrence. In this case two standard statistical parameters of image analysis, the mean and the variance, are used to characterize the texture. The first order statistics can be obtained as follows:

p(i) = N(i) / M    (1)

where N(i) is the number of pixels having the distinct gray level i and M is the entire number of pixels in the image.
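A minimal sketch of these first order statistics, computed from the normalized histogram p(i) = N(i)/M of equation (1); the 8-bit placeholder image and the choice of entropy as a third feature are illustrative assumptions.

```python
# First order texture features from the gray-level histogram.
import numpy as np

def first_order_features(image, levels=256):
    counts = np.bincount(image.ravel(), minlength=levels)
    p = counts / image.size                     # p(i) = N(i) / M
    i = np.arange(levels)
    mean = np.sum(i * p)                        # histogram mean
    variance = np.sum((i - mean) ** 2 * p)      # histogram variance
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return mean, variance, entropy

image = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # placeholder
print(first_order_features(image))
```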
Second order statistical texture analysis: second order statistical analysis generally estimates properties of two pixel values occurring at precise positions relative to each other. It was presented by Haralick et al. [6] through pixel co-occurrence matrices, in terms of gray tone spatial dependence matrices (GTSDM). The GTSDM records the probability of pixel gray levels co-occurring at a particular distance d and angle θ; the co-occurrence matrices are calculated for the four angles 0, 45, 90 and 135 degrees. Haralick et al. [6] estimate a set of 14 features for texture analysis: angular second moment (energy), contrast, correlation, variance, inverse difference moment, sum average, sum variance, sum entropy, entropy, difference variance, difference entropy, information measure of correlation 1, information measure of correlation 2, and maximal correlation coefficient.
Higher order statistical texture analysis: higher order statistics estimate the properties of three or more pixel values occurring at precise locations relative to each other. The gray level run length methods proposed by Galloway [43] are based on higher order statistical analysis; they capture information about runs of a particular gray level in a certain direction.

B. Geometrical Method
Geometrical analysis deals with a texture in terms of texture primitives, representing both the primitives and the spatial arrangement of pixels. The primitives are extracted by means of edge detection with a Laplacian-of-Gaussian or difference-of-Gaussian filter. The analysis then either computes statistics of the primitives, such as area, edge and orientation, or derives a placement rule for the elements; image edges are often used as primitives, as described in [2]. Haralick et al. [1] presented co-occurrence matrices that describe second order statistics of edges.

C. Model Based Method
Model-based methods hypothesize an underlying texture process and construct a parametric generative model that could have created the observed intensity distribution. The intensity function is considered to be a combination of a function representing the known structural information on the image surface and an additive random noise sequence.

D. Pixel Based Method
Pixel-based methods analyze an image in terms of groups of pixels; in this model a texture is represented by the arrangement of gray levels in the image. The gray level co-occurrence matrix is the most widely used pixel based technique, as described in [2]; statistical properties are obtained from the intensities or gray levels of the pixels of the image.

Gray Level Co-occurrence Matrix
The gray level co-occurrence matrix was presented by Haralick et al. [2] for image classification. In this approach a set of matrices is created that records the probability that a pair of brightness values (i, j) occurs at a certain separation from each other at an angle of 0, 45, 90 or 135 degrees. Given an image I of size M x M, the co-occurrence matrix P can be defined as

P(i, j) = Σ_{x=1}^{M} Σ_{y=1}^{M} [ 1 if I(x, y) = i and I(x + Δx, y + Δy) = j; 0 otherwise ]    (2)

where the offset (Δx, Δy) specifies the distance between the pixel of interest and its neighbor. Davis et al. [35] proposed extracting features such as contrast and uniformity, and detecting clusters, from generalized co-occurrence matrices for feature classification.
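A minimal sketch of the co-occurrence matrix of equation (2) for a single offset, together with two of the Haralick-style features named above; the 8-level quantized placeholder image and the chosen offset are assumptions.

```python
# Gray level co-occurrence matrix for one offset (dx, dy).
import numpy as np

def glcm(image, dx, dy, levels):
    P = np.zeros((levels, levels))
    rows, cols = image.shape
    for x in range(rows):
        for y in range(cols):
            if 0 <= x + dx < rows and 0 <= y + dy < cols:
                P[image[x, y], image[x + dx, y + dy]] += 1
    return P / P.sum()                     # normalize to joint probabilities

image = np.random.randint(0, 8, (32, 32))  # 8-level quantized image
P = glcm(image, dx=0, dy=1, levels=8)      # 0-degree, distance-1 offset
i, j = np.indices(P.shape)
energy = np.sum(P ** 2)                    # angular second moment
contrast = np.sum((i - j) ** 2 * P)
```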
Local Binary Pattern
The local binary pattern was introduced to measure local image contrast, and was presented by Ojala et al. [36] for classification of rotation invariant textures.

Fig. 2 Graphical Presentation of local binary pattern.

In this approach, each pixel is compared with its 8 neighbours: if a neighbour's value is greater than or equal to the center pixel's value, the corresponding bit in the LBP code is set to 1:

LBP = Σ_{p=0}^{P-1} S(g_p - g_c) 2^p    (3)

where S(x) = 1 if x is greater than or equal to zero and 0 otherwise, g_c represents the gray level of an arbitrary pixel, and g_p represents the gray value of a sampling point in an evenly spaced circular neighborhood of P sampling points around it.
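A minimal sketch of the 8-neighbour LBP code of equation (3); border pixels are skipped for brevity, and the placeholder image and the clockwise neighbour enumeration are assumptions.

```python
# Basic 8-neighbour local binary pattern.
import numpy as np

# offsets of the 8 neighbours, enumerated clockwise from the top-left
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_image(image):
    out = np.zeros_like(image, dtype=np.uint8)
    for x in range(1, image.shape[0] - 1):
        for y in range(1, image.shape[1] - 1):
            gc = image[x, y]
            code = 0
            for p, (dx, dy) in enumerate(OFFSETS):
                if image[x + dx, y + dy] >= gc:   # S(g_p - g_c)
                    code |= 1 << p                # weight by 2^p
            out[x, y] = code
    return out

image = np.random.randint(0, 256, (32, 32), dtype=np.uint8)   # placeholder
hist = np.bincount(lbp_image(image).ravel(), minlength=256)   # LBP histogram
```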
E. Local Feature Based Methods
Local image features such as edges and blobs are used to describe the texture in local feature based models. When the texture elements are relatively large, consisting of several pixels, texture classification is achieved with local feature based models.

Edge based
Marr proposed extracting first-order statistical features from the edge map [37], while Zucker et al. [38] suggested that histograms of image features such as edge magnitude or orientation be used to discriminate among dissimilar textures. Khouzani et al. [39] used the Radon transform to identify the principal orientation of an image. The projection function g(θ, s) can be written as

g(θ, s) = ∫∫ f(x, y) δ(x sin θ + y cos θ - s) dx dy    (4)

where g(θ, s) is the line integral of the image intensity f(x, y) along the line at distance s and angle θ from the x-axis. The collection of these g(θ, s) is called the Radon transform of the image. The Radon transform typically has larger variations along the principal texture direction; the variance of the projection in this direction is therefore locally maximal, and the texture is rotated to place its principal direction at 0 degrees. A wavelet transform is then applied to the rotated image to extract texture features, which helps to achieve rotation invariance in texture detection.
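A minimal sketch of the Radon-based orientation estimate described above, assuming scikit-image is available: the projection variance is evaluated over a fan of angles and the angle of maximum variance is taken as the principal direction. The placeholder patch and the one-degree angular step are assumptions.

```python
# Principal texture orientation via the Radon transform.
import numpy as np
from skimage.transform import radon

image = np.random.rand(64, 64)              # placeholder texture patch
angles = np.arange(0, 180)                  # theta in degrees
sinogram = radon(image, theta=angles, circle=False)   # columns are g(theta, s)
principal = angles[np.argmax(sinogram.var(axis=0))]   # max-variance angle
print('principal orientation:', principal, 'degrees')
```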
Scale Invariant Feature Transform (SIFT)
This method was presented by David Lowe [40]. SIFT descriptors are invariant to scale, translation and rotation transformations. Feature points are located at the maxima and minima of a difference-of-Gaussians function applied in scale space to a series of smoothed and resampled images; the gradient magnitudes and orientations at each point in the region around a feature point are used as its descriptor.
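A minimal usage sketch of SIFT extraction, assuming an OpenCV build (version 4.4 or later) that ships SIFT; the image file name is a placeholder.

```python
# Extracting SIFT keypoints and 128-dimensional descriptors with OpenCV.
import cv2

gray = cv2.imread('texture.png', cv2.IMREAD_GRAYSCALE)  # placeholder path
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(gray, None)
print(len(keypoints), descriptors.shape)  # one 128-d vector per keypoint
```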

Proceedings of IIRAJ International Conference (ICCI-SEM-2K17), GIFT, Bhubaneswar, India, 18th - 19th February 2017, ISBN: 978-93-86352-38-5
63
F. Region Based Model
This model represents an image as a set of sub-patterns placed according to certain rules. The random mosaic model is an example: it partitions the image into regions and allocates gray levels to the regions according to a probability density function. It tries to find partitions of the image pixels into sets corresponding to coherent image properties such as brightness, color and texture. Spatial variability within the image can provide useful information for region based texture classification.

Block Counting Method
Alrawi et al. [41] proposed a block based method for texture analysis. The algorithm is based on a blocking procedure in which the image is divided into blocks (16x16 pixels) and a feature vector is extracted for each block; the blocks are then classified based on these features.
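A minimal sketch of the block based scheme just described: the image is tiled into 16x16 blocks and a small feature vector is extracted per block for later classification. The mean/standard-deviation features and the placeholder image are assumptions; Alrawi et al. [41] use their own multifractal features.

```python
# Per-block feature extraction for block based texture analysis.
import numpy as np

def block_features(image, block=16):
    rows, cols = image.shape
    feats = []
    for r in range(0, rows - block + 1, block):
        for c in range(0, cols - block + 1, block):
            tile = image[r:r + block, c:c + block]
            feats.append((tile.mean(), tile.std()))
    return np.array(feats)              # one feature vector per block

image = np.random.randint(0, 256, (64, 64))  # placeholder image
print(block_features(image).shape)           # (16, 2) for a 64x64 image
```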
An Active Patch Model
Mao et al. [42] used a bag-of-words model in which the features are based on a dictionary of active patches. Active patches are raw intensity patches that can undergo spatial transformations such as rotation and scaling, adjusting accordingly to best match the image regions. The dictionary of active patches needs to be compact and representative, in the sense that it can be used to approximately reconstruct the images to be classified. Images are classified based on the occurrence frequencies of the active patches.

While choosing a texture analysis algorithm, a number of aspects should be considered:
- Illumination (gray scale) invariance
- Spatial scale invariance
- Rotation invariance
- Projection invariance
- Robustness with respect to noise
- Robustness with respect to parameters
- Computational complexity
- Generativity
- Window/sample size
IV. CONCLUSION

In this paper we discussed various texture analysis techniques. Textures in the real world are often not uniform, due to changes in orientation, scale or other visual appearance. From this analysis, we conclude that various methods are available for texture classification. Hence, prior to starting the texture classification phase, it is essential to choose the practical parameters and features in advance, in order to decrease the volume of data and to optimize the discriminative power of these techniques. Further semantic analysis is needed on more kinds of texture images for choosing a better classification algorithm for specific texture images.

REFERENCES

[1] M. Tuceryan, A. K. Jain, "Texture analysis," in The Handbook of Pattern Recognition and Computer Vision, C. H. Chen, L. F. Pau and P. S. P. Wang (eds.), World Scientific Publishing, 1998.
[2] R. M. Haralick, "Statistical and structural approaches to texture," Proc. of IEEE, vol. 67, pp. 786-804, 1979.
[3] L. S. Davis, "Polarogram: a new tool for image texture analysis," Pattern Recognition, vol. 13, no. 3, pp. 219-223, 1981.
[4] A. Rosenfeld, J. Weszka, "Picture Recognition," in Digital Pattern Recognition, K. Fu (ed.), Springer-Verlag, pp. 135-166, 1980.
[5] M. Levine, Vision in Man and Machine, McGraw-Hill, 1985.
[6] R. M. Haralick, K. Shanmugam, "Textural Features for Image Classification," IEEE Transactions on Systems, Man, and Cybernetics, vol. 3, no. 6, pp. 610-621, 1973.
[7] A. Materka, M. Strzelecki, "Texture analysis methods: A review," Technical Report, University of Lodz, COST B11 Report, 1998.
[8] R. M. Haralick, L. G. Shapiro, Computer and Robot Vision, Addison-Wesley Publishing Company, 1993.
[9] R. Chellappa, R. L. Kashyap, B. S. Manjunath, "Model based texture segmentation and classification," in The Handbook of Pattern Recognition and Computer Vision, C. H. Chen, L. F. Pau and P. S. P. Wang (eds.), World Scientific Publishing, 1998.
[10] T. Ojala, M. Pietikainen, "Texture Classification," Machine Vision and Media Processing Unit, University of Oulu, Finland. Available at http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/OJALA1/texclas.htm, January 2004.
[11] G. Cross, A. Jain, "Markov Random Field Texture Models," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 5, no. 1, pp. 25-39, 1983.
[12] A. Pentland, "Fractal-Based Description of Natural Scenes," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 6, no. 6, pp. 661-674, 1984.
[13] R. Chellappa, S. Chatterjee, "Classification of Textures Using Gaussian Markov Random Fields," IEEE Trans. Acoustics, Speech and Signal Processing, vol. 33, no. 4, pp. 959-963, 1985.
[14] H. Derin, H. Elliot, "Modeling and Segmentation of Noisy and Textured Images Using Gibbs Random Fields," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 9, no. 1, pp. 39-55, 1987.
[15] B. Manjunath, R. Chellappa, "Unsupervised Texture Segmentation Using Markov Random Fields," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 13, no. 5, pp. 478-482, 1991.
[16] M. Strzelecki, A. Materka, "Markov Random Fields as Models of Textured Biomedical Images," Proc. 20th National Conf. Circuit Theory and Electronic Networks KTOiUE 97, Kolobrzeg, Poland, pp. 493-498, 1997.
[17] B. Chaudhuri, N. Sarkar, "Texture Segmentation Using Fractal Dimension," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 17, no. 1, pp. 72-77, 1995.
[18] L. Kaplan, C. C. Kuo, "Texture Roughness Analysis and Synthesis via Extended Self-Similar (ESS) Model," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 17, no. 11, pp. 1043-1056, 1995.

[19] P. Cichy, A. Materka, J. Tuliszkiewicz, "Computerised Analysis of X-ray Images for Early Detection of Osteoporotic Changes in the Bone," Proc. Conf. Information Technology in Medicine TIM 97, Jaszowiec, Poland, pp. 53-61, 1997.
[20] N. Ahuja, "Dot Pattern Processing Using Voronoi Neighborhoods," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 4, pp. 336-343, 1982.
[21] M. Unser, M. Eden, "Nonlinear Operators for Improving Texture Segmentation Based on Features Extracted by Spatial Filtering," IEEE Trans. Systems, Man and Cybernetics, vol. 20, no. 4, pp. 804-815, 1990.
[22] A. C. Bovik, M. Clark, W. S. Geisler, "Multichannel texture analysis using localized spatial filters," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 1, pp. 55-73, 1990.
[23] A. C. Bovik, "Analysis of multichannel narrow band filters for image texture segmentation," IEEE Transactions on Signal Processing, vol. 39, pp. 2025-2043, 1991.
[24] J. M. Coggins, A. K. Jain, "A spatial filtering approach to texture analysis," Pattern Recognition Letters, vol. 3, pp. 195-203, 1985.
[25] A. K. Jain, F. Farrokhnia, "Unsupervised texture segmentation using Gabor filtering," Pattern Recognition, vol. 33, pp. 1167-1186, 1991.
[26] T. Randen, J. H. Husoy, "Filtering for texture classification: A comparative study," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 4, pp. 291-310, 1999.
[27] T. Randen, J. H. Husoy, "Texture segmentation using filters with optimized energy separation," IEEE Transactions on Image Processing, vol. 8, no. 4, pp. 571-582, 1999.
[28] J. Daugman, "Uncertainty Relation for Resolution in Space, Spatial Frequency and Orientation Optimised by Two-Dimensional Visual Cortical Filters," Journal of the Optical Society of America, vol. 2, pp. 1160-1169, 1985.
[29] S. Mallat, "Multifrequency Channel Decomposition of Images and Wavelet Models," IEEE Trans. Acoustics, Speech and Signal Processing, vol. 37, no. 12, pp. 2091-2110, 1989.
[30] A. Laine, J. Fan, "Texture Classification by Wavelet Packet Signatures," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, no. 11, pp. 1186-1191, 1993.
[31] C. Lu, P. Chung, C. Chen, "Unsupervised Texture Segmentation via Wavelet Transform," Pattern Recognition, vol. 30, no. 5, pp. 729-742, 1997.
[32] M. Brady, Z. Y. Xie, "Feature Selection for Texture Segmentation," in Advances in Image Understanding, K. Bowyer and N. Ahuja (eds.), IEEE Computer Society Press, 1996.
[33] W. K. Lam, C. K. Li, "Rotated Texture Classification by Improved Iterative Morphological Decomposition," IEE Proc. Vision, Image and Signal Processing, vol. 144, no. 3, pp. 171-179, 1997.
[34] L. S. Davis, "Image Texture Analysis Techniques: A Survey," Springer conference on Digital Image Processing, 1981.
[35] L. Davis, S. Johns, J. Aggarwal, "Texture analysis using generalized co-occurrence matrices," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 1, no. 3, pp. 251-259, 1979.
[36] T. Ojala, M. Pietikainen, T. T. Maenpaa, "Multiresolution gray-scale and rotation invariant texture classification with Local Binary Patterns," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 971-987, 2002.
[37] D. Marr, "Early processing of visual information," Phil. Trans. Roy. Soc. B, to be published.
[38] S. Zucker, A. Rosenfeld, L. Davis, "Picture segmentation by texture discrimination," IEEE Trans. Comput., vol. 24, pp. 1228-1233, 1975.
[39] K. Khouzani, H. Zaden, "Radon transform orientation estimation for rotation invariant texture analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 6, pp. 1004-1008, 2005.
[40] D. Lowe, "Distinctive image features from scale-invariant keypoints," IJCV, vol. 60, no. 2, pp. 91-110, 2004.
[41] A. T. Alrawi, A. Sagheer, D. A. Ibrahim, "Texture Segmentation Based on Multifractal Dimension," International Journal on Soft Computing, vol. 3, no. 1, 2012.
[42] J. Mao, J. Zhu, A. L. Yuille, "An Active Patch Model for Real World Texture and Appearance Classification," Springer conference on Computer Vision, pp. 140-155, 2014.
[43] M. M. Galloway, "Texture analysis using gray level run lengths," Computer Graphics and Image Processing, vol. 4, no. 2, pp. 172-179, 1975.

A Novel Approach to Detect and Prevent Wormhole Attack in Wireless Networks

[1] Sara Ali, [2] Dr. Krishna Mohan
[1] PhD Research Scholar, Dept. of CSE, Mewar University, Gangrar, Chittorgarh, India
[2] Principal, Siddhartha Institute of Engineering & Technology, Puttur, India

Abstract: Wireless networks have gained immense popularity in the last decade as they provide features like scalability, flexibility and cost effectiveness. A major challenge observed with the advent of new technology in wireless networks is that of security: as the network is wireless in nature, its different layers are exposed to various security threats and attacks. Our survey of the literature finds the wormhole attack to be the most dangerous and severe attack on the routing protocols. In this attack, one or more malicious nodes capture packets at a certain location and re-transmit them to a remote location. The attack is considered severe because the attackers do not need to compromise any node; they can effectively use a laptop or any other wireless device to send the packets. Through this paper we conduct a detailed survey of the wormhole attack, its types and classification. We also analyze the existing detection techniques and propose an algorithm for detecting the attack.

Index Terms: Traffic Analysis, VPN, Wireless Network, Wormhole Attack
I. INTRODUCTION

With the increase in the utilization of wireless networks, the problem of security [1] is being encountered by various implementers. The network is wireless in nature, as there is no definite infrastructure [2, 3] for communication between network nodes and no requirement for a central access point. Various factors have influenced the popularity of the wireless network, some of which are:

- Convenience
- Deployment
- Mobility
- Productivity
- Cost effectiveness
- Flexibility of location
- Cost

The network has resulted in an increase in productivity, as the accessibility of network resources increases [4]; the process of configuration and reconfiguration is simple, cost effective and fast. The main factors which have played an important role in the growth of the wireless network are convenience, cost effectiveness and ease of integration, and these days almost all computers come equipped with the technology necessary for wireless networking.

ATTACK CLASSIFICATION IN WIRELESS NETWORKS

The attacks in a wireless network can be classified into two types [5]:
1) Passive attacks
2) Active attacks

Figure 1 classifies wireless network attacks into passive attacks (eavesdropping) and active attacks (routing information flooding, network routing procedure hiding, malicious broken-route requests, false reply messages, and the wormhole attack).

Figure 1: Wireless Attacks

A. Passive Attack
In this type of attack the node continuously monitors the network and gains access to sensitive information without being discovered. It monitors the target node until it has gained enough information to launch an active attack. Passive attacks are of two types:

Eavesdropping and traffic analysis.

B. Active Attack
After gaining enough information about the network using a passive attack, the malicious nodes can launch an active attack, which may lead to a denial of service. This attack can be established using a large number of nodes, and active attacks are of two types: routing attacks and flooding the network. Our research has led us to the conclusion that the wormhole attack is the most severe of all.

II. WORMHOLE ATTACK

The wormhole attack is the most dangerous attack in the network. Two or more collaborating malicious nodes can launch this attack by creating a low latency tunnel and re-transmitting captured packets to different parts of the network. The architecture of the network is such that these malicious nodes can capture packets which are not addressed to them and re-transmit them to the malicious partner at the other side of the tunnel; this creates an illusion that these nodes are physically very close to each other.

Figure 2: Wormhole Attack

This attack leads to disruption in routing, as the nodes get the impression that the link consists of one or two hops instead of multiple hops. These attacks are thus very dangerous and difficult to detect, as the wormhole tunnels are private and out of band and hence are not visible to the network [5]. The wormhole and black hole attacks create an illusion of providing the shortest path, which results in the entire traffic getting diverted to this route.

Figure 3: Route Request from Source Node to Destination in presence of a Wormhole Tunnel

III. WORMHOLE ATTACK DEPLOYMENT

The wormhole attack can be deployed in the following modes:
1) Wormhole using encapsulation
2) Wormhole using an out-of-band channel
3) Wormhole using packet relay
4) Wormhole using high power transmission

A. Wormhole Using Encapsulation
In this attack, one of the malicious nodes at one end of the network hears an RREQ packet and transmits it to the second colluding party at a distant location near the destination [6]. The colluding second party, on hearing the re-broadcasted RREQ packet, broadcasts this packet. The neighbors of the second party will then drop any further legitimate communication requests that arrive on the legitimate path. This results in the formation of a wormhole tunnel through which the source and destination communicate, and the malicious nodes prevent the nodes from discovering legitimate routes. Consider a scenario in which node A tries to send a packet to B by finding the shortest path in the presence of two malicious nodes X and Y.

When X receives a packet, it routes it to Y through the existing path (U-V-W-Z); on receiving the packet, Y de-marshals it and rebroadcasts it. Notice that the hop count has not increased, due to the encapsulation. When the RREQ travels from A to B through C-D-E, node B has two routes to choose from: the first, A-C-D-E-B, is 4 hops long, while the second, A-X-Y-B, gives the impression of being only 3 hops. Node B will select the apparently smaller route, which in reality is 7 hops long. Any network using shortest-path routing is vulnerable to this kind of attack.

Figure 4: Wormhole Using Encapsulation
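A minimal sketch of why shortest-path routing is fooled in this example: a breadth-first search over hop counts prefers the advertised 3-hop route A-X-Y-B even though the tunnelled packets really traverse seven hops. The adjacency lists follow the example above and are otherwise illustrative.

```python
# Hop-count illusion created by an encapsulating wormhole pair X, Y.
from collections import deque

def hop_count(adj, src, dst):
    """Breadth-first search: minimum hop count from src to dst."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == dst:
            return hops
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))

legit = {'A': ['C'], 'C': ['D'], 'D': ['E'], 'E': ['B'], 'B': []}
print(hop_count(legit, 'A', 'B'))   # 4 hops on the legitimate route

# the colluding pair advertises a direct X-Y link that hides U-V-W-Z
tunnel = {'A': ['C', 'X'], 'C': ['D'], 'D': ['E'], 'E': ['B'],
          'X': ['Y'], 'Y': ['B'], 'B': []}
print(hop_count(tunnel, 'A', 'B'))  # 3 hops, so B prefers the wormhole
```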
B. Out-of-Band Channel
This type of attack is achieved using a direct wired link or a long-range directional wireless link. It is more difficult to establish, as it needs special hardware. Suppose two malicious nodes X and Y are present in the network with an out-of-band channel between them. When node X forwards an RREQ to Y, which is a neighbor of B, and Y broadcasts the packet, B receives two route requests, A-C-D-E-F-B and A-X-Y-B. The first is rejected as it seems longer, and the second is selected.

Figure 5: Wormhole Using Out-of-Band Channel

C. Packet Relay
In this type of attack the malicious node relays packets between two nodes which are located at distant locations and convinces them that they are neighbors. This attack is dangerous as it can be launched with even one node; when more nodes are malicious, the neighbor lists can be expanded and extended over several hops.

Figure 6: Wormhole Using Packet Relay

D. Wormhole with High Power Transmission
In this attack, when a malicious node receives an RREQ, it broadcasts the RREQ at a very high power level, a capability not available to any other node. Any node that hears the high-power broadcast re-broadcasts it toward the destination.

Figure 7: Wormhole Using High Power Transmission

IV. CLASSIFICATION OF WORMHOLE ATTACK

The wormhole attack can be classified into the following types:
1) Open wormhole attack
2) Closed wormhole attack
3) Half-open wormhole attack

A. Open Wormhole
In this type of attack the attacker nodes include themselves in the header of the RREQ packet during the route discovery procedure. These nodes are not hidden in the network, but the other nodes are not aware of their malicious nature and think of them as direct neighbors.

Figure 8: Open Wormhole Attack

B. Closed Wormhole
In this type of attack the malicious nodes do not modify the packet content; they just tunnel the packet from one end of the wormhole to the other and broadcast it again.

Figure 9: Closed Wormhole Attack

C. Half-Open Wormhole
In this attack the malicious node at one side of the wormhole does not modify the packet, while the node at the other end of the wormhole modifies the packet during the route discovery process.

Figure 10: Half Open Wormhole Attack

V. DETECTION OF WORMHOLE ATTACK

The authors in [7, 8] consider the following parameters to detect the wormhole attack:
1) A decrease in the length of the advertised path.
2) An increase in the end-to-end delay, derived by summing the hop delays, despite the advertisement of a short path.
3) Nodes which do not follow the advertised paths may incur delays caused by malicious nodes involved in the attack, leading to an increase in the end-to-end routing delay attributable to hop delay.

The various metrics which can be used to detect the wormhole attack and measure its strength [9, 10] are mentioned below.

A. Length
When the difference between the advertised path and the actual path is high, more anomalies can be observed in the network.

B. Robustness
The ability of the wormhole to persist without losing its strength despite a certain amount of network topology change.

C. Strength
The total amount of traffic that can be attracted by an incorrect link advertisement made by the malicious nodes.

D. Attraction
The metric which displays the decrease in the length of the routing path offered by the malicious wormhole tunnel; small improvements in the correct path result in a decrease in its strength.
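A minimal sketch of the delay-versus-hop-count test implied by parameters 1) and 2) above: a route whose measured end-to-end delay far exceeds what its advertised hop count explains is flagged. The per-hop delay estimate and the threshold factor are illustrative assumptions, not values from [7-10].

```python
# Flag routes whose delay is inconsistent with their advertised length.
def suspicious_route(advertised_hops, measured_delay_ms,
                     per_hop_delay_ms=2.0, factor=2.5):
    expected = advertised_hops * per_hop_delay_ms   # delay the hops explain
    return measured_delay_ms > factor * expected

# a 3-hop advertisement that takes 40 ms suggests a tunnelled route
print(suspicious_route(advertised_hops=3, measured_delay_ms=40.0))  # True
```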

VI. A BRIEF SUMMARY OF WORMHOLE DETECTION TECHNIQUES

The main detection techniques, with their advantages and disadvantages, can be summarized as follows:

1) Distance and location based approaches (geographical and temporal): both techniques employ strict clock synchronization and a global positioning system to coordinate all nodes. Disadvantages: they restrict the transmission distance of packets and need the nodes to be tightly synchronized.

2) Directional antenna: requires less synchronization. Disadvantages: each node needs to be equipped with special hardware, and the approach may suffer from directional errors.

3) LITEWORP: guard or observer nodes are used to detect the wormhole when one of their neighbors is behaving maliciously. Disadvantage: it is not always possible to find a guard node for a particular link.

4) Graph theoretical approach: uses encryption techniques. Disadvantage: the guard node uses local broadcast keys which are available only to one-hop neighbors.

5) Cluster based detection techniques: guard nodes are used to inform cluster heads about the attack, and no special hardware is used.

VPN
A Virtual Private Network is a technology used to secure the network: it creates an encrypted network over a less secure network when the underlying network fails to provide security itself.

Observer Nodes
Network nodes which are concerned with monitoring the network performance and detecting any security breaches.

Assumptions
1) A VPN is built on top of the network; it maintains a record of all nodes present in the network and maintains a malicious-node list. The system contains observer nodes which constantly monitor the network at random intervals of time.
2) The VPN maintains a record of all malicious and threshold-reaching nodes; it also maintains the status of the malicious-threshold flag.
3) All nodes need to be authenticated by the VPN to enter the network.
4) The VPN assigns a unique identifier to the node and, during the registration phase, checks whether the node was previously detected as a malicious node; the malicious-threshold flag is set to zero.
5) Once the node enters the network, its information is shared with the observer nodes.
6) The observer nodes constantly monitor the network at random times t.
7) Once a node is detected as malicious, it is reported to the VPN, which assigns a malicious-threshold flag.
8) This flag is incremented whenever the observer nodes report the node to be malicious.
9) When the malicious-threshold flag is greater than or equal to 1, the node is removed from the network, and the node, with its unique identifier number and IP address, is added to the malicious node list.

VII. WORMHOLE DETECTION ALGORITHM

Figure 11: Wormhole Detection Algorithm
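A minimal sketch, in Python, of the bookkeeping behind the algorithm and the assumptions above: the VPN registers nodes under unique identifiers, observer reports increment the malicious-threshold flag, and a node whose flag reaches 1 is evicted and blacklisted. The class and method names are illustrative.

```python
# Sketch of VPN-side registration and malicious-threshold bookkeeping.
import itertools

class VPNRegistry:
    def __init__(self):
        self._ids = itertools.count(1)
        self.active = {}        # node_id -> {'ip': ..., 'flag': ...}
        self.blacklist = []     # (node_id, ip) of evicted nodes

    def register(self, ip):
        """Authenticate a joining node; previously evicted IPs are refused."""
        if any(bad_ip == ip for _, bad_ip in self.blacklist):
            return None
        node_id = next(self._ids)
        self.active[node_id] = {'ip': ip, 'flag': 0}  # threshold flag = 0
        return node_id

    def report_malicious(self, node_id):
        """An observer node reports node_id; evict once flag >= 1."""
        node = self.active.get(node_id)
        if node is None:
            return
        node['flag'] += 1
        if node['flag'] >= 1:
            self.blacklist.append((node_id, node['ip']))
            del self.active[node_id]

vpn = VPNRegistry()
nid = vpn.register('10.0.0.7')
vpn.report_malicious(nid)
print(vpn.blacklist)            # [(1, '10.0.0.7')]
```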

VIII. CONCLUSION

The algorithm forms an integral part of this paper. It protects the system from harmful attacks, detects malicious nodes, and deletes them from the system when they are discovered. Our paper also gives a solution to the traffic problem by keeping a threshold factor in consideration. The VPN helps to improve the authenticity of the system by making each node pass an entry test before entering the system. On the whole, our paper helps in the prevention and detection of wormhole attacks.
REFERENCES

[1] S. Ali, S. Krishna Mohan, "Enhanced Security Framework for Wireless Networks," International Journal of Advanced Research in Computer Science.
[2] A. S. Bundela, "Literature Survey on Wormhole Attack," IJESRT International Journal of Engineering Sciences & Research Technology; Medicaps Institute of Technology and Management, Indore (M.P.), India.
[3] P. Khare, S. Ali, "Survey of Wireless Sensor Network Vulnerabilities and its Solution," IJRDET.
[4] Choi, Min-kyu, et al., "Wireless network security: Vulnerabilities, threats and countermeasures," International
[5] S. Ughade, R. K. Kapoor, A. Pandey, "An Overview on Wormhole Attack in Wireless Sensor Network: Challenges, Impacts, and Detection Approach."
[6] M. Azer, S. El-Kassas, M. El-Soudani, "A Full Image of the Wormhole Attacks: Towards Introducing Complex Wormhole Attacks," International Journal of Computer Science and Information Security, vol. 1, no. 1, 2009; Journal of Multimedia and Ubiquitous Engineering, vol. 3, no. 3, 2008.
[7] Y.-C. Hu, A. Perrig, D. B. Johnson, "Packet leashes: a defense against wormhole attacks in wireless networks," INFOCOM 2003, Twenty-Second Annual Joint Conference of the IEEE Computer and Communications Societies, vol. 3, IEEE, 2003.
[8] P. Jayarekha, S. Kalaburgi, M. Dakshayini, "Security and Collaborative Enforcement of Firewall Policies in VPNs."
[9] V. Mahajan, M. Natu, A. Sethi, "Analysis of wormhole intrusion attacks in MANETs," IEEE Military Communications Conference (MILCOM), pp. 1-7, 2008.
[10] Y. C. Hu, A. Perrig, D. Johnson, "Packet leashes: a defense against wormhole attacks in wireless networks," INFOCOM, 2003.

A Preliminary Performance Evaluation of Machine Learning Algorithms for Software Effort Estimation

[1] Poonam Rijwani, [2] Sonal Jain
[1] Research Scholar, [2] Associate Professor

Abstract: Accurate software effort estimation is vital to the area of software project management. It is a process of predicting the effort, in terms of cost and time, required to develop a software product. Traditionally, researchers have used off-the-shelf empirical models like COCOMO, or developed various methods using statistical approaches such as regression and analogy based methods, but these methods exhibit a number of shortfalls. Predicting the effort at early stages is really difficult, as very little information is available. To improve effort estimation accuracy, an alternative is to use machine learning (ML) techniques, and many researchers have proposed a plethora of such machine learning based models. This paper aims to systematically analyze various machine learning models, considering traits such as the type of machine learning method used, the estimation accuracy gained with that method, the dataset used, and its comparison with empirical models. Although researchers have been exploring machine learning for the past two decades, this paper analyses and compares studies from recent years. Having explored various studies, we find that the estimation accuracy of these ML models is near the satisfactory level and gives enhanced results compared with non-machine-learning based models.

Index Terms: Estimation, Machine Learning, Neural network, Software Effort Model, Systematic review
I. INTRODUCTION

Software development effort estimation is the process of estimating the cost and time to develop software; collectively we call it software effort estimation. Estimations done at the early stages of software development play a vital role in effective software project management. Numerous algorithmic and non-algorithmic models exist to estimate software effort, but research still indicates that, on average, the overrun of software projects is nearly 30 percent [1].
A detailed review was conducted by Jorgensen and Shepperd [2], which identified nearly 10 estimation approaches for software effort estimation. Amongst those methods, the dominant ones were regression based methods, while the usage of expert judgment and analogy based methods is growing. A myriad of software effort estimation techniques exists, from expert judgment to analogy, and from empirical models to statistical techniques.
Instead of using expert judgment to decide the minimum and maximum range of effort, software specialists would do better to use historical data about former estimation errors to set realistic minimum-maximum effort intervals [3]. Though expert judgment can be very precise, it can also be easily misled: if the experts involved in estimating the effort are made aware of the budget, the expectations of the clients, the time availability or other parameters that govern the estimation, the estimations can be biased. One established way to improve the precision of effort estimates is to use historical data and estimation checklists consisting of various estimation parameters. When relevant past data and parameter checklists are included in the estimation process, actions are less likely to be overlooked and realistic estimates are more likely to be produced.
Many software organizations use tools for this purpose in order to improve their software effort estimations.
Too-low estimates can lead to lower quality of the developed product, possible rework in later phases, and greater risk of project failure, whereas higher estimates can diminish productivity in accordance with Parkinson's law, which states that work expands to fill the time available for its completion [4].
Several studies corresponding to effort estimation analyze and compare the precision of such models and approaches. These studies show that there is no best effort estimation model or technique. One of the foremost reasons for this instability in results is the essential correlation between the various parameters governing software effort, such as project size, type of project and development environment [5]. In addition, the parameters which have a prevalent impact on the development effort seem to fluctuate, signifying that estimation models should be personalized to the environments in which they are used.
In the past few years, machine learning centered methods have been receiving growing consideration in software development effort estimation research. Amongst various popular estimation approaches, such as algorithmic models and expert judgment, machine learning based models are also considered an important category of effort estimation [6-8]. Zhang and Tsai [9] summarize the uses of many machine learning techniques in the software development domain, including support vector machines, case-based reasoning, decision trees, artificial neural networks, and genetic algorithms. Though the study of machine learning models is growing in academia, recent investigations [2, 10, 11] have shown that expert judgment, which is a non-machine-learning based model, is still the prevailing technique for software effort estimation in industry.
The purpose of this paper is to present a systematic review of machine learning techniques, mainly artificial neural networks, and their comparison with existing empirical models. One of the most popular empirical models used in industry for estimating software effort is COCOMO. Although research on incorporating machine learning started two decades ago, our paper mainly focuses on the latest machine learning procedures being proposed and implemented.

II. MACHINE LEARNING

Machine learning solely focuses on writing software that can learn from past experience. "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E" [12]. It is an extraction of knowledge from data. Machine learning can be categorized into three types: supervised learning, unsupervised learning and reinforcement learning. Supervised learning is where we teach, or train, the machine using data already labeled with the correct outcome; the larger the dataset, the better the machine will learn about that subject. After the machine is trained, it is given unseen data, and based on its past experience it produces the outcome. Unsupervised learning is where the machine is trained using a dataset without labels: the learning algorithm is never told what the data represents, and it infers a function to describe hidden structure in the unlabeled data. Reinforcement learning is the one in which training data is available but, unlike the supervised case, correct input/output pairs are never presented; once the unlabeled data has been processed, it takes only one example of labeled data to make the learning algorithm fully effective. A good example is in playing games: when a machine wins a game, the result is trickled back along with all the moves to reinforce the validity of those moves.
We are focusing on the problem of software effort estimation, and our goal is to create a machine which can mimic a human mind; to do that it needs learning capabilities. Once a machine is trained based on one of the above categories of learning, the effort can be predicted. Machine learning, particularly neural network approaches, gives estimations close to human-level estimations.

III. METHODOLOGY

In this paper, the Constructive Cost Model (COCOMO) is used for investigation purposes. This regression based method of estimating effort was given by Barry Boehm in 1981; to adapt to new software development environments, its revised version, COCOMO II, was later published. Its various parameters are derived from data of various historical projects. The procedure of effort estimation is performed in the following steps:
A. Preprocessing of data
B. Procedure setup
C. Selection of inputs
D. Experimentation
E. Evaluation criteria
F. Testing and validation

All the models were implemented using the standard datasets available and trained with 70 percent of the inputs as training data, with the rest used for testing and validation. In the papers that are explored, the COCOMO 81, NASA (63), NASA (93), IBMDPS, Kemerer, Hallmark and Maxwell datasets are used for software development effort estimation.
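A minimal sketch of this 70/30 experimental procedure, with a synthetic stand-in for a COCOMO-style dataset; the log-linear model and the noise level are illustrative assumptions, not the setup of any reviewed paper.

```python
# 70/30 train-test harness for an effort model, evaluated with MMRE.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
size = rng.uniform(5, 300, 100)                       # KLOC, synthetic
effort = 2.8 * size ** 1.05 * rng.lognormal(0, 0.2, 100)

X = np.log(size).reshape(-1, 1)
y = np.log(effort)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7,
                                          random_state=1)

model = LinearRegression().fit(X_tr, y_tr)            # training on 70%
pred = np.exp(model.predict(X_te))                    # testing on 30%
actual = np.exp(y_te)
mmre = np.mean(np.abs(actual - pred) / actual)        # evaluation criterion
print(f'MMRE = {mmre:.3f}')
```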
IV. NEURAL NETWORK TECHNIQUES FOR EFFORT ESTIMATION

Software development effort estimation is a challenging task for people associated with software project management. In this section we present a review of various neural network models for effort estimation proposed and implemented by many researchers.

Venkatachalam [13] presented a simplified feed-forward neural network (FFNN) for software development effort estimation, using a back propagation neural network with 22 independent variables, which were COCOMO's cost drivers. Evaluation criteria were not specified in this study.

Finnie et al. [14] in 1997 presented a comparison of a statistical regression based model with other artificial-intelligence based models for estimating software development effort. The researchers found that the statistical regression model underperformed for intricate and complex software projects, while the artificial-intelligence based models provided agreeable estimation results. They considered datasets comprising projects from 17 organizations and the Desharnais dataset; MMRE was used as the evaluation criterion.

In 2002, Heiat [15] investigated feed-forward neural networks using function points and radial basis neural networks using source lines of code on diverse datasets encompassing projects written in languages of different generations, comparing against a regression model for each dataset separately. He utilized the Kemerer dataset of 15 projects and the IBM DP Services dataset of 24 projects for the first investigation, and the Hallmark dataset of 28 projects for the second trial. The IBM and Kemerer projects were developed using third generation languages, while the Hallmark projects were developed with fourth generation languages. The results showed that the artificial neural network method is merely comparable with regression when a third generation language dataset is used, but when a fourth generation language or mixed dataset is used, the neural network methodology is expressly more precise for software effort estimation.
In 2006, Idri, Abran and Mbarki applied clustering algorithms with radial basis feed-forward networks. Clustering algorithms were used to cluster the training sets, and it was evidenced that C-means with radial basis feed-forward networks achieves improved results over the APC-III algorithm with radial basis feed-forward networks for software effort estimation.

In 2007, Tronto, da Silva and Sant'Anna compared a conventional linear regression model and a simplified feed-forward neural network on Boehm's COCOMO dataset. The experiments showed that the neural network based method accomplishes enhanced results compared with the linear regression model; the reason for the improved results is the adaptable and non-parametric nature of neural networks.

In 2009, Reddy and Raju suggested a multilayer feed-forward neural network to accommodate Boehm's COCOMO and its parameters to estimate effort. They divided the complete dataset into a training and a validation set, with a division ratio of 80%:20% of the total 63 projects. The various input parameters of COCOMO are accommodated in natural logarithmic form in the feed-forward neural network, which was a decent attempt to place expert knowledge, project data and the traditional algorithmic approach together into one single framework appropriate for predicting effort.

Wong, Ho and Capretz [20] in 2008 presented a blend of neural networks and fuzzy logic to improve the precision of backfiring size estimations. The neuro-fuzzy method was used to tune the conversion ratios with the goal of minimizing the margin of error.

Wei, Danny and Luiz in 2010 [21] assessed the estimation performance of the neuro-fuzzy model against the System Evaluation and Estimation of Resource Software Estimation Model (SEER-SEM) in software estimation practices, applying an architecture that combines the neuro-fuzzy method with diverse algorithmic models. The results of this research also demonstrate that the general neuro-fuzzy structure can work well with many algorithmic models to refine the performance of software development effort estimation.

Another study used the amalgamation of a Functional Link Artificial Neural Network (FLANN) and the Particle Swarm Optimization (PSO) algorithm for software effort estimation [22]. The hybrid PSO-FLANN architecture is a type of three-layer feed-forward neural network in which the PSO algorithm is used to train the weight vector of the FLANN. Calculations were done on three datasets: COCOMO 81, NASA63 and Maxwell. The hybrid algorithm increases the accuracy obtained from the input vector parameters.

Another hybrid approach, combining a Functional Link Artificial Neural Network (FLANN) with Genetic Algorithms (GA) for effort estimation, was proposed by Benala and Dehuri in 2012 [23]. The genetic algorithm's fitness function is selected to minimize the error measured by the evaluation criterion MMRE, as shown in the equation:

MMRE = (1/N) Σ_{i=1}^{N} MRE_i
feed-forward neural network, which was a decent
try to place together expert knowledge, project Kalichanin-Balich [24] relates linear regression,
data and the traditional algorithmic approach into and Logarithmic regression with Feed Forward
one single framework which is appropriate to Neural Network. According to the test results, it
predict effort. has been witnessed that software estimate is
more precise and genuine using FFNN rather
Wong, Ho, and Capretz [20] in 2008 presented a than regression and logarithmic models. MMRE
blend of neural nets and fuzzy logics to expand is used as an evaluation criterion.
the precision of backfiring size estimations. The Vinay Kumar, et al [25] used wavelet neural
neuro-fuzzy method was used to attune the network (WNN) with four approaches, i.e., WNN-
conversion ratios with the goal of minimizing the morelet, WNN-guassian, TAWNN-guassian, and
margin of error. TAWNN-morelet. A Threshold acceptance
training algorithm is used for wavelet neural
Wei, Danny, Luiz in 2010 [21] are to assess the network, i.e., TAWNN. WNN-Morelet and WNN-
estimate performance of the neuro-fuzzy model

Guassian over took various techniques. Results
were efficiently improved.

B. Trimula Rao [26] suggested a FLANN for


software effort estimation. It generates effort and
then processed final output layer. Its one
shortcoming is that in this relation between inputs
and outputs is not reasonable.
Jaswinder Kaur, et al. [27] instigated a back propagation Artificial Neural Network of 2-2-1 design on a NASA dataset comprising 18 projects. Input was KDLOC and development methodology, and effort was the output. MMRE was found to be 11.78 with the applied approach.

Iman Attarzadeh [28] proposed a new model to accommodate COCOMO II. Five scale factors and 17 effort multipliers were used as input. A sigmoid activation function is used to create the network in order to accomplish the post architecture COCOMO II model. Results were shown in terms of MMRE and Pred(0.25) to compare it with algorithmic COCOMO.

Attarzadeh [29] projected a novel software development effort estimation model using neural networks. In this, the initial weights of the network were set in such a way that it leads to the COCOMO II model. The proposed neural network model provides improved results compared to the COCOMO model after appropriate training.

Vachik S. Dave and Kamlesh Dutta [30] suggested an Adjusted MMRE. They used a NASA dataset comprising 60 projects. Experiments were conducted with three different assessment methods, i.e., Mean Magnitude Relative Error, Modified Mean Magnitude Relative Error, and Relative Standard Deviation. Three estimation models are used for this purpose, i.e., Regression analysis, FFNN, and RBFNN. According to the authors, RBFNN is found to be a superior technique for effort estimation, on the basis of RSD and Modified MMRE.

IV. COCOMO II

Originally, the COCOMO model was given by Barry Boehm in 1981 [31]. It was built after investigation of 63 software projects [32]. This empirical model provides effort in terms of cost and schedule for the development of a software project. In the late 1990s, Boehm proposed COCOMO II [32] to accommodate the environmental changes in the software industry. The purpose of the COCOMO model is to express effort with software size and a series of cost and scale factors, as given in the equation below:

Effort = A × (Size)^E × Π_{i=1..17} EM_i,   where E = 1.01 + 0.01 × Σ_{j=1..5} SF_j

where A is a multiplicative constant, and the sets of SF (Scale Factor) and EM (Effort Multiplier) parameters have a strong impact on the calculated effort. Moreover, Size can be calculated by various methods like Kilo Source Lines of Code (KSLOC), Function Points, Extended Function Points and adaptation adjustment factors.
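A direct transcription of the equation above is sketched below; the multiplicative constant A = 2.94 and the nominal scale-factor values are published COCOMO II calibration figures, assumed here purely for illustration.

    # COCOMO II post-architecture effort, as in the equation above.
    from math import prod

    def cocomo2_effort(size_ksloc, scale_factors, effort_multipliers, A=2.94, B=1.01):
        """Effort (person-months) = A * Size^E * product(EM_i),
        with E = B + 0.01 * sum(SF_j)."""
        assert len(scale_factors) == 5 and len(effort_multipliers) == 17
        E = B + 0.01 * sum(scale_factors)
        return A * size_ksloc ** E * prod(effort_multipliers)

    sf = [3.72, 3.04, 4.24, 3.29, 4.68]   # nominal scale factors (assumed values)
    em = [1.0] * 17                       # all effort multipliers at nominal
    print(round(cocomo2_effort(100, sf, em), 1))   # effort for a 100 KSLOC project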
Most work has focused on algorithmic cost models such as COCOMO and Function Points. These may suffer from shortcomings such as the necessity to adjust the model to each individual measurement environment, coupled with very variable accuracy levels even after calibration.

V. DISCUSSIONS

Table 1: Summarized methods of a few researchers

Researcher (year) | Method deployed | Dataset (no. of projects) | Evaluation criteria
Vinay Kumar et al (2008) | Wavelet Neural Networks | IBMDPS (24), CF | MMRE, Pred(0.25), MdMRE
B. Tirimula Rao (2009) | C-FLANN, P-FLANN, L-FLANN | NASA (60) | RMSE
Sriman Srichandan (2010) | Radial Basis Function Neural Networks | COCOMO81 (252), Tukutuku (53) | MMRE, Pred(0.25)
Jaswinder Kaur (2010) | Back propagation artificial neural network | NASA | MMRE, RMSSE
Iman Attarzadeh (2010) | Back propagation ANN | COCOMO (63) | MMRE, Pred(0.25)
Vachik S. Dave (2011) | RBFNN, FFNN, Regression Analysis | cocomonasa_v1 (60) | MMRE, Modified MMRE, RSD
Iman Attarzadeh (2012) | ANN-COCOMO II | COCOMO-1 (63), NASA93 (93) | MMRE, Pred(0.25)
Jagannath Singh (2012) | Cascade Forward ANN, Elman ANN, Feed Forward ANN, Recurrent ANN | NASA (60) | MMRE, RMSE, Mean BRE, Pred(0.25)
Sriman Srichandan (2012) | RBFNN | COCOMO 81, Tukutuku | MMRE, Pred(0.25)

VI. VARIOUS PERFORMANCE EVALUATION CRITERIA FOR EFFORT ESTIMATION

The purpose of performance evaluation criteria is to identify the accurate and truthful implementation of the effort estimation algorithms. The most significant evaluation measures used in software effort estimation are presented in Table 2.

Table 2: Significant performance evaluation criteria in effort estimation

Criterion | Formula | Explanation
RE | RE = (E_est - E_act) / E_act | Relative Error
MRE | MRE = |E_act - E_est| / E_act × 100 | Magnitude of Relative Error
MMRE | MMRE = (1/N) × Σ_{i=1..N} MRE_i | Mean Magnitude of Relative Error
MdMRE | MdMRE = median(MRE) | Median of the MRE values, a robust counterpart of the mean MRE
MER | MER = |E_act - E_est| / E_est × 100 | Magnitude of Error Relative: the error relative to the estimate
MMER | MMER = (1/N) × Σ_{i=1..N} MER_i | Mean of all observations of MER
MAE | MAE = (1/N) × Σ_{i=1..N} |E_act,i - E_est,i| | Mean of Absolute Errors
MAPE | MAPE = (1/N) × Σ_{i=1..N} (|E_act,i - E_est,i| / E_act,i) × 100 | Mean Absolute Percentage Error
MSE | MSE = (1/N) × Σ_{i=1..N} (E_act,i - E_est,i)^2 | Mean Squared Error
RMSE | RMSE = sqrt((1/N) × Σ_{i=1..N} (E_act,i - E_est,i)^2) | Root Mean Square Error

(Here E_act denotes actual effort, E_est denotes estimated effort, and N is the number of projects.)
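The criteria in Table 2, together with Pred(0.25) as used throughout Table 1, translate directly into code; a minimal sketch assuming NumPy:

    # Evaluation criteria from Table 2, plus Pred(0.25).
    import numpy as np

    def mre(actual, est):
        return np.abs(actual - est) / actual

    def mmre(actual, est):
        return mre(actual, est).mean()

    def mdmre(actual, est):
        return np.median(mre(actual, est))

    def rmse(actual, est):
        return np.sqrt(np.mean((actual - est) ** 2))

    def pred(actual, est, q=0.25):
        """Fraction of projects whose MRE does not exceed q."""
        return np.mean(mre(actual, est) <= q)

    actual = np.array([120.0, 36.0, 300.0])   # example efforts (person-months)
    est = np.array([100.0, 40.0, 350.0])
    print(mmre(actual, est), mdmre(actual, est), rmse(actual, est), pred(actual, est))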



VII. CONCLUSION

This paper presented a number of software effort estimation models based on machine learning techniques, towards the choice of suitable Artificial Neural Network techniques for calculating effort for new projects. The techniques considered are MLFF, RBFNN, wavelet neural networks, Cascade Forward ANN, Elman ANN, Feed Forward ANN, Recurrent ANN, etc., and trained and tested instances are considered with these approaches. The purpose of all this is evaluating and comparing ANN methods with post architecture COCOMO in prediction accuracy. Studies conducted on machine learning techniques indicate that estimating the cost of software with these models offers more rapidity and precision than algorithmic models such as COCOMO II, which is a widely used empirical model in the software industry. Further, effective results show that ANN models on local data give improved responses in comparison with algorithmic models. Machine learning techniques like genetic algorithms, fuzzy decision trees, case based reasoning, etc. can also be applied along with these approaches for topology optimization and structural optimization.

REFERENCES

[1] Halkjelsvik, Torleif, and Magne Jørgensen. "From origami to software development: A review of studies on judgment-based predictions of performance time." Psychological Bulletin 138.2 (2012): 238.
[2] Jorgensen, Magne, and Martin Shepperd. "A systematic review of software development cost estimation studies." IEEE Transactions on Software Engineering 33.1 (2007): 33-53.
[3] Jørgensen, Magne, and Dag I. K. Sjøberg. "An effort prediction interval approach based on the empirical distribution of previous estimation accuracy." Information and Software Technology 45.3 (2003): 123-136.
[4] Jorgensen, Magne. "What We Do and Don't Know about Software Development Effort Estimation." IEEE Software 31.2 (2014).
[5] Dolado, Jose Javier. "On the problem of the software cost function." Information and Software Technology 43.1 (2001): 61-72.
[6] E. Mendes, I. Watson, C. Triggs, N. Mosley, S. Counsell, "A comparative study of cost estimation models for web hypermedia applications," Empirical Software Engineering 8 (2) (2003) 163-196.
[7] I.F.B. Tronto, J.D.S. Silva, N.S. Anna, "An investigation of artificial neural networks based prediction systems in software project management," Journal of Systems and Software 81 (3) (2008) 356-367.
[8] M.O. Elish, "Improved estimation of software project effort using multiple additive regression trees," Expert Systems with Applications 36 (7) (2009) 10774-10778.
[9] D. Zhang, J.J.P. Tsai, "Machine learning and software engineering," Software Quality Journal 11 (2) (2003) 87-119.
[10] K. Moløkken-Østvold, M. Jørgensen, S.S. Tanilkan, H. Gallis, A.C. Lien, S.E. Hove, "A survey on software estimation in the Norwegian industry," in: Proceedings of the 10th International Symposium on Software Metrics, Chicago, Illinois, USA, 2004, pp. 208-219.
[11] M. Jørgensen, "A review of studies on expert estimation of software development effort," Journal of Systems and Software 70 (1-2) (2004) 37-60.
[12] Wang, John, ed. Data Mining: Opportunities and Challenges. IGI Global, 2003.
[13] Venkatachalam, A. R. "Software cost estimation using artificial neural networks." Neural Networks, 1993. IJCNN'93-Nagoya. Proceedings of 1993 International Joint Conference on. Vol. 1. IEEE, 1993.
[14] Finnie, Gavin R., Gerhard E. Wittig, and Jean-Marc Desharnais. "A comparison of software effort estimation techniques: using function points with neural networks, case-based reasoning and regression models." Journal of Systems and Software 39.3 (1997): 281-289.
[15] Heiat, Abbas. "Comparison of artificial neural network and regression models for estimating software development effort." Information and Software Technology 44.15 (2002): 911-922.
[16] Idri A., Khoshgoftaar T.M., Abran A. (2002) "Can neural networks be easily interpreted in software cost estimation?" World Congress on Computational Intelligence, Honolulu, Hawaii, May 12-17, pp. 1162-1167.
[17] Idri A., Abran A., Mbarki S. (2006) "An experiment on the design of radial basis function neural networks for software cost estimation." IEEE, Information and Communication Technologies, ICTTA, pp. 1612-1617.
[18] Tronto I.F.B., de-Silva J.D.S., Sant'Anna N. (2007) "Comparison of artificial neural network and regression models in software effort estimation." In: Proceedings of the International Joint Conference on Neural Networks, Orlando, Florida.
[19] Reddy S., Raju K.V.S.V.N. (2009) "A concise neural network model for estimating software effort." Int J Recent Trends Eng 1(1): 188-193.
[20] Wong, J., Ho, D., and Capretz, L. F. (2008) "Calibrating Functional Point Backfiring Conversion Ratios Using Neuro-Fuzzy Technique." International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, Vol. 16, No. 6: 847-862.
[21] Du, Wei Lin, Danny Ho, and Luiz Fernando Capretz. "Improving software effort estimation using neuro-fuzzy model with SEER-SEM." arXiv preprint arXiv:1507.06917 (2015).
[22] Benala T.R., Chinnababu K., Mall R., Dehuri S., "A Particle Swarm Optimized Functional Link Artificial Neural Network (PSO-FLANN) in Software Cost Estimation," Advances in Intelligent Systems and Computing, Proceedings of the International Conference on Frontiers of Intelligent Computing, pp. 59-66, Springer-Verlag, 2013.
[23] Benala T.R., Dehuri S., "Genetic Algorithm for Optimizing Functional Link Artificial Neural Network Based Software Cost Estimation," Proceedings of the InConINDIA, pp. 75-82, Springer-Verlag, 2012.
[24] I. Kalichanin-Balich, "Applying a Feedforward Neural Network for Predicting Software Development Effort of Short-Scale Projects," presented at the Eighth ACIS International Conference on Software Engineering Research, Management and Applications (SERA), 2010.
[25] K. Vinay Kumar, et al., "Software development cost estimation using wavelet neural networks," Journal of Systems and Software, (81) (2008), pp. 1853-1867.
[26] B. T. Rao, et al., "A novel neural network approach for software cost estimation using Functional Link Artificial Neural Network (FLANN)," International Journal of Computer Science and Network Security, (9) (2009), pp. 126-131.
[27] J. Kaur, et al., "Neural Network - A Novel Technique for Software Effort Estimation," International Journal of Computer Theory and Engineering, (2) (2010), pp. 1793-8201.
[28] I. Attarzadeh and S. H. Ow, "Proposing a new software cost estimation model based on artificial neural networks," in 2nd International Conference on Computer Engineering and Technology (ICCET), (2010), pp. V3-487-V3-491.
[29] I. Attarzadeh, et al., "Proposing an Enhanced Artificial Neural Network Prediction Model to Improve the Accuracy in Software Effort Estimation," in Fourth International Conference on Computational Intelligence, Communication Systems and Networks (CICSyN), (2012), pp. 167-172.
[30] V. S. Dave and K. Dutta, "Neural network based software effort estimation & evaluation criterion MMRE," in 2nd International Conference on Computer and Communication Technology (ICCCT), (2011), pp. 347-351.
[31] B. W. Boehm, Software Engineering Economics, Prentice Hall, 1981.
[32] B. W. Boehm, Software Cost Estimation with COCOMO II, Prentice Hall, 2000.
[33] S. Srichandan, "A new approach of Software Effort Estimation Using Radial Basis Function Neural Networks," International Journal on Advanced Computer Theory and Engineering (IJACTE) (1) (2012), pp. 2319-2526.

Edge Detection Method based on Cellular Automata

[1] Jyoti Swarup, [2] Dr. Indu S
[1] Dept. of Computer Science and Engineering, [2] Dept. of Electronics & Communication Engineering, Delhi Technological University

Abstract: Edge detection is a vital pre-processing phase in image processing and computer vision which detects the boundaries of foreground and background objects in an image. Discrimination between significant and spurious edges plays an important role in the accuracy of edge detection. This paper introduces a new approach for edge detection in images based on cellular computing. Existing edge detection methods are complex to implement and fail to produce satisfactory results in the case of noisy images; some methods tend to give spurious edges and some tend to miss true edges in the image. The purpose of using the cellular computing approach is to reduce complexity and processing time, as the method is computationally simple and fast due to parallel processing. The results on Mendeley Dataset images are compared with the results of existing edge detection techniques by evaluating MSE and PSNR values, which indicate promising performance of the proposed algorithm. Visually, the proposed method tends to produce better results which discriminate objects and interpret the edges more clearly even for cluttered and complex images.

Index Terms: cellular automata, edge detection, linear rules, parallel processing

I. INTRODUCTION

Edge detection is a procedure to detect the contour of objects by finding the discontinuities or changes in brightness within an image. Edge detection is an important step in digital image processing and computer vision, preserving the important structural properties of an image. There are several edge detection techniques [1]-[3], and they can be broadly grouped into two categories. The gradient based methods detect the edges by computing the maximum and the minimum in the first derivative of an image. In the Laplacian methods, edges are traced by locating zero crossings in the second derivative of the image. There are problems of false edge detection and missing true edges which can significantly affect the result of object recognition, pattern recognition and feature extraction processes.

Cellular Automata find wide application in the area of image processing and computer vision [14]. The theory of self-reproducing automata was initiated by J. Von Neumann and Stan Ulam [4]-[5] in the 1950s, and Stephen Wolfram extended the concept of automata by developing CA theory [6]-[8]. A digital image is represented by a 2-D array for a grayscale image and a collection of three 2-D arrays for a color image, so two dimensional Cellular Automata [10] can be implemented on an image with ease. Possible applications of CA in image processing range from edge detection algorithms, translation of images, rotation through an angle, scaling operations like thinning and zooming, and finding contours and edges for image segmentation, through other NP-complete problems such as graph coloring or satisfiability, to designing a controlled random number generator with a smaller aliasing rate than a linear counter based on shift registers and XOR gates, and pattern generation [12].

II. CONCEPT OF CELLULAR AUTOMATA

A. Structure of cellular automata
A Cellular Automaton (CA) is a finite state machine having multiple cells. A one dimensional CA is a linear array of cells and a two dimensional CA [10] is a grid of cells, where each cell is influenced by its neighboring cells. There is a finite range of possible states of a cell, and the state of a cell is updated simultaneously depending upon the previous states of its neighboring cells.

Figure 1: (a) Von Neumann (b) Moore (c) Extended Moore neighborhood

There are two neighborhood structures in Cellular Automata, Von Neumann and Moore, as shown in fig. 1. The Von Neumann neighborhood has four neighbors surrounding a cell, and in the Moore neighborhood there are eight neighbors; the radius of the neighborhood is 1.

In the extended Moore neighborhood, the radius is increased to 2, giving 24 neighbors and one center cell [15].
The boundary conditions in cellular automata are:

Null boundary condition:      0 | x1 x2 x3 x4 x5 x6 x7 x8 | 0
Fixed boundary condition:     0/1 | x1 x2 x3 x4 x5 x6 x7 x8 | 0/1
Periodic boundary condition:  x8 | x1 x2 x3 x4 x5 x6 x7 x8 | x1
Adiabatic boundary condition: x1 | x1 x2 x3 x4 x5 x6 x7 x8 | x8
Reflexive boundary condition: x2 | x1 x2 x3 x4 x5 x6 x7 x8 | x7
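For a 2-D image lattice, these boundary conditions correspond closely to standard array padding modes; the mapping below onto NumPy's pad modes is an interpretation offered for illustration.

    # Boundary conditions realized as padding of the cell lattice.
    import numpy as np

    cells = np.array([[1, 2, 3],
                      [4, 5, 6],
                      [7, 8, 9]])

    null_bc      = np.pad(cells, 1, mode="constant", constant_values=0)  # null
    fixed_bc     = np.pad(cells, 1, mode="constant", constant_values=1)  # fixed (all 1s)
    periodic_bc  = np.pad(cells, 1, mode="wrap")      # opposite edge wraps around
    adiabatic_bc = np.pad(cells, 1, mode="edge")      # border cell duplicated
    reflexive_bc = np.pad(cells, 1, mode="reflect")   # mirrored about the border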
B. Rule formation
An elementary CA has two states, 0 and 1, for every cell. For a combination of three neighbors there can be 8 (= 2^3) possible combinations, i.e. 000, 001, ..., 111, so there are a total of 2^8 rules, each represented by an 8-bit binary number, i.e. Rule 0 to Rule 255. For a two state, nine neighborhood CA, there exist 2^(2^9) possible rules. Among these, 2^9 (= 512) rules are linear and can be determined from fig. 2; the remaining 2^(2^9) - 2^9 are non-linear rules [16], [17].

64  128  256
32   1    2
16   8    4
Figure 2: 2-Dimensional CA rule convention

Cellular automata have several advantages over other methods of computation. Simplicity of implementation makes them appropriate for solving complex problems with low computational time complexity, and CA are comparatively faster than other methods [13].

III. RELATED WORK

According to the literature, CA has rooted itself for more than two decades in image processing [19]. In [16], Choudhury et al. applied eight basic 2-dimensional Cellular Automata rules (Rule 2, 4, 8, 16, 32, 64, 128 and 256) to an image for its translation in all directions. Various rules were applied to obtain various operations on images, like scaling and thinning horizontally as well as vertically, and zooming of symmetric images. Qadir et al. [20] extended the concept of translation of the image by using twenty-five neighborhoods instead of nine neighborhoods; this method for translation was used in gaming applications. In [21], Khan proposed that hybrid CA is a possible solution for rotation of images through an arbitrary angle. According to him, 2-D CA rules are applied to rotate an image by an angle about the x and y axes respectively.

Determination of the rule set is a crucial step in CA. Specifying and selecting rules manually is a slow and laborious process, and it may not scale well to larger problems. The Fuzzy Cellular Automaton is employed with fuzzy logic, having fuzzy states of a cell and fuzzy functions for transition rules. Fuzzy CA (FCA) is a special class of CA which is employed to design pattern classifiers [22]. Wang Hong et al. [23] suggested a novel method for image segmentation based on fuzzy cellular automata. In [24], More and Patel used the properties of Cellular Learning Automata to enhance the edges detected by fuzzy logic. In [15], Nayak, Sahu and Mohammed compared the performance of existing edge detection techniques with their proposed method based on extended neighborhood CA and null boundary conditions.

IV. PROPOSED ALGORITHM

In the proposed algorithm, all input images are grayscale. The method highlights the contribution made to the overall appearance of an image by significant bits: considering the fact that each pixel is represented by 8 bits, the higher-order bits, i.e. the first four most significant bits of the binary representation of intensity, depict maximum image information.
Each cell represents an image pixel with a certain intensity or pixel value. According to the Moore neighborhood, four linear rules are identified which can efficiently result in identification of the boundary of a region. These composite rules are given below and are calculated from basic rules using the XOR (⊕) function:

Rule 29 = Rule16 ⊕ Rule8 ⊕ Rule4 ⊕ Rule1
Rule 113 = Rule64 ⊕ Rule32 ⊕ Rule16 ⊕ Rule1
Rule 263 = Rule256 ⊕ Rule4 ⊕ Rule2 ⊕ Rule1
Rule 449 = Rule256 ⊕ Rule128 ⊕ Rule64 ⊕ Rule1

Integration of these rules results in the edges present in the image:
Rule29 || Rule113 || Rule263 || Rule449

The image is first divided into its bit planes, which is called bit plane slicing; then these transition rules are applied to every binary bit plane in parallel. The resultant successor-matrix bit planes are merged into a gray image, followed by binarization of the image according to Otsu's threshold technique. Finally, morphological operations are performed to enhance the results by removing noise and obtaining the true edges in the given input image.
The state of a cell in the next generation is determined by the previous states of its neighboring cells, and all cells are updated synchronously, resulting in unit time complexity. Every cell can have two states, 0 or 1. In a 1x3 neighborhood structure, the state of pixel d is updated by considering the previous states of pixels a, b and c. Fig. 3 illustrates the method to apply the identified composite rules by taking different sets of 1x3 neighbors to update the value of pixel d. These four rules can be represented as four borders which result in edge detection when applied by sliding a window of 3x3 pixels over an image of size m×n.
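A condensed sketch of this pipeline is given below, assuming NumPy; each composite rule is realized as the XOR of the centre cell with three Moore neighbours according to the positional weights of fig. 2. The periodic shifts and the omission of the final thresholding/morphology steps are simplifications for illustration, not the authors' exact code.

    # Bit-plane slicing + the four composite CA rules, merged back per bit.
    import numpy as np

    def shift(p, dr, dc):                     # neighbour view (periodic for brevity)
        return np.roll(np.roll(p, dr, axis=0), dc, axis=1)

    def ca_edges(img):                        # img: 2-D uint8 grayscale array
        out = np.zeros_like(img)
        for bit in range(8):                  # rules applied to every bit plane
            c = (img >> bit) & 1              # bit-plane slicing
            n, s = shift(c, 1, 0), shift(c, -1, 0)
            w, e = shift(c, 0, 1), shift(c, 0, -1)
            nw, ne = shift(c, 1, 1), shift(c, 1, -1)
            sw, se = shift(c, -1, 1), shift(c, -1, -1)
            r29  = sw ^ s ^ se ^ c            # Rule 29  = 16 + 8 + 4 + 1
            r113 = nw ^ w ^ sw ^ c            # Rule 113 = 64 + 32 + 16 + 1
            r263 = ne ^ se ^ e ^ c            # Rule 263 = 256 + 4 + 2 + 1
            r449 = ne ^ n ^ nw ^ c            # Rule 449 = 256 + 128 + 64 + 1
            out |= (r29 | r113 | r263 | r449) << bit   # merge successor planes
        return out   # Otsu binarization and morphological cleanup would follow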
Figure 3: Conceptual representation for identification of the four rules: (a) Rule 449, (b) Rule 113, (c) Rule 263, (d) Rule 29.

Figure 4: Illustration of (a) a region in an image, (b) the contour identified by applying these four rules.

Figure 5: Flowchart of proposed method

V. RESULTS AND CONCLUSION

In this paper, a comparison of the proposed method is carried out against the commonly deployed Gradient and Laplacian based edge detection techniques. These techniques suffer the problems of inaccurate edge detection, missing true edges, producing thin or thick lines, and extra edges due to noise.

The results on Mendeley Dataset images [25], as shown in Fig. 6-8, are compared with existing edge detection techniques and indicate promising performance of the proposed algorithm by evaluating MSE and PSNR values, as given in Table 1. Mean square error (MSE) and peak signal to noise ratio (PSNR) are used to compare the quality of the reconstructed image with its ground truth image; if an operator gives a resultant image with lower MSE and higher PSNR, then the operator has high edge detection capability.
Evaluation of the test images shows that the proposed method exhibits better performance even for noisy and cluttered images. Visually, the proposed method produced promising results of edge detection when compared to the Canny, Sobel and Prewitt edge detectors: Canny produces several spurious edges, whereas Sobel and Prewitt miss some of the strong edges.

Table 1: Experimental results for test images

Image   | Metric | Proposed method | Canny  | Sobel  | Prewitt
Image 1 | MSE    | 0.8810          | 0.9172 | 0.9741 | 0.9745
        | PSNR   | 0.5500          | 0.3754 | 0.1138 | 0.1122
Image 2 | MSE    | 0.9156          | 0.9201 | 0.9720 | 0.9718
        | PSNR   | 0.3831          | 0.3616 | 0.1232 | 0.1242
Image 3 | MSE    | 0.8855          | 0.9258 | 0.9740 | 0.9741
        | PSNR   | 0.5283          | 0.3346 | 0.1146 | 0.1142
Image 4 | MSE    | 0.8738          | 0.9046 | 0.9586 | 0.9585
        | PSNR   | 0.5858          | 0.4355 | 0.1837 | 0.1840
Image 5 | MSE    | 0.8825          | 0.9313 | 0.9671 | 0.9670
        | PSNR   | 0.5427          | 0.3092 | 0.1453 | 0.1459
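For reference, standard definitions of the two measures are sketched below for images scaled to [0, 1]; the normalized values reported in Table 1 may have been computed with a different scaling convention.

    # Standard MSE and PSNR between an edge map and its ground truth.
    import numpy as np

    def mse(a, b):
        return np.mean((a.astype(float) - b.astype(float)) ** 2)

    def psnr(a, b, peak=1.0):
        m = mse(a, b)
        return float("inf") if m == 0 else 10 * np.log10(peak ** 2 / m)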

Figure 6: Results of test image 1: (a) Original image, (b) Ground truth, (c) Proposed method, (d) Canny, (e) Sobel, (f) Prewitt.

Figure 7: Results of test image 2: (a) Original image, (b) Ground truth, (c) Proposed method, (d) Canny, (e) Sobel, (f) Prewitt.

Figure 8: Results of test image 3: (a) Original image, (b) Ground truth, (c) Proposed method, (d) Canny, (e) Sobel, (f) Prewitt.

Figure 9: Bar chart of PSNR values of the results of different edge detectors and the proposed method (Sobel, Prewitt, Canny and the proposed method; test images 1-14 on the x-axis; PSNR value from 0 to 1 on the y-axis).

REFERENCES

[1] John Canny, "A Computational Approach to Edge Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-8, No. 6, November 1986.
[2] Raman Maini and Himanshu Aggarwal, "Study and Comparison of Various Image Edge Detection Techniques," International Journal of Image Processing (IJIP), Vol. 3, no. 1, pp. 1-12, 2000.
[3] Pinaki Pratim Acharjya, Ritaban Das and Dibyendu Ghoshal, "Study and Comparison of Different Edge Detectors for Image Segmentation," Global Journal of Computer Science and Technology Graphics & Vision, Vol. 12, no. 13, pp. 28-32, 2012.
[4] S Ulam, "Some Ideas and Prospects in Biomathematics," Annual Review of Biophysics and Bioengineering, pp. 277-292, 1963.
[5] J V Neumann, "Theory of Self-Reproducing Automata," University of Illinois Press, 1966.
[6] S Wolfram, "Computation Theory of Cellular Automata," Communications in Mathematical Physics, vol. 96, pp. 15-57, 1984.
[7] S Wolfram, "A New Kind of Science," Wolfram Media, Inc., 2002.
[8] S Wolfram, "Cellular Automata and Complexity," Collected Papers, Reading, MA: Addison-Wesley, 1994.
[9] S Vijayarani, A Sakila, "A Performance Comparison of Edge Detection Techniques for Printed and Handwritten Document Images," International Journal of Innovative Research in Computer and Communication Engineering, Vol. 4, Issue 5, May 2016.
[10] Norman H Packard, "Two-Dimensional Cellular Automata," Journal of Statistical Physics, vol. 38, pp. 901-946, March 1985.
[11] Abdel Latif Abu Dalhoum, et al., "Digital Image Scrambling Using 2D Cellular Automata," IEEE Computer Society, October-December 2012.
[12] Radu V. Craiu and Thomas C. M. Lee, "Pattern Generation Using Likelihood Inference for Cellular Automata," IEEE Transactions on Image Processing, Vol. 15, No. 7, July 2006.
[13] Ingo Kusch and Mario Markus, "Mollusc Shell Pigmentation: Cellular Automaton Simulations and Evidence for Undecidability," Journal of Theoretical Biology, Vol. 178, Issue 3, pp. 333-340, 7 February 1996.
[14] R C Gonzalez and R E Woods, "Digital Image Processing," Second Edition, Prentice-Hall, 2002.
[15] D R Nayak, S K Sahu, J Mohammed, "A Cellular Automata Based Optimal Edge Detection Technique using Twenty-Five Neighborhood Model," IJCA, vol. 84, no. 10, pp. 27-33, 2013.
[16] P P Choudhury, B K Nayak, S Sahoo, S P Rath, "Theory and Applications of Two-dimensional, Null boundary, Nine neighborhood, Cellular Automata Linear rules," IJCA, 2008.
[17] A M Odlyzko, O Martin, S Wolfram, "Algebraic properties of cellular automata," Communications in Mathematical Physics, vol. 93, pp. 219-258, 1984.
[18] M Sipper, "The evolution of parallel cellular machines toward evolware," BioSystems, vol. 42, pp. 29-43, 1997.
[19] Deepak Ranjan Nayak, Prashanta Kumar Patra and Amitav Mahapatra, "A Survey on Two Dimensional Cellular Automata and Its Application in Image Processing," IJCA Proceedings on International Conference on Emergent Trends in Computing and Communication, 2014.
[20] F Qadir, J Shah, M A Peer, K A Khan, "Replacement of graphics translations with two dimensional cellular automata, twenty five neighborhood model," International Journal of Computational Engineering and Management, Vol. 15, Issue 4, pp. 33-39, 2012.
[21] A R Khan, "Replacement of some Graphics Routines with the help of 2D Cellular Automata Algorithms for Faster Graphics Operations," PhD thesis, University of Kashmir, 1998.
[22] M Mraz, N Zimic, I Lapanja, I Bajec, "Fuzzy Cellular Automata: From Theory to Applications," IEEE, 2000.
[23] W Hong, Z Hong-jie and W Hua, "Image Segmentation Arithmetic Based on Fuzzy Cellular Automata," Fuzzy Systems and Mathematics, no. 18, pp. 309-313, 2004.
[24] D K Patel and S A More, "Edge Detection technique by Fuzzy logic and Cellular Learning Automata using fuzzy image processing," IEEE Conf. (ICCCI), pp. 1-6, 2013.
[25] Mendeley Data, http://dx.doi.org/10.17632/hvtmfvbxtj.1#file-7dd2224d-0376-40ca-9a0d-27455a10e503

Application of Elliptic Curve Cryptography for Mobile and Handheld Devices

[1] Ajithkumar Vyasarao, [2] K Satyanarayan Reddy
[1] Research Scholar, Regional Research Centre, VTU Belgaum, [2] Professor and Head of the Department, Information Science & Engineering, Cambridge Institute of Technology

Abstract: Voice communication is becoming cheaper because of stiff competition among service providers, and a majority of Internet traffic consists of digitized voice and multimedia. Security is very much required for voice communication in order to safeguard privacy. Security always comes at a premium in terms of computational power, memory and battery power.

Index Terms: Discrete Logarithm Problem, Elliptic Curve Cryptography, Point addition,
Public Key Cryptography, Symmetric Key Cryptography, Voice Security

I. INTRODUCTION

The Internet is considered to be the most important invention of the 20th century, and in one way or another it impacts our day to day transactions. The Internet facilitates many applications such as Voice over IP, Internet Banking and online shopping. All such applications are now being accessed using smart phones and hand-held devices. Internet applications are vulnerable to many attacks, and there is a need for providing security to defend against such attacks.

II. VOIP SECURITY

Voice over IP technology deals with the transmission of digitized voice data over the Internet. Internet technology is now being used extensively for communication and is capable of transporting voice, data and multimedia traffic.

Fig 1. Voice over IP

Voice over IP supports many combinations; one can place a call from:
i. Mobile phone to fixed line phone
ii. Mobile phone to another mobile phone
iii. Mobile phone to Personal Computer and
iv. Personal Computer to Personal Computer
This can lead to delay, jitter and packet loss, which in turn affect Quality of Service (QoS). Such issues can be addressed by proper QoS configuration on networking devices such as switches, routers and gateways. The Internet can pose a lot of challenges for VOIP communication for various reasons. Data is transmitted without encryption over the TCP/IP network, and an attacker can launch an attack without revealing his identity, personal as well as geographical. Voice data sent over the Internet is vulnerable to all sorts of attacks such as spoofing and sniffing. By default, VOIP traffic over the Internet is sent in unencrypted form, which opens the door for eavesdropping attacks. An attacker can also launch a DOS attack by flooding a VOIP server with a large number of inauthentic packets, or a replay attack, such as spamming huge voice data to VOIP phones or voice mail boxes. Voice traffic is composed of control traffic, signaling and media communications. Based on the protocols, VOIP communication can use a single channel or multiple channels; typically, these channels are Internet connections between two end points. Securing VOIP communication over IP network connections is implemented in terms of authentication and encryption.

III. SECURITY PROTOCOLS

Security requirements can be categorized as
i. Confidentiality
ii. Integrity

iii. Authentication
Confidentiality can be assured by encrypting the data: an intruder will not be able to decrypt the data without having access to the decryption key. Data integrity can be implemented by adding a hash at the source and transmitting it along with the data; at the receiving end, integrity is ensured if there is a match between the generated hash and the received hash. Authentication ensures data is received from the right source.
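A minimal sketch of this hash-based integrity check, assuming Python's standard library and a pre-shared key (HMAC-SHA256 is one standard choice; the key and payload below are placeholders):

    # Integrity: sender appends a keyed hash; receiver recomputes and compares.
    import hmac, hashlib

    key = b"pre-shared-secret"            # hypothetical shared key
    frame = b"digitized voice frame"      # hypothetical voice payload

    tag = hmac.new(key, frame, hashlib.sha256).digest()    # sent with the data

    received_tag = tag                    # as delivered to the receiver
    ok = hmac.compare_digest(received_tag,
                             hmac.new(key, frame, hashlib.sha256).digest())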
Cryptography, which deals with encryption and decryption of data, can be broadly classified into two major categories:
i. Public Key Encryption or Asymmetric Key Encryption
ii. Symmetric Key Encryption
A public key crypto system uses key-pairs for encryption and decryption, whereas a private (symmetric) crypto system uses the same key for both encryption and decryption. A public key crypto system uses a larger number of CPU cycles and more memory. This requirement may not be suitable for mobile and hand-held devices, as they are limited by computational power, memory and energy or battery power; a cryptographic algorithm should be energy efficient so that the system can last for a long time.

Fig 2. Public Key Encryption (source: https://docs.oracle.com/cd/E19656-01/821-1507/aakfv/index.html)

A private key crypto system uses the same key for encryption and decryption. Since the same key is used for both encrypting and decrypting the data, this approach consumes fewer CPU resources compared to a public key crypto system. The major challenge is how to exchange the secret key over public infrastructure.

Fig 3. Symmetric Key Encryption

Modern cryptography uses a combination of public and private key cryptosystems: a public key crypto system for key exchange and a private key system for encryption and decryption. RSA, which belongs to the public key category, is a good candidate; however, RSA is not the best candidate for mobile and handheld devices, which have limited computational power and memory. Elliptic Curve Cryptography is a good candidate for mobile and handheld devices. We propose ECC for voice encryption; here we use ECC for key exchange and AES for encryption and decryption.

IV. ELLIPTIC CURVE CRYPTOGRAPHY

ECC was discovered in 1985 by Neal Koblitz and Victor Miller. ECC schemes are public-key mechanisms that provide the same functionality as RSA. ECC belongs to the public key cryptosystem category and is based on the Elliptic Curve Discrete Logarithm Problem for its security. ECC serves as an alternative to RSA by providing the highest per-bit security strength among prevalent cryptosystems existing today. ECC-160 provides security comparable with RSA-1024 and ECC-224 provides security comparable with RSA-2048 [1]. Elliptic Curve Cryptography is such a powerful cryptosystem that it uses only 1/6 of the key size of RSA to guarantee equivalent security [2]. ECC uses shorter key lengths and provides security equivalent to RSA; this feature makes ECC very attractive for mobile hand-held devices.

A. Mathematical background
An elliptic curve, shown in Figure 4, can be represented as the set of solutions of the equation

y^2 = x^3 + ax + b (mod p)   (1)

where a, b belong to Zp such that 4a^3 + 27b^2 ≠ 0 (mod p), together with the point at infinity. The efficiency of an elliptic curve algorithm is based on various factors, like selecting the finite field, which could be either prime or binary, elliptic curve arithmetic such as point addition and point multiplication, and the scalar representation [3]. Algorithms are evaluated against two parameters, time complexity and space complexity; an algorithm is considered to be complex if it takes more time to solve the mathematical problem.

The security of ECC is attributed to the difficulty of solving the discrete logarithm problem over the points on an elliptic curve, which is popularly known as the Elliptic Curve Discrete Logarithm Problem (ECDLP). To give one example, the best-known method to solve ECDLP (Pollard's rho algorithm) is fully exponential, so ECC needs substantially smaller key sizes, as compared to other public cryptosystems, to provide equivalent security [4].

Fig 4. The Elliptic curve

The Elliptic Curve Discrete Logarithm Problem can be stated as follows. P and Q are two points on an elliptic curve, and kP represents the point P added to itself k times, where k is a scalar such that kP = Q. For a given P and Q, it is computationally infeasible to obtain k if k is sufficiently large; k is the discrete logarithm of Q to the base P. ECC has certain characteristics that enable the process of taking any two points on a specific curve: adding these two points results in another point on the same curve, and there is inherent difficulty in finding which two points have been used to arrive at the third point. This property is very useful in cryptography [5]. The operations defined in elliptic curve cryptography are point addition, which is shown in Fig. 5, point multiplication and point doubling. Elliptic curves have certain geometrical properties: they are symmetric over the x-axis, so taking the reflection over the x-axis gives the other half of the elliptic curve. Point addition is defined over the elliptic curve as follows. Take two points P and Q on the elliptic curve, draw a line joining P and Q, and extend this line so that it touches another point on the elliptic curve, -R; now take the reflection of -R on the x-axis, which is R, on the elliptic curve. R is the result of the point addition of P and Q.

Fig 5. The Elliptic curve point addition

Another important operation defined in elliptic curve cryptography is point doubling. This can be considered a special case of point addition where P = Q. Since we are going to add point P to itself, we do not have another distinct point on the elliptic curve to draw a line joining P to Q. In this scenario, a tangent to the elliptic curve is drawn keeping P as the starting point; the tangent is extended and it intersects the curve at another point, which is considered to be -R. Now take the reflection on the x-axis to obtain R. Then R = P + P = 2P, which is the result of doubling the point P.

Fig 6. The Elliptic curve point doubling (source: https://koclab.cs.ucsb.edu/teaching/cren/docs/w03/09-ecc.pdf)

Extending this point doubling operation, we can perform another operation, elliptic curve point multiplication. Point multiplication is the operation of successively adding a point along an elliptic curve to itself repeatedly. Elliptic curve point multiplication is also referred to as scalar multiplication, thus the most common name is elliptic curve scalar multiplication.

Scalar multiplication is denoted as nP = P + P + ... + P for some integer n and a point P = (x, y) that lies on the elliptic curve, E. There is one special case of the point addition operation on an elliptic curve: the case where point P is added to -P, which is nothing but the reflection of point P on the x-axis. When P is added to -P by joining the two points with a straight line, the resulting line will not intersect the elliptic curve at another point, as this line is parallel to the y-axis.

Fig 7. The Elliptic curve point addition special case (source: https://koclab.cs.ucsb.edu/teaching/cren/docs/w03/09-ecc.pdf)

When point P is added to -P, the line intersects the elliptic curve at the point of infinity, which is denoted as ∞. Point addition with the point of infinity gives back the same point; in other words, P + ∞ = ∞ + P = P. The point of infinity serves as the identity element with respect to the group operation of addition.

B. Elliptic Curve Discrete Logarithm Problem
Given a point P, the point Q is obtained by multiplying P by a scalar integer d, that is, by adding P to itself d times: P + P + ... + P = dP = Q. Given P and Q, it is difficult to derive the integer d, and choosing a large d will make the attacker's job hard. If d is known, we need an efficient algorithm to compute dP; one such algorithm is double-and-add. The strength of any cryptographic algorithm is measured against the hardness or effort required to break the key, which is proportional to the key length; in other words, a 128-bit key provides higher security compared to a 64-bit key. The advantage of using Elliptic Curve Cryptography is that ECC provides better security with smaller key sizes compared to other public key cryptographic algorithms like RSA. Elliptic Curve Cryptography can be used for exchanging the secret key and for encryption and decryption of data. Diffie-Hellman proposed the first key exchange protocol, and a variant of the Diffie-Hellman key exchange algorithm can be implemented using ECC.

V. KEY AGREEMENT

Key agreement protocols play a very important role in ensuring secure communication over an insecure network. In voice communication, unless otherwise specified, this refers to communication between two entities; in this paper, we consider only the unicast voice communication scenario, excluding multicast and broadcast communication cases. A key establishment protocol allows two or more parties to establish a shared secret key for encrypted communication over an unsecure network. A two-party key agreement protocol facilitates establishment of a common key between two communicating entities; both entities contribute some information to generate the shared session key. Diffie-Hellman proposed the first key agreement protocol, which is considered to be the original breakthrough in public-key cryptography. However, the Diffie-Hellman protocol is susceptible to a man-in-the-middle attack, as there is no mechanism to authenticate the two entities participating in the secure communication. The basic requirement of a key agreement protocol is to ensure the session key is established only between the intended parties to the communication. The desirable characteristics of a two-party key agreement protocol include known key security, perfect forward secrecy, key compromise impersonation resilience, unknown key share resilience, implicit key authentication, key confirmation and explicit key authentication, which need to be satisfied while designing a protocol for efficacy [5].

VI. ANALYSIS OF ELLIPTIC CURVE CRYPTOGRAPHY

A. Diffie-Hellman Key Exchange variant for ECC
The Diffie-Hellman key exchange algorithm is used for exchanging the secret key over an insecure channel. Let us take a scenario in which two parties, Alice and Bob, need to exchange a secret key. Elliptic Curve Cryptography provides a way to exchange the secret key:

i. Alice and Bob agree upon a starting point P on a publicly defined elliptic curve, e.g. y^2 = x^3 - 4x + 0.67
ii. Alice selects a private scalar a, computes aP and shares it with Bob
iii. Bob selects a private scalar b, computes bP and shares it with Alice
iv. Alice receives bP and computes abP by multiplying with her private a
v. Bob receives aP and computes baP by multiplying with his private b

It is obvious that abP = baP, hence both Alice and Bob arrive at the same key, which serves as the private key for further encryption and decryption.
B. Security provided by ECC
ECC provides better security against attacks like factoring attacks. Given Q = dP, it is difficult to derive the secret d for a given Q and P. There are some algorithms used to attack ECC, such as the Baby-Step Giant-Step and Pollard-Rho methods; the complexity of such methods is approximately √p. An elliptic curve using a 160-bit prime p results in approximately 2^160 points, so an attacker needs at least 2^80 steps on average; a 256-bit value of p generates approximately 2^256 points and provides security of 2^128 steps on average to break the system.
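The square-root attack costs quoted above can be checked with a couple of lines (a sketch, assuming the generic square-root-of-group-order estimate):

    # Generic attack cost ~ sqrt(number of points): 2^80 for 160 bits, etc.
    import math

    for bits in (160, 224, 256):
        steps = math.isqrt(2 ** bits)
        print(f"{bits}-bit curve: ~2^{steps.bit_length() - 1} steps")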
VII. CONCLUSION

Elliptic Curve Cryptography is finding new applications where computational power and memory are major factors. Elliptic Curve Cryptography can be used for efficient key exchange between end points, and it can also be used for authenticating the end points in terms of digital signatures. One has to choose the right elliptic curve to provide better performance and the desired level of security based on the mobile and handheld device requirements. This work can be extended by choosing the right curves for different VOIP end points and analyzing the performance, so that it can be fine-tuned.

REFERENCES

[1] Luma A, Ameti L. "ECC secured voice transmitter." Proceedings of the World Congress on Engineering, 2014.
[2] Park HA. "Secure chip based encrypted search protocol in mobile office environments." International Journal of Advanced Computer Research. 2016; 6(24):72.
[3] Karthikeyan E. "Survey of elliptic curve scalar multiplication algorithms." International Journal of Advanced Networking and Applications. 2012; 4(02):1581-90.
[4] Kalra S, Sood SK. "Elliptic curve cryptography: Survey and its Security Applications." In Proceedings of the International Conference on Advances in Computing and Artificial Intelligence 2011 (pp. 102-6). ACM.
[5] Kavitha Ammayappan, Atul Negi, V. N. Sastry and Ashok Kumar Das, "An ECC-Based Two-Party Authenticated Key Agreement Protocol for Mobile Ad Hoc Networks," Journal of Computers, Vol. 6, No. 11, November 2011.

