
ANALYSIS AND DETECTION OF BOTNETS USING MACHINE LEARNING TECHNIQUES

Candidate : G.Kirubavathi
Reg No : 71010112041

Guide : Dr.R.Anitha
Associate Professor
Department of Applied Mathematics and
Computational Sciences
PSG College of Technology
Outline
Introduction
Botnet Lifecycle
Botnet Attacks
Botnets: A study and analysis
HTTP botnet detection using HsMM model with SNMP MIB variables
HTTP botnet detection using Adaptive Learning rate ML-FF NN
Botnet detection via mining of traffic flow characteristics
Structural analysis and detection of Android botnets using machine learning techniques
Introduction
Virus: reproduces itself quickly within one computer
Trojan horse: hides itself as a safe file
Worm: propagates quickly through the Internet
Remote control software: legal, used by desktop users
Botnet: integration of all of the above
Introduction (cont.)
A bot is a self-propagating application that infects vulnerable hosts through direct exploitation or Trojan insertion.
A botnet is a network of compromised computers (bots) controlled by an attacker (the botmaster).
Botnets: A Study and Analysis
Motivation:
A thorough analysis of botnets to understand their behavior and the attacks they make possible.
Objective:
To systematize existing knowledge related to botnets and botnet detection, to better understand their strengths and weaknesses.
Typical Botnet Life Cycle
1. The botmaster exploits a vulnerability on the victim.
2. The victim downloads the actual bot binary.
3. The bot contacts the C&C server address embedded in the executable, including resolving the DNS name.
4. The bot joins the IRC channel.
5. The botmaster sends commands via the IRC channel.
Command and Control (C&C)
Means of receiving and sending commands and information between the botmaster and the zombies.
Essential for the operation and support of a botnet.
Types:
Centralized
Decentralized
Centralized Model
Communication between the botmaster and the zombies goes via a centralized server
Classical communication method: IRC (Internet Relay Chat)
Most commonly used
Simple to implement and customize
Easiest to eliminate
Centralized model communication topologies
[Figure: star topology, multi-server topology, and hierarchical topology, drawn with server nodes (S) and bot nodes (B1 to B7)]
Centralized IRC Botnet
[Figure: the botmaster sends commands to the bots through IRC servers]
Problem: single point of failure
Easy to locate and take down
Decentralized Model
In decentralized communication there is no central server; the zombies talk to each other
Resilient to failures
Hard to discover
Hard to defend against
Decentralized model communication topologies
[Figure: random topology connecting bot nodes B1 to B6]
Communication mechanisms
Push-based
Pull-based
Botnet attacks
Classification of Botnet Detection Techniques
Honeynets
Intrusion Detection System
Signature based / Anomaly based
Host based / Network based
Active monitoring / Passive monitoring
Motivation
Many of the existing detection techniques concentrate on specific botnets.
Many of the existing detection techniques cannot detect modern botnets.
Objective:
To propose a method that can detect botnets irrespective of their control structures.
The method should also detect new botnets.
Related works
| S.no | Author & Year | Features used | Limitations |
|------|---------------|---------------|-------------|
| 1 | Livadas et al., 2006 | IRC traffic features | Detects only IRC botnets |
| 2 | Masud et al., 2008 | Flow-based features | Not effective in detecting P2P botnets that have encrypted communications |
| 3 | Nogueira et al., 2010 | Mixture of IRC, HTTP & P2P | Cannot detect encrypted communications |
| 4 | Wang et al., 2011 | Behavior-based features of P2P, HTTP & IRC | May lead to false positives when checking new s/w updates |
| 5 | Kirubavathi et al., 2012 | TCP connection related features | Cannot detect P2P & IRC botnets |
HTTP Botnet Detection using HsMM with SNMP MIB Variables
A Hidden semi-Markov Model (HsMM) is used to characterize normal network behavior, with the TCP-based MIB variables as the observed sequence.
The forward-backward algorithm is used to estimate the model parameters.
Proposed System Architecture
[Figure: extraction of the SNMP MIB variables, feature reduction by PCA, and summation of the SNMP MIB variables produce the train and test data; HsMM modeling with the forward-backward algorithm builds the HsMM model, which labels test data as Normal or Bot]
Model Construction
Construct an HsMM to build a profile of normal MIB traffic behavior and use this model to detect the botnet.
An HsMM can be described as $\lambda = (N, M, V, A, B, \pi)$, where
$N$ is the size of the state space $S = \{0, 1\}$
$V = \{v_0, v_1, \ldots, v_{M-1}\}$ is the set of all visible symbols, which are the TCP-MIB variables
$M$ is the number of all visible symbols (the summation count of the MIB variables)
$A = [a_{ij}]_{N \times N}$ is the state transition probability matrix, where $a_{ij} = P\{\text{next state} = j \mid \text{current state} = i\}$, $i, j \in S$
Assume initially $A = \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix}$: the process is normal, and no matter what the current state is, it transfers to the normal state at the next step with probability 1.
Model Construction (cont.)
$B = \{b_i(k)\}$, $i \in S$, $1 \le k \le M$, is the distribution of visible symbols $V$, where $b_i(k) = P\{\text{observed system behavior} = v_k \mid \text{current state} = i\}$
$\pi = [\pi_0, \pi_1, \ldots, \pi_{N-1}]$ is the initial state distribution
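As a rough illustration of how such a model scores an observation sequence, the sketch below runs the scaled forward recursion of an ordinary hidden Markov model. It is a simplification: a full HsMM also models state durations, which are left out here. The transition matrix is the initial A from the slide, with state 0 taken as the normal state; the emission table B, the initial distribution PI, and the symbol indices are made-up values for illustration only.

```python
import math

# Initial transition matrix from the slide: every state moves to state 0 (normal).
A = [[1.0, 0.0],
     [1.0, 0.0]]
# Hypothetical initial state distribution and emission probabilities (3 symbols).
PI = [1.0, 0.0]
B = [[0.7, 0.2, 0.1],   # symbol probabilities in state 0 (normal)
     [0.1, 0.2, 0.7]]   # symbol probabilities in state 1 (abnormal)


def average_log_likelihood(obs, pi, a, b):
    """Average log-likelihood per symbol, via the scaled forward recursion."""
    n = len(pi)
    alpha = [pi[i] * b[i][obs[0]] for i in range(n)]
    total = 0.0
    for t, symbol in enumerate(obs):
        if t > 0:
            # Propagate one step through the transitions, then weight by the emission.
            alpha = [sum(alpha[i] * a[i][j] for i in range(n)) * b[j][symbol]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under the model
        total += math.log(scale)
        alpha = [x / scale for x in alpha]  # rescale to avoid numeric underflow
    return total / len(obs)
```

A sequence whose average log-likelihood falls well below the range seen on normal training traffic would be flagged as suspicious; the threshold itself has to be chosen from training data.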
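Web-based botnet identification accuracy
| Dataset | False positive rate | Detection accuracy | Result |
|---------|---------------------|--------------------|--------|
| Web service | 0% | 100% | Normal |
| FTP service | 0% | 100% | Normal |
| Spyeye botnet | 1.33% | 98.67% | Malicious |
| BlackEnergy botnet | 1.28% | 98.72% | Malicious |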
HTTP Botnet Detection using Adaptive Learning Rate MLFF-NN
Recent botnets have begun using common protocols such as HTTP.
HTTP bot communications are based on TCP connections.
TCP-related features have been identified for the detection of HTTP botnets.
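The slides do not state which rule adapts the learning rate during training. As one common possibility, the sketch below shows the "bold driver" heuristic on a generic loss function: the rate grows slightly after a step that lowers the loss and is cut after a step that raises it. The function name, the growth and shrink factors, and the toy loss in the usage note are all assumptions, not the authors' settings.

```python
def bold_driver_minimize(loss, grad, w, lr=0.1, up=1.05, down=0.5, steps=100):
    """Gradient descent with a bold-driver adaptive learning rate.

    loss(w) returns a float and grad(w) returns a list of partial derivatives.
    """
    previous = loss(w)
    for _ in range(steps):
        candidate = [wi - lr * gi for wi, gi in zip(w, grad(w))]
        current = loss(candidate)
        if current <= previous:
            # The step helped: keep it and grow the rate a little.
            w, previous = candidate, current
            lr *= up
        else:
            # The step overshot: discard it and shrink the rate.
            lr *= down
    return w, lr
```

For example, minimizing the toy loss (w - 3)^2 from w = 0 with `bold_driver_minimize(lambda w: (w[0] - 3) ** 2, lambda w: [2 * (w[0] - 3)], [0.0])` converges to w close to 3. In a neural network the same rule would be applied to the training error after each epoch.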
Proposed System Architecture
[Figure: network traffic passes through pre-processing (feature extraction, normalization) and is split into a training set and a testing set; the neural network classifier (NN training, NN model, evaluation) labels traffic as Bot or Normal]
Traces of different Web-based Botnets
| Bot family | Trace size | Number of packets |
|------------|------------|-------------------|
| Zeus-1 | 5.85 MB | 53,220 |
| Zeus-2 | 4.13 MB | 37,252 |
| Spyeye-1 | 25.17 MB | 175,870 |
| Spyeye-2 | 3.90 MB | 35,180 |


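Identification accuracy of web botnet traffic profiles
| Traffic trace | # neurons in the input layer | # neurons in the hidden layer | Correct identification |
|---------------|------------------------------|-------------------------------|------------------------|
| Spyeye-1 | 6 | 18 | 99.03% |
| Spyeye-2 | 6 | 18 | 99.02% |
| Zeus-1 | 6 | 18 | 99.01% |
| Zeus-2 | 6 | 18 | 99.04% |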
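Performance Measures of Spyeye Botnet
| Method | Precision | Recall | F-Measure | Accuracy |
|--------|-----------|--------|-----------|----------|
| Decision Tree | 0.968 | 0.931 | 0.949 | 96.5333 |
| Random Forest | 0.968 | 0.934 | 0.950 | 96.667 |
| RBF | 0.976 | 0.927 | 0.950 | 96.5333 |
| FF NN | 0.964 | 0.983 | 0.973 | 99.03 |
[Figure: ROC curve for Spyeye botnet]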
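Performance Measures of Zeus Botnet
| Method | Precision | Recall | F-Measure | Accuracy |
|--------|-----------|--------|-----------|----------|
| Decision Tree | 0.956 | 0.930 | 0.941 | 96.14333 |
| Random Forest | 0.952 | 0.930 | 0.940 | 96.000 |
| RBF | 0.959 | 0.922 | 0.940 | 95.8667 |
| FF NN | 0.948 | 0.992 | 0.969 | 99.04 |
[Figure: ROC curve for Zeus botnet]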
Comparison of Performance
| Method | Average detection accuracy |
|--------|----------------------------|
| Gu et al. (2008), BotMiner, data mining techniques | 96.825 |
| Nogueira et al. (2010), neural networks | 94.9175 |
| Adaptive learning neural networks (proposed) | 99.025 |
Botnet detection via mining of traffic flow characteristics
The network activities of the following bots are analyzed: Zeus, Spyeye, BlackEnergy, Citadel, Kelihos, Medfos, Storm, Waledac, Skynet, ZeroAccess, Virut.n, Rbot, and Eldorado.
The significant features are extracted from the traffic flows.
They are used to classify the traffic as normal or botnet.
Proposed System Architecture
[Figure: Training phase: HTTP & IRC botnet and benign traffic network flows (Internet LAN, NKN) and P2P botnet and benign traffic network flows (ISOT dataset) go through dataset collection and analysis, feature extraction, and classifier training to give the trained model. Detection phase: test traffic flows go through feature extraction and the trained model, which labels them Bot or Normal.]
Dataset collection
Details of botnet and background traffic flow traces.

Botnet traffic flow traces
| Bot family | Trace | Trace size | Packets |
|------------|-------|------------|---------|
| Zeus | Trace-1 | 16.24 GB | 1,224,654 |
| Zeus | Trace-2 | 11.6 GB | 1,146,703 |
| Spyeye | Trace-1 | 14.63 GB | 1,108,674 |
| Spyeye | Trace-2 | 15.65 GB | 1,123,865 |
| BlackEnergy | Trace-1 | 16.98 GB | 1,453,876 |
| BlackEnergy | Trace-2 | 13.93 GB | 1,283,718 |

Background traffic flow traces
| Traffic type | Trace | Trace size | Packets |
|--------------|-------|------------|---------|
| FTP services | Trace-1 | 16.4 GB | 165,231 |
| FTP services | Trace-2 | 10.8 GB | 120,381 |
| Web services | Trace-1 | 15.4 GB | 145,879 |
| Web services | Trace-2 | 12.5 GB | 134,498 |
| BitTorrent | Trace-1 | 11.7 MB | 101,729 |
| BitTorrent | Trace-2 | 27.7 MB | 185,798 |
Dataset collection (cont.)
Description of ISOT Botnet dataset
| ISOT Botnet dataset | Protocol | No. of flows |
|---------------------|----------|--------------|
| Malicious | P2P | 55,904 (3.33%) |
| Non-malicious | Hybrid | 1,619,520 (96.66%) |
| Total | | 1,675,424 |
Dataset collection (cont.)
Description of publicly available botnet datasets
| Bot family | Protocol | Size |
|------------|----------|------|
| Conficker | P2P | 69 GB |
| ZeroAccess | P2P | 10.11 MB |
| Skynet | Hybrid | 31.4 M |
| Virut.n | IRC | 13 MB |
| Eldorado | IRC | 12 MB |
| Sogou | HTTP | 18 MB |
| Rbot | IRC | 27 MB |
| Kelihos | P2P | 519 MB |
| Kazy | P2P | 50 MB |
| Medfos | HTTP | 466 MB |
| Citadel | P2P | 8.39 MB |
Dataset Analysis
[Figure: ISOT botnet traffic pattern]
[Figure: CAIDA-Conficker traffic pattern]
Dataset Analysis (cont.)
[Figure: background LBNL traffic pattern]
[Figure: background traffic pattern]
Dataset Analysis (cont.)
[Figure: distributions of packets in botnet and benign traffic; panels: Benign, Spyeye, Virut.N, Storm]


Feature Extraction
Small_Packets: number of small packets sent and received in a flow during the specified time interval
Packet_ratio: number of incoming and outgoing packets in a flow during the specified time interval
Initial_Packet_length: length of the first packet in a flow
Bot-response_packet_ratio: ratio of bot response packets to total packets in a flow during the specified time interval
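The four features can be computed per flow and per time window roughly as in the sketch below. Several details are assumptions made for illustration, because the slides do not give them: the `Packet` record and its fields, the 100-byte cut-off for a "small" packet, reading Packet_ratio as incoming divided by outgoing packets, and the `bot_response` flag, which stands in for whatever rule marks a packet as a bot response.

```python
from dataclasses import dataclass

SMALL_PACKET_BYTES = 100  # assumed threshold; the slides do not give one


@dataclass
class Packet:
    timestamp: float            # seconds since the start of the capture
    length: int                 # packet length in bytes
    outgoing: bool              # True if sent by the monitored host
    bot_response: bool = False  # hypothetical flag set by an earlier analysis step


def flow_features(packets):
    """Compute the four flow features for the packets of one flow in one window."""
    if not packets:
        raise ValueError("a flow needs at least one packet")
    ordered = sorted(packets, key=lambda p: p.timestamp)
    outgoing = sum(1 for p in ordered if p.outgoing)
    incoming = len(ordered) - outgoing
    return {
        "small_packets": sum(1 for p in ordered if p.length <= SMALL_PACKET_BYTES),
        # Assumed reading of Packet_ratio: incoming packets per outgoing packet.
        "packet_ratio": incoming / outgoing if outgoing else float("inf"),
        "initial_packet_length": ordered[0].length,
        "bot_response_packet_ratio":
            sum(1 for p in ordered if p.bot_response) / len(ordered),
    }
```

The resulting dictionary is one feature vector; a classifier is trained on many such vectors labeled as botnet or benign.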

Classification Techniques
Boosted Decision Tree
Naive Bayesian Classifier
Support Vector Machine
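To make one of these classifiers concrete, here is a minimal Gaussian Naive Bayes written from scratch. It is a teaching sketch, not the implementation used in the experiments: it assumes numeric features that are modeled as independent normal distributions per class, and the function names are invented for this example.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # A small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (math.log(n / len(rows)), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one feature vector."""
    def log_posterior(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances))
    return max(model, key=lambda label: log_posterior(model[label]))
```

Training on flow feature vectors labeled "bot" or "normal" and calling `predict` on a new vector gives the traffic label.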

Experimental Setup
Datasets for experiments
Dataset one (D1): This dataset contains the ISOT Botnet dataset, which is a combination of several existing publicly available malicious and non-malicious datasets.
Dataset two (D2): This dataset consists of two different traffic traces to represent botnet and benign traffic. For botnet traffic, we used the CAIDA Conficker dataset of size 10 GB. For benign traffic, we used 4.69 GB of our laboratory traffic traces.
Dataset three (D3): It consists of both botnet and benign traces from laboratory traffic, different IRC botnet traces from Centro University, two different traces from the University of Georgia, and two HTTP botnet traces and one P2P botnet trace from CVUT University. The total size of this dataset is 10.05 GB.
Performance metrics
Accuracy
Precision
Recall
F-measure
False Positive Rate
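These five metrics all follow from the confusion-matrix counts, as the short sketch below shows. Botnet traffic is treated as the positive class; the function name and the zero fallback for empty denominators are choices made for this example.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive the five metrics from confusion-matrix counts (bot = positive class)."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called the detection rate
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "false_positive_rate": ratio(fp, fp + tn),
    }
```

For instance, with 90 bot flows detected, 5 missed, 10 benign flows wrongly flagged and 95 correctly passed, accuracy is 185/200 and the false positive rate is 10/105.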

Performance evaluation
Smaller time windows may fail to capture unique traffic characteristics that only become visible over a longer period of time.
With a longer time window, the detection system takes longer to reach a decision.
Selecting the time window size is therefore a trade-off.
Different time windows ranging from 60 to 300 s were evaluated.
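Splitting a trace into fixed-size time windows can be sketched as follows. The input format, a list of (timestamp, item) pairs with timestamps in seconds, is an assumption for the example; in practice the items would be the packets of a flow, and features would then be computed per window.

```python
from collections import defaultdict


def split_into_windows(records, window_seconds=60):
    """Group (timestamp, item) pairs into consecutive fixed-size time windows.

    Window 0 starts at the earliest timestamp; empty windows are omitted.
    """
    if not records:
        return {}
    start = min(timestamp for timestamp, _ in records)
    windows = defaultdict(list)
    for timestamp, item in sorted(records, key=lambda r: r[0]):
        index = int((timestamp - start) // window_seconds)
        windows[index].append(item)
    return dict(windows)
```

Re-running the same trace with `window_seconds` set to 60, 120, 180 and so on up to 300 gives the feature sets compared in the evaluation.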

[Figure: comparison of Precision, Recall and F-Measure for the three classifiers with a time window of 60 s]
[Figure: comparison of Precision, Recall and F-Measure for the three classifiers with a time window of 120 s]
[Figure: comparison of Precision, Recall and F-Measure for the three classifiers with a time window of 180 s]
[Figure: detection rate]
[Figure: false positive rate]
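Performance comparison with existing methods
| Detection method | No. of features | No. of bot samples | C&C structure | Detection accuracy |
|------------------|-----------------|--------------------|---------------|--------------------|
| Livadas et al. | 10 | 1 | IRC | 92.00 |
| Zhao et al. | 13 | 4 | P2P, HTTP | 99.10 |
| Saad et al. | 11 | 2 | P2P | 89.00 |
| W. Lu et al. | 256 | 2 | IRC | 95.00 |
| Masud et al. | 20 | 2 | IRC | 95.20 |
| Nogueira et al. | 8-16 | 1 | IRC, P2P, HTTP | 87.56 |
| Liao et al. | 12 | 3 | P2P | 92.00 |
| Kirubavathi et al. | 6 | 2 | HTTP | 99.02 |
| Wang et al. | 6 | 45 | IRC, P2P, HTTP | 95.00 |
| Huang et al. | 34 | 7 | IRC, P2P, HTTP | 99.00 |
| Proposed model | 4 | 14 | IRC, P2P, HTTP | 99.14 |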
Statistical significance
One-way ANOVA with a post-hoc test
The post-hoc test is used to compare the performance between classifiers.
The accuracy values of 10-fold cross-validation for the three classifiers are compared with n = 30 samples at the 5% level of significance.
The statistical test shows that the Naive Bayesian classifier performs better than the other two classifiers.
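The F statistic behind a one-way ANOVA can be computed by hand as in the sketch below. It covers only the ANOVA step, not the post-hoc comparison, and the function name is invented for this example. Each inner list would hold the cross-validation accuracy values of one classifier.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups.

    Assumes at least two groups and some variation inside the groups
    (otherwise the within-group mean square is zero).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the samples around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

The F value is then compared with the critical value of the F distribution for (df_between, df_within) degrees of freedom at the 5% level; if it is larger, the classifiers' mean accuracies differ significantly and a post-hoc test identifies which pairs differ.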
Motivation
Part of a Deutsche Telekom project:
Backup and restore of users' Android terminals
Remote monitoring and offline analysis of Android applications
Problem Domain
The Android OS could be attacked by hackers:
Open platform
Users access the Internet intensively
Everyone can develop applications for Android
Structural analysis and detection of Android
botnets using machine learning techniques
Structure of apk file
Structure of manifest file
<?xml version="1.0" encoding="utf-8"?>
<manifest>
    <uses-permission />
    <permission />
    <permission-tree />
    <permission-group />
    <instrumentation />
    <uses-sdk />
    <uses-feature />
    <supports-screens />
    <supports-gl-texture />
    <application>
        <activity> .. </activity>
        <service> .. </service>
        <receiver> .. </receiver>
        <provider> .. </provider>
        <uses-library />
    </application>
</manifest>
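Structural features such as requested permissions and component counts can be read from a manifest with a few lines of Python. The sketch assumes the manifest has already been decoded to plain-text XML, since inside an APK it is stored in a binary XML format. Which features the authors actually use is not stated on this slide, so the returned values are only an illustration, and the function name is invented.

```python
import xml.etree.ElementTree as ET

# Namespace that Android uses for attributes such as android:name.
ANDROID_NS = "http://schemas.android.com/apk/res/android"
COMPONENT_TAGS = ("activity", "service", "receiver", "provider")


def manifest_features(manifest_xml):
    """Return (sorted permission names, component counts) from decoded manifest XML."""
    root = ET.fromstring(manifest_xml)
    name_attr = "{%s}name" % ANDROID_NS
    permissions = sorted(
        element.get(name_attr)
        for element in root.iter("uses-permission")
        if element.get(name_attr)
    )
    application = root.find("application")
    components = {
        tag: len(application.findall(tag)) if application is not None else 0
        for tag in COMPONENT_TAGS
    }
    return permissions, components
```

The permission list can be turned into a binary feature vector (one position per known permission) and combined with the component counts as classifier input.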
Conclusion
We proposed a detection system based on mining of traffic flow characteristics.
The system detects bot traffic even if the communications are encrypted.
It is a lightweight system, since only a fraction of the total amount of data needs to be analyzed.
Our model does not rely on any prior knowledge about botnet structures.
It can recognize new or unknown botnets with good accuracy and a very low false positive rate.
Conclusion
Botnets pose a significant and growing threat to cyber security.
They provide a key platform for many cyber crimes, such as DDoS.
Network security has become an integral part of our lives, and botnets have become the most serious threat to it.
It is therefore very important to detect botnet attacks and find solutions for them.
Published Papers
1. Kirubavathi G and Anitha R 2016, Botnet detection via mining of traffic flow characteristics, Computers & Electrical Engineering, vol. 50, pp. 91-101, Impact factor 0.836. Available in Annexure I.
2. Kirubavathi G and Anitha R 2014, Botnets: A Study and Analysis, Proceedings of the Springer International Conference on Computational Intelligence, Cyber Security and Computational Models, Springer India, pp. 203-214.
3. Kirubavathi Venkatesh G, Srihari V, Veeramani R, Karthikeyan RM and Anitha R 2013, HTTP botnet detection using Hidden Semi-Markov Model with SNMP MIB variables, International Journal of Electronic Security and Digital Forensics, vol. 5, nos. 3/4, pp. 188-200. Available in Annexure II.
4. Kirubavathi Venkatesh G and Anitha R 2012, HTTP Botnet Detection using adaptive learning rate Multilayer Feed Forward Neural Network, Proceedings of the Workshop in Information Security Theory and Practice (WISTP'12), Royal Holloway, UK, pp. 38-48.

Paper Communicated
Kirubavathi G and Anitha R, Structural Analysis and Detection of Android Botnets Using Machine Learning Techniques, International Journal of Information Security, Springer. (Under Revision).
References
1. Singh K, Guntuku SC, Thakur A, Hota C. Big data analytics framework for peer-to-peer botnet detection using random forests. Information Sciences 2014, vol. 278, pp. 488-97.
2. Castiglione A, De Prisco R, De Santis A, Fiore U, Palmieri F. A botnet-based command and control approach relying on swarm intelligence. Journal of Network and Computer Applications 2014, vol. 38, pp. 22-33.
3. Houmansadr A, Borisov N. Botmosaic: collaborative network watermark for the detection of IRC-based botnets. Journal of Systems and Software 2013, vol. 86(3), pp. 707-15.
4. Venkatesh GK, Srihari V, Veeramani R, Karthikeyan R, Anitha R. HTTP botnet detection using hidden semi-Markov model with SNMP MIB variables. International Journal of Electronic Security and Digital Forensics 2013, vol. 5(3), pp. 188-200.
5. Han Q, Yu W, Zhang Y, Zhao Z. Modeling and evaluating of typical advanced peer-to-peer botnet. Performance Evaluation 2014, vol. 72, pp. 1-15.
6. García S, Zunino A, Campo M. Survey on network-based botnet detection methods. Security and Communication Networks 2014, vol. 7(5), pp. 878-903.
7. Kirubavathi G, Anitha R. Botnets: a study and analysis. In: Computational intelligence, cyber security and computational models. Springer; 2014, pp. 203-14.
8. Livadas C, Walsh R, Lapsley D, Strayer WT. Using machine learning techniques to identify botnet traffic. In: Proceedings of the 31st IEEE conference on local computer networks, 2006. IEEE; 2006, pp. 967-74.
9. Zhao, D., Traore, I., Sayed, B., Lu, W., Saad, S., Ghorbani, A., & Garant, D. (2013). Botnet detection based on traffic behavior analysis and flow intervals. Computers & Security, 39, 2-16.
10. Saad, S., Traore, I., Ghorbani, A., Sayed, B., Zhao, D., Lu, W., ... & Hakimian, P. (2011, July). Detecting P2P botnets through network behavior analysis and machine learning. In Privacy, Security and Trust (PST), 2011 Ninth Annual International Conference on (pp. 174-180). IEEE.
References (cont.)
11. Lu, W., Rammidi, G., & Ghorbani, A. A. (2011). Clustering botnet communication traffic based on n-gram feature selection. Computer Communications, 34(3), 502-514.
12. Masud, M. M., Al-Khateeb, T., Khan, L., Thuraisingham, B., & Hamlen, K. W. (2008, October). Flow-based identification of botnet traffic by mining multiple log files. In Distributed Framework and Applications, 2008. DFmA 2008. First International Conference on (pp. 200-206). IEEE.
13. Nogueira, A., Salvador, P., & Blessa, F. (2010, June). A botnet detection system based on neural networks. In 2010 Fifth International Conference on Digital Telecommunications (pp. 57-62). IEEE.
14. Venkatesh, G. K., & Nadarajan, R. A. (2012). HTTP botnet detection using adaptive learning rate multilayer feed-forward neural network. In Information Security Theory and Practice. Security, Privacy and Trust in Computing Systems and Ambient Intelligent Ecosystems (pp. 38-48). Springer Berlin Heidelberg.
15. Wang, K., Huang, C. Y., Lin, S. J., & Lin, Y. D. (2011). A fuzzy pattern-based filtering algorithm for botnet detection. Computer Networks, 55(15), 3275-3286.
16. Huang, C. Y. (2013). Effective bot host detection based on network failure models. Computer Networks, 57(2), 514-525.
17. Rahbarinia, B., Perdisci, R., Lanzi, A., & Li, K. (2014). Peerrush: mining for unwanted p2p traffic. Journal of Information Security and Applications, 19(3), 194-208.
18. Haddadi, F., Morgan, J., & Zincir-Heywood, A. N. (2014, May). Botnet behaviour analysis using ip flows: with http filters using classifiers. In Advanced Information Networking and Applications Workshops (WAINA), 2014 28th International Conference on (pp. 7-12). IEEE.
19. García, S., Zunino, A., & Campo, M. (2011). Botnet behavior detection using network synchronism. Privacy, Intrusion Detection and Response: Technologies for Protecting Networks, 122.
20. Hick P, Aben E, Andersen D, claffy kc. CAIDA UCSD network telescope "Three days of Conficker", available at http://www.caida.org/research/security/ms08-067/conficker.xml/ [accessed 24.12.13]; 2009.
21. García S. Malware capture facility project, available at http://mcfp.weebly.com/mcfp dataset.html/ [accessed 03.02.14]; 2014.
References (cont.)
P. Barford and V. Yegneswaran, An inside look at botnets, Springer Verlag, 2006.
J. Binkley and S. Singh, An algorithm for anomaly-based botnet detection, In Proceedings of USENIX Steps to Reducing Unwanted Traffic on the Internet Workshop (SRUTI), pages 43-48, 2006.
T. Abbes, A. A. Bouhoula, and M. Rusinowitch, Protocol Analysis in Intrusion Detection Using Decision Tree, Proc. International Conference on Information Technology, Coding and Computing (ITCC'04), IEEE Xplore, pages 404-408.
Jiong Zhang, Mohammad Zulkernine, Anwar Haque, Random-Forests-Based Network Intrusion Detection Systems, IEEE Transactions on Systems, Man, and Cybernetics, Part C, 38(5): 649-659 (2008).
Lee, J. et al., The activity analysis of malicious http-based botnets using degree of periodic repeatability, In Proceedings of the IEEE International Conference on Security Technology, December 2008, pp. 83-86.
References (cont.)
X. Tan and H. Xi, Hidden semi-Markov Model for anomaly detection, Journal of Applied Mathematics and Computation, Elsevier, vol. 205, Issue 2, November 2008, Special Issue on Advanced Intelligent Computing Theory and Methodology in Applied Mathematics and Computation, pp. 562-567.
Shun-Zheng Yu and Kobayashi, H., An Efficient Forward-Backward Algorithm for an Explicit Duration Hidden Markov Model, In IEEE Signal Processing Letters, vol. 10, Issue 1, Jan. 2003, pp. 11-14.
Wang, B., Li, Z., Li, D., Liu, F. and Chen, H., Modeling Connections Behavior for Web-Based Bots Detection, In 2nd IEEE International Conference on e-Business and Information System Security (EBISS), 2010, Wuhan, pp. 1-4.
Yi Xie and Shun-Zheng Yu, Monitoring the Application-Layer DDoS Attacks for Popular Websites, In IEEE/ACM Transactions on Networking, Vol. 17, No. 1, Feb. 2009.
THANK YOU
