International Journal of Communication Networks and Information Security (IJCNIS)

Vol. 3, No. 2, August 2011

The Analysis and Identification of P2P Botnet's Traffic Flows
Wernhuar Tarng1, Li-Zhong Den1, Kuo-Liang Ou1 and Mingteh Chen2
1 National Hsinchu University of Education, 521 Nanda Rd., Hsinchu, Taiwan, ROC
2 Micrel Semiconductor Inc., 2180 Fortune Drive, San Jose, CA 95131, USA

Abstract: With the advance of information and communication
technologies, the Internet has become an integral part of human
life. Although it provides many convenient services, it also
exposes its users to potential risks. For example, hackers may try
to steal confidential data for illegal benefits, using a variety of
attack methods, e.g., Distributed Denial of Service (DDoS), spam
and Trojans. These methods require a large number of computers;
hence, hackers often spread malicious software to infect poorly
defended computers. The infected computers become the zombie
computers in the botnets controlled by hackers. Thus, detecting and
defending against botnets is an important subject in network
security. Among them, the Peer-to-Peer (P2P) botnet is a new type
of botnet in which every zombie computer acts as a peer controlled
by hackers, which makes it more difficult to defend against. The
objective of this study is to identify the traffic flows produced
by known or unknown malicious software in order to defend against
P2P botnets. Based on the analysis of P2P networks' traffic flows
and the ASCII distribution in their packets, a six-step mechanism
was proposed to identify the traffic flows of P2P botnets, locate
the zombie computers, and finally keep other computers from
further infection.
Keywords: P2P botnets, network traffic flows, network security,
decision-tree model.

1. Introduction
With the advance and development of information and
communication technologies (ICT), computer networks have
become an integral part of human life. Their applications range
from online news, online shopping and information retrieval with
Google search to online ATM services and stock trading. In open
network environments, there are always unscrupulous criminals or
organizations trying to steal or destroy personal data in order to
obtain illegal benefits. Usually, hackers use malicious software to
infect a large number of poorly protected or unprotected computers
to form so-called botnets, and then achieve their purposes by
launching attacks from the zombie computers in the botnets. The
methods often used for attacks include Distributed Denial of
Service (DDoS), spam, click fraud and information leakage.
The first botnet appeared in 1993 in the Internet Relay
Chat (IRC) networks, and botnets became widespread after
1999. In New Zealand, a 19-year-old hacker controlled 150
million computers through the Internet, which is the largest
known botnet; a Chinese hacker controlled 60,000
computers to attack a music website, keeping the website out
of service even after its server was moved to Taiwan or
the USA. The two events caused losses of hundreds of
millions of dollars [1], and the two hackers were finally arrested.
Waledac [2] is one of the top 10 botnets in the USA,
affecting at least hundreds of thousands of personal
computers in the world. It can send 1.5 billion spam
email messages daily, enough to seriously affect global
network activities. According to Microsoft's statistics,
as many as 650 million malicious spam emails were sent to
Hotmail from December 3 to 21, 2009. At least
233 source IP addresses in Taiwan were involved in sending spam
emails for the Waledac botnet during early May 2009,
showing that botnets can really influence the global
computer networks.
Today, the Internet is widely used for communication,
multimedia, shopping, entertainment, research, education,
and so on, and it is continuously extending its application
areas. In open network environments, the computers
connected to the Internet are vulnerable to
different kinds of attacks. Even with antivirus software
installed and frequently updated, a computer can still
be infected. Owing to user negligence and the fast
mutation of computer viruses, a computer has a great chance of
being infected and becoming a zombie computer. According to
Symantec's global Internet security report [3], Taipei has
become the city with the world's highest density of botnet
viruses. Up to 80% of the computers may have been
infected, and, what is worse, the users may still be unaware
of it. Thus, the prevention of malicious attacks cannot
rely on antivirus software alone; efficient mechanisms are
required to detect and defend against the botnets.
A botnet is a collection of software agents, or robots, that
run autonomously and automatically [4]. The term is most
commonly associated with IRC botnets and, more recently,
malicious software, but it can also refer to a computer
network using distributed computation software. Botnets are
usually named after their malicious software, such as Peacomm
and Waledac. Basically, a botnet consists of
the server programs used to control the infected
computers, the client programs installed on the infected
computers waiting for control instructions, and the
malicious software that infects normal computers and turns
them into zombie computers. These programs often use a unique
encryption system to communicate with each other to
avoid detection, and they run in the
background of infected computers, using an exchange
channel (e.g., the RFC 1459 standard or Twitter) to
communicate with their command and control server. A new
robot can automatically scan its environment and exploit
weak passwords to infect other computers. The more computers a
robot is capable of infecting, the more
valuable it is in the botnets controlled by the hackers.
Based on the ways of connection between the hackers and
zombie computers, there are three types of botnets, i.e., IRC,
HTTP and P2P botnets. In the first type of botnets, an
infected computer is automatically connected to the IRC chat
room controlled by the hackers and waits for the next
operational command. Hackers can also set up their own IRC
servers or use the public IRC servers to exchange messages
with zombie computers. The architecture of HTTP botnets is
similar to that of IRC botnets, mainly launching attacks
through malicious HTTP servers set up by the hackers.
IRC and HTTP botnets use the client-server architecture
and thus have a single point of failure, which
means the entire botnet will collapse once the server has
been shut down. Therefore, the P2P botnet was devised by
hackers as a new architecture using P2P communication
protocols. In a P2P botnet, any zombie computer can be a
client or a server, and it connects to the botnet according to
its peer list to form a reciprocal relationship within the
network topology. Therefore, a P2P botnet does not need any
particular server to download programs or receive
instructions; the hackers can launch attacks from any
computer in the P2P botnet. Consequently, the detection and
prevention of P2P botnets are more difficult and challenging.
In recent years, the research on botnets has become an
important issue. According to the study of Zhu et al. [5],
current research about botnets can be divided into three main
areas: (a) the investigation of botnets by structural analysis
or observing their operation, (b) detecting and tracking
botnets, and (c) defending against the attacks of botnets. The
above study was focused on the IRC protocols of botnets.
Currently, most detection mechanisms for P2P botnets are
designed to detect a single type of P2P botnet, so they
cannot be applied to other types of P2P botnets. To remedy
this drawback, Liu [6] proposed an adaptive defense
mechanism for a variety of P2P botnets, but it can only be
applied in the stage when a botnet is launching attacks.
Karasaridis et al. [7] tried to detect the attacks of P2P botnets,
such as DDoS and spam, by their traffic flows and, through
traffic analysis, to identify possible connections with the
command and control server and to track its location. Goebel
and Holz [8] found that the infected computers connecting to an
IRC botnet often had nicknames different from those of
normal computers; therefore, they could identify an IRC
botnet through the traffic analysis of the computers with
special nicknames.
Lu et al. [9] considered that future botnets will attach themselves
to existing network applications (e.g., IRC, HTTP and P2P)
as well as some other unknown applications to launch
attacks, so they suggested using the characteristics and
behavior of traffic flows to find out what kind of
application a botnet is attached to and then identifying the botnets
through the classification of traffic flows based on a
decision-tree model. Their study focused on IRC botnets
only and did not address P2P and HTTP
botnets. In their approach, the characteristics of traffic flows
were determined based on the payload, or ASCII (0-255)
distribution, of the traffic flows per unit time (1 second). In
order to reduce the complexity of the identification process,
string comparisons were used to identify the packets of some
recognized applications in advance. However, their approach
could increase the identification time and thus affect the
overall efficiency. This study improved the above approach
by filtering out the unwanted P2P and non-P2P packets to
reduce the identification time. Then, it used a
decision-tree model trained on known P2P traffic flows to
further increase the identification rate.
A decision tree is a classification procedure that assigns a
number of objects to predefined categories. In the
classification process, data are collected and divided into
several homogeneous subsets recursively. The decision tree
consists of the root, intermediate nodes, and end nodes. The
root forms the base of all information, so it does not have any
input but can have zero or several outputs; an intermediate
node is a partitioned data set, which can have two or more
inputs and outputs; an end node, or leaf node, has one input
and no output. The J48 decision tree used in this study is an
improved decision tree based on Quinlan's C4.5 decision
tree [10]; it expands the tree structure from the
root to the end nodes, which makes the generated rules
easier to understand.
In this study, the detection of P2P botnets was done by
identifying their traffic flows to locate the zombie computers
and finally keep other computers from further infection.
At first, the packets sent from the source ports to the
destination ports by the computers in the network were
filtered, which helped in understanding the current status of
the network. Also, the information obtained from these
packets could be used to identify the traffic flows of P2P
botnets. The mechanism proposed in this study for
identifying P2P botnets contains the following six steps:
• Pre-processing stage: filtering out non-P2P traffic flows
to simplify the identification process.
• Identification of P2P application hosts: identifying the
hosts running P2P application programs.
• Identification of P2P applications' traffic flows:
analyzing the traffic flows produced by P2P application
hosts in the communication stage to determine if they
belong to some P2P applications.
• Classification of P2P applications: determining if the
traffic flows were produced by some P2P application
programs based on the analysis of payload
characteristics.
• Detection of abnormal traffic flows: classifying the
traffic flows of P2P application programs into two groups
to detect the abnormal traffic flows produced by some
unknown P2P botnets.
• Detection of zombie computers: locating the zombie
computers according to the information from the analysis
of traffic flows produced by P2P botnets.
The objective of this study is to detect the traffic flows of
P2P botnets quickly during the communication stage. The
Response to Intervention (RTI) method [11] was adopted to
observe the traffic flows of normal P2P applications and P2P
botnets. Then, the traffic flows were classified into several
groups by a trained decision-tree model, and the information
obtained was used to identify the abnormal traffic flows and
locate the zombie computers. In order to capture, filter and
analyze the packets, this study used VMware (installed on
Windows XP SP2) and network management tools
(Wireshark and CurrPorts) to observe the network's traffic
flows.

2. P2P Traffic Analysis


The nodes in a P2P network are usually connected through
an ad hoc network [12], and the main idea is to form a
logical network through the existing physical network, rather
than reconstructing a new physical network. No matter what
kind of logical network structure is selected, the clients still
have to transfer data through the physical layer. When two
computers communicate with each other through a
network protocol such as BitTorrent for data transmission,
the protocol first estimates the available bandwidth and
computation power on both computers to see if the transfer is
feasible before making the connection and transmitting data.
Each of the two computers can act as either a server or a
client for downloading and uploading data. They are equal peers
and there is no obvious client-server architecture.
Currently, many application programs use P2P
technologies for information sharing, e.g., eDonkey, Foxy,
BitTorrent and GoGoBox.
2.1 Characteristics of P2P Applications' Traffic Flows
To transfer files over P2P networks, users need to install P2P
application programs on their computers. When the file
download or upload function is used, the computer sends
out a large number of IP packets to establish connections
with a list of P2P peers within a short time. More computers
on the lists of other peers join the connections, so the
set of connected peers changes on the fly. The computer
continues to work with these peers until the file transfer is
completed. Since the computers on the peer list may not be
online, not all connection packets receive a response. Figure 1
shows a user's computer connecting to a P2P network
using the software BitComet (whose network protocol is
BitTorrent). The computer sends a large number of UDP
packets to several IP addresses before establishing the
connection. In this study, this phase is defined as the
communication stage, and the size of UDP packets in this stage
is usually very small (Figure 2).

Figure 1. User's computer connecting to a P2P network

Figure 2. Size of UDP packets during communication stage


UDP is an unreliable, connectionless
communication protocol. Its packet format (Figure 3)
includes the source port, destination port, packet length,
checksum and data. This study analyzed the characteristics
of P2P traffic flows based on the ASCII (0-255) distribution
in the data field.

Figure 3. The format of UDP packets
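
As an illustration of the UDP format described above, the following minimal Python sketch splits a raw UDP datagram into its four header fields and the data field, and then counts how often each byte value (0-255) occurs in the data. It is not the tool used in this study; the function names and the example datagram are hypothetical.

```python
import struct
from collections import Counter

# UDP header: source port, destination port, length, checksum (2 bytes each)
UDP_HEADER = struct.Struct("!HHHH")


def parse_udp(datagram: bytes):
    """Split a raw UDP datagram into its header fields and data field."""
    if len(datagram) < UDP_HEADER.size:
        raise ValueError("datagram shorter than the 8-byte UDP header")
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    # The length field covers the header plus the data.
    data = datagram[UDP_HEADER.size:length]
    return src_port, dst_port, length, checksum, data


def byte_distribution(data: bytes):
    """Return a 256-element list with the count of each byte value 0..255."""
    counts = Counter(data)
    return [counts.get(value, 0) for value in range(256)]


# Hypothetical datagram: 8-byte header followed by a 4-byte data field.
example = struct.pack("!HHHH", 6881, 51413, 12, 0) + b"d1:a"
src_port, dst_port, length, checksum, data = parse_udp(example)
distribution = byte_distribution(data)
```

The 256 counts form the kind of distribution that the later classification step takes as input.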


P2P application programs typically use UDP to
establish connections during the communication stage, e.g.,
eDonkey, Foxy, BitTorrent, GoGoBox and some malicious
software. However, not all P2P application programs use
UDP to conduct a file transfer. For example,
GoGoBox uses TCP packets to initiate a reliable connection
directly by three-way handshaking (Figure 4). TCP is a
connection-oriented and reliable transmission protocol with
lower transmission speeds, and its packet format is shown in
Figure 5. When GoGoBox is establishing a connection, the
computer sends packets with PSH=1 and ACK=1
to the P2P network for communication, so the data field of
these packets can be retrieved for identification.

Figure 4. The communication stage of GoGoBox

Figure 5. The format of TCP packets


After the communication stage, the computer can proceed
with file download/upload, which is defined as the transmission
stage in this study. The packet size in this stage varies
greatly, from 60 to 1468 bytes. Cho [13] used this
information to detect P2P traffic flows with very high accuracy.
Based on the above analysis, the normal behavior
of P2P applications typically contains two stages: (a)
communication stage: UDP packets are mainly used to
establish connections in P2P networks, and the packet size and
the changes in traffic flows are small; some applications instead
establish connections by three-way handshaking and
send TCP packets with the flags PSH=1 and
ACK=1 for communication; and (b) transmission stage: the
computer starts file download/upload through the P2P
network, so the packet size varies greatly in this stage.
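
The PSH=1 and ACK=1 condition mentioned above can be checked directly on the TCP header. The sketch below is a minimal Python illustration, not the implementation used in this study: it reads the data offset and the flag byte of a raw TCP segment and returns the data field only when both flags are set. The helper build_segment and its port numbers are hypothetical and exist only to produce test input.

```python
import struct

FLAG_PSH = 0x08  # push flag in the TCP flag byte
FLAG_ACK = 0x10  # acknowledgment flag in the TCP flag byte


def tcp_payload_if_psh_ack(segment: bytes):
    """Return the data field of a TCP segment if PSH=1 and ACK=1, else None."""
    if len(segment) < 20:
        raise ValueError("segment shorter than the minimum 20-byte TCP header")
    # Byte 12 holds the data offset (upper 4 bits); byte 13 holds the flags.
    offset_byte, flags = struct.unpack_from("!BB", segment, 12)
    header_len = (offset_byte >> 4) * 4  # data offset counts 32-bit words
    if flags & FLAG_PSH and flags & FLAG_ACK:
        return segment[header_len:]
    return None


def build_segment(flags: int, payload: bytes) -> bytes:
    """Hypothetical helper: a 20-byte TCP header without options plus payload."""
    header = struct.pack(
        "!HHIIBBHHH", 50000, 6000, 1, 1, 5 << 4, flags, 65535, 0, 0
    )
    return header + payload


pushed = tcp_payload_if_psh_ack(build_segment(FLAG_PSH | FLAG_ACK, b"hello"))
ack_only = tcp_payload_if_psh_ack(build_segment(FLAG_ACK, b"hello"))
```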
2.2 Characteristics of P2P Botnets' Traffic Flows
According to the adaptive defense mechanism proposed
by Liu [6], the behavior of P2P botnets can be divided into
the following four stages:
• Infection stage: inducing users to click on malicious links
or open the attachments.
• Connection stage: the infected computer connecting to
the P2P botnet to receive commands and download
programs.
• Download stage: proceeding with secondary infection or
receiving commands.
• Attack stage: starting attacks or spreading spam to the
target hosts or specified computers.
Liu's defense mechanism must wait until the attack stage
to detect the botnet viruses. This study proposed to detect the
viruses during the connection stage. P2P botnets
behave similarly to normal P2P applications except in the
infection and attack stages. According to previous
studies, the connection stage of P2P botnets is similar to the
communication stage of P2P applications and the download
stage of P2P botnets is similar to the transmission stage of
P2P applications. This study tried to detect P2P botnets as
early as possible, so the analysis and identification of P2P
traffic flows is performed in the connection or
communication stage.
This study investigated two different types of P2P botnets.
The first is Trojan.Peacomm, also known as the Storm worm
since it spreads quickly to form a large botnet.
It was first discovered in 2007 [14], and used the
implementation of Distributed Hash Table (DHT) in the
Kademlia P2P networks. It induces users to open email
attachments, which are then executed on
the computers to connect to the botnet through the peer list
(Figure 6) and download malicious software from other
computers. Trojan.Peacomm sends UDP packets to a large
number of botnet peers, attempting to establish connections
during the connection stage. Because the changes in its
traffic flows are usually small (Figure 7), its behavior is
very similar to that of other P2P software.

Figure 6. The connection stage of Trojan.Peacomm

Figure 7. The packet size of Trojan.Peacomm


The second type of P2P botnets under investigation is
Waledac, which uses a connection mechanism different from
that of Trojan.Peacomm. Waledac establishes connections
mainly through TCP packets (Figure 8), and it uses
packets with the PSH and ACK flags set to communicate with
the P2P botnet (Figure 9).

Figure 8. The connection stage of Waledac

Figure 9. The TCP packet of Waledac


Other types of P2P botnets such as Nugache spread through
MSN, email attachments, and Microsoft vulnerabilities
(such as MS03-026 and MS04-011). The infected computers
connect to the botnet using TCP packets, and open TCP
Port 8 to download malicious software from other zombie
computers to perform DDoS attacks. Nugache can also steal
email addresses from the infected computers to send spam
emails. Another P2P botnet virus, Sinit, infects computers
through an IE vulnerability (Java.ByteVerify) by injecting
malicious software through web pages. A computer is
infected after browsing such a web page, and then it opens
TCP/UDP Port 53 and pretends to be an HTTP server. When the
infected computer receives HTTP GET requests for ks.htm
or ks.exe, it infects other computers by replicating itself
through UDP Port 53. The main attack by Sinit is using key
loggers to steal information from the infected computers.
SpamThru infects computers through users' careless operation,
e.g., clicking on malicious hyperlinks to connect to a botnet
server for downloading Kaspersky antivirus software, which
can also be used to remove other malicious software. When a
botnet server on the network is identified, the hackers can
immediately switch to another infected computer as the
server, which will download the messenger program and
send spam emails through the infected computers again.
2.3 Payload Characteristics of P2P Traffic Flows
In this study, the payload characteristics of P2P traffic flows
for several P2P application programs, including BitTorrent,
eDonkey/eMule, Foxy, and GoGoBox, and two P2P botnet
viruses, Waledac and Trojan.Peacomm, were analyzed. The
payload characteristics of P2P traffic flows within a small
unit of time (1 second) were obtained and analyzed using a
trained decision-tree model to classify the packets of
different P2P applications and P2P botnets. For example, the
patterns of communication packets differ among P2P
applications and may contain some special
strings, which can be used for the identification of their
traffic flows. As shown in Figures 10 to 15, the
payload characteristics for the four types of P2P applications
and the two P2P botnets are different. Therefore, this study
could distinguish the traffic flows among these P2P
applications and P2P botnets using a trained decision-tree
model described in the next section.

Figure 10. Characteristics of BitTorrent's traffic flows

Figure 11. Characteristics of eDonkey/eMule's traffic flows

Figure 12. Characteristics of Foxy's traffic flows

Figure 13. Characteristics of GoGoBox's traffic flows

Figure 14. Characteristics of Trojan.Peacomm's traffic flows

Figure 15. Characteristics of Waledac's traffic flows

3. Adaptive Mechanisms

In this study, the identification of P2P botnets' traffic flows
is divided into six steps: (a) pre-processing stage, (b)
identifying P2P application hosts, (c) identifying P2P
applications' traffic flows, (d) classifying P2P applications,
(e) detecting abnormal P2P traffic flows, and (f) identifying
zombie computers. Among them, the first three steps are
based on the RTI method to detect all P2P traffic flows
according to the payload characteristics of P2P applications
and P2P botnets. The last three steps are used to classify P2P
applications' traffic flows, detect the abnormal traffic
flows produced by the infected zombie computers, and identify the
zombie computers.
3.1 Pre-processing Stage
In the pre-processing stage, the identification process can be
sped up by filtering out non-P2P packets through the
well-known ports. The well-known ports, ranging from 0 to
1023, are those recognized and defined by the Internet
Assigned Numbers Authority (IANA), but not all of the port
numbers are defined. Although identifying P2P
applications through these ports is not very effective,
they can be used to filter out some non-P2P packets to
reduce the processing time and the amount of data for
identification. Because this study focused on P2P traffic
flows, it was better to filter out non-P2P packets in the
pre-processing stage. However, Port 80 and Port 443 were
excluded from this filtering because P2P
applications also communicate through these two ports. This
study used a post-association algorithm (as shown in Figure
16) to filter out non-P2P packets, which were determined
based on the following three conditions:
• If the source port and destination port are both
recognized, then the packet is not a P2P packet.
• If only one of the source port and destination port is
recognized, then add the unknown port and its associated IP
address to the Port Association Table (PAT) and set the
packet as non-P2P.
• If neither the source port nor the destination port is
recognized, then check whether the IP address and the
port number are in the PAT. If they are, the packet
is determined to be a non-P2P packet; otherwise, it is treated
as a possible P2P packet.

Figure 16. The post-association algorithm used in the pre-processing stage
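
The three conditions above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in the text, not the algorithm of Figure 16 itself. The set RECOGNIZED_PORTS is a small hypothetical subset of well-known ports (with Port 80 and Port 443 left out, as described above), and representing the PAT as a set of (IP address, port) pairs is an assumption.

```python
# Hypothetical subset of recognized well-known ports; 80 and 443 are
# deliberately left out because P2P applications also use them.
RECOGNIZED_PORTS = frozenset({20, 21, 22, 23, 25, 53, 110, 143})


def classify_packet(src_ip, src_port, dst_ip, dst_port, pat,
                    recognized=RECOGNIZED_PORTS):
    """Apply the three post-association conditions to one packet.

    pat is the Port Association Table, assumed here to be a set of
    (ip, port) pairs; it is updated in place.
    """
    src_known = src_port in recognized
    dst_known = dst_port in recognized
    if src_known and dst_known:
        # Condition 1: both ports recognized.
        return "non-P2P"
    if src_known != dst_known:
        # Condition 2: exactly one port recognized; remember the other endpoint.
        unknown = (dst_ip, dst_port) if src_known else (src_ip, src_port)
        pat.add(unknown)
        return "non-P2P"
    # Condition 3: neither port recognized; consult the PAT.
    if (src_ip, src_port) in pat or (dst_ip, dst_port) in pat:
        return "non-P2P"
    return "possible-P2P"


pat = set()
first = classify_packet("10.0.0.2", 50000, "10.0.0.9", 21, pat)      # FTP control
second = classify_packet("10.0.0.2", 50000, "10.0.0.3", 60000, pat)  # associated port
third = classify_packet("10.0.0.4", 40000, "10.0.0.5", 40001, pat)   # unknown ports
```

In this example the first packet registers the client endpoint in the PAT, so the second packet from the same endpoint is filtered even though neither of its ports is recognized.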


3.2 Identifying P2P Application Hosts
This step identifies the P2P application hosts
connected through BitTorrent, eMule, Foxy and other P2P
application programs. During the communication stage, the
hosts send a large number of UDP packets to connect
with several computers, one connection per peer, so a host
using P2P software to issue a number of communication
packets should contact almost the same number of IP addresses
and port numbers. Thus, this study summarized three
characteristics of hosts using P2P software: (a)
the communication packets are UDP packets, (b) the number
of connected host IP addresses is large, and (c) the number of
ports for external connections divided by the number of
connected IP addresses is large. Figure 17 shows the algorithm
for identifying P2P application hosts, where UDP Flag indicates
whether UDP packets are used, #dIP is the total number of IP
addresses for external connections, #dPort is the total number
of external ports, and Ratio equals #dPort/#dIP.

Figure 17. The algorithm for identifying P2P application hosts
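
The quantities UDP Flag, #dIP, #dPort and Ratio defined above can be computed per host as in the following Python sketch. It is a hedged illustration rather than the algorithm of Figure 17: the thresholds min_dip and min_ratio are hypothetical values chosen for the example, and the UDP flag is reported but not used in the decision, which is a simplifying assumption.

```python
def host_features(flows):
    """Compute the per-host features from its outgoing flows.

    flows: iterable of (protocol, dst_ip, dst_port) tuples for one host.
    """
    flows = list(flows)
    dst_ips = {ip for _, ip, _ in flows}
    dst_ports = {port for _, _, port in flows}
    n_dip, n_dport = len(dst_ips), len(dst_ports)
    return {
        "udp_flag": any(proto == "UDP" for proto, _, _ in flows),
        "n_dip": n_dip,        # #dIP
        "n_dport": n_dport,    # #dPort
        "ratio": n_dport / n_dip if n_dip else 0.0,  # #dPort / #dIP
    }


def looks_like_p2p_host(features, min_dip=20, min_ratio=0.9):
    """Hypothetical thresholds: many peers and roughly one port per peer."""
    return features["n_dip"] >= min_dip and features["ratio"] >= min_ratio


# A host contacting 25 peers, each on its own port.
p2p_flows = [("UDP", f"10.0.0.{i}", 40000 + i) for i in range(25)]
# A host contacting 6 web servers on ports 80 and 443 only.
web_flows = [("TCP", f"10.0.1.{i}", 80 if i % 2 else 443) for i in range(6)]

p2p_features = host_features(p2p_flows)
web_features = host_features(web_flows)
```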


3.3 Identifying P2P Applications' Traffic Flows
After verifying the hosts with P2P applications, the next step
is to identify their traffic flows. In this step, the traffic flows
from the source port to the destination port are divided into
several groups for identification, and the algorithm for
identifying P2P applications' traffic flows is shown in Figure
18, where the definitions of Ratio and #dIP are the same as
given in the previous step and PSW is the total differences of
packet sizes; the larger the PSW, the more different the
packet size. In general, communication packets are of small
sizes, thus the value of PSW is relatively small.


Figure 18. The algorithm for identifying P2P applications' traffic flows
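
The text describes PSW only as the total differences of packet sizes. The Python sketch below assumes one possible reading, namely the sum of absolute differences between consecutive packet sizes in a flow; this definition is an assumption made for illustration and may differ from the one used in Figure 18.

```python
def packet_size_variation(sizes):
    """Assumed PSW: sum of absolute differences between consecutive sizes."""
    return sum(abs(later - earlier) for earlier, later in zip(sizes, sizes[1:]))


# Communication-stage flow: small packets of nearly equal size (bytes).
communication_psw = packet_size_variation([60, 62, 61, 60])
# Transmission-stage flow: packet sizes vary greatly.
transmission_psw = packet_size_variation([60, 1468, 200])
```

Under this reading, a flow in the communication stage yields a small PSW, while a flow in the transmission stage yields a much larger one.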


3.4 Classifying P2P Applications
According to the characteristics obtained in the previous
step, this study classified a variety of P2P applications using
the J48 decision-tree model. The characteristics of training
samples included the type of packets (TCP or UDP) and
their ASCII distribution within one second, forming a total
of 257 features (one for the packet type and 256 for the ASCII
values). The packets of P2P applications (e.g.,
BitTorrent, eDonkey, Foxy and GoGoBox) and the Waledac
virus were collected, 1000 samples each and 5000 samples
in total, and used to train the decision-tree
model (Figure 19), which was then used to classify the
traffic flows of P2P applications and P2P botnets in the
simulation experiment.
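
The 257-feature samples described above can be assembled as in the following Python sketch, which produces one vector per one-second window. It is an illustration under stated assumptions, not the feature extractor used in this study: the encoding of the packet type (0 for TCP, 1 for UDP), the grouping key (second, protocol) and the function name are all hypothetical. The resulting vectors could then be passed to a decision-tree learner such as J48.

```python
from collections import defaultdict


def window_features(packets):
    """Build one 257-element feature vector per one-second window.

    packets: iterable of (timestamp_seconds, protocol, payload_bytes),
    where protocol is "TCP" or "UDP".
    Feature 0 encodes the packet type (assumed: 0 = TCP, 1 = UDP);
    features 1..256 count the occurrences of byte values 0..255.
    """
    windows = defaultdict(lambda: [0] * 257)
    for timestamp, protocol, payload in packets:
        vector = windows[(int(timestamp), protocol)]
        vector[0] = 1 if protocol == "UDP" else 0
        for byte in payload:
            vector[1 + byte] += 1
    return dict(windows)


# Hypothetical capture: two UDP packets in second 0, one TCP packet in second 1.
captured = [
    (0.1, "UDP", b"\x00\x01"),
    (0.9, "UDP", b"\x01"),
    (1.2, "TCP", b"A"),
]
features = window_features(captured)
```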

Figure 19. The J48 decision-tree model for classifying P2P
applications and Waledac virus

3.5 Detecting Abnormal P2P Traffic Flows
After the classification of P2P applications, this study used a
K-means clustering algorithm to divide the traffic flows of
each P2P application into two groups, and then calculated
the distance between their group centers. If the distance
exceeded the standard variation of the standard value T
(Table I), they were regarded as abnormal traffic flows and
the mechanism triggered the monitoring and processing
procedures. If the traffic flows resulted from a certain
program but the computer did not have the program installed,
they were also treated as abnormal traffic flows. The standard
value T was derived from the original training samples using
the group distance as reference data. The basic idea is that
the difference of group centers for P2P applications is
usually small.

Table I. The standard value of group distance for different
P2P applications' traffic flows
P2P Application    Standard Value T
BitTorrent         129.61
eDonkey            253.64
Foxy               60.55
GoGoBox            116.69

3.6 Identifying Zombie Computers
In this study, the infected computers can be located
according to the information obtained from the traffic flows
produced by the known P2P botnet (Waledac) and unknown
P2P botnet (Trojan.Peacomm) detected in the previous steps.
If the abnormal traffic flows were from a new P2P
application, the data could be used to train the decision-tree
model. If the traffic flows were produced by some
unknown botnet viruses, not only were the data used for
training the decision tree, but the system also had to initiate
the isolation procedure to protect the network from further
infection.

4. Simulation Experiment
A simulation experiment was conducted to evaluate the
proposed mechanism for identifying the traffic flows of P2P
botnets. The experimental environment was constructed
using two VMware virtual hosts (for the implementation of
P2P botnet programs) and four computers running different
P2P application programs. The network architecture for the
experimental environment and the role of each computer are
shown in Figure 20 and Table II. This study used CurrPorts,
Wireshark, and Weka as the tools for monitoring the
network and data analysis. CurrPorts is a software program
that monitors the connection activities on each port, allowing
users to know the connection status on a computer;
Wireshark is a program that analyzes network packets and shows
their detailed information; Weka is a data-mining and analysis
platform where users can implement their algorithms to
obtain information from a large amount of data using a
decision tree.

Figure 20. Network architecture of experimental environment
Table II. The operating systems and roles played by computers
Computer          Operating System     Role
Computer A        Windows 7            Executing non-P2P application software (FTP and HTTP)
Computer B        Windows 7            Executing normal P2P application software (Foxy and eDonkey)
Computer C        Windows XP SP2       Executing normal P2P application software (BitTorrent) and P2P botnet virus (Trojan.Peacomm)
Computer D        Windows XP SP2       Executing normal P2P application software (GoGoBox) and P2P botnet virus (Waledac)
Detection Server  Linux CentOS 9.4     Detecting the traffic flows of P2P botnets
Database Server   Windows Server 2008  Recording the related information of packets

The simulation experiment was conducted on a small local
area network, with several computers executing P2P and
non-P2P applications. In this study, Waledac was regarded
as a known P2P botnet virus, and its traffic flows were used
to train the decision-tree model prior to the experiment.
Trojan.Peacomm was regarded as an unknown P2P botnet
virus. The proposed mechanism was used to identify the
traffic flows of Waledac, detect the traffic flows of the
unknown P2P botnet virus Trojan.Peacomm, and finally
locate the zombie computers.
The experiment began with capturing packets for five
minutes (300 seconds), during which 1,825 traffic flows
with a total of 20,234 packets were retrieved. These
included the flows of normal P2P applications (BitTorrent,
eDonkey, Foxy, and GoGoBox), P2P botnet viruses
(Trojan.Peacomm and Waledac), and non-P2P applications
such as FTP, Telnet, and HTTP.
This study analyzed the characteristics of P2P traffic
flows based on the distribution of ASCII codes (0~255) in
the packets captured at the source ports and destination
ports. The information was used to identify different types
of P2P applications and P2P botnet viruses, and finally to
locate the zombie computers.
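As a rough illustration of the ASCII (0~255) distribution mentioned above, the following Python sketch computes the relative frequency of each byte value in the payloads of one flow. The function names and the sample payloads are hypothetical; the paper does not specify how the distribution was computed, so this is only one plausible reading.

```python
from collections import Counter


def byte_distribution(payload):
    """Return the relative frequency of each byte value 0..255 in a payload."""
    counts = Counter(payload)  # iterating over bytes yields integers 0..255
    total = len(payload)
    if total == 0:
        return [0.0] * 256
    return [counts.get(value, 0) / total for value in range(256)]


def flow_distribution(payloads):
    """Aggregate the byte-value distribution over all packets of one flow."""
    return byte_distribution(b"".join(payloads))


# Made-up payloads standing in for the packets of a single traffic flow.
sample_flow = [b"GET /announce", b"\x00\x00\x01\xff"]
dist = flow_distribution(sample_flow)
print(len(dist), round(sum(dist), 6))
```

The resulting 256-element vector could serve as the feature vector of a flow in the later classification and clustering steps, under the assumption that those steps operate on such vectors.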
• Pre-processing stage
This study used a post-association algorithm to filter out
most non-P2P packets, e.g., those on port 21 (FTP), port 25
(SMTP), and port 110 (POP3), which reduced the number of
packets to be analyzed. However, not all non-P2P packets
can be filtered out this way, e.g., those on port 80 and
port 443.
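The filtering described in this stage can be pictured with a minimal Python sketch. It assumes a plain port lookup, which is a simplification: the paper does not detail its algorithm here, and the packet representation (dictionaries with `src_port` and `dst_port` keys) is hypothetical.

```python
# Ports named in the text as belonging to filterable non-P2P services.
NON_P2P_PORTS = {21: "FTP", 25: "SMTP", 110: "POP3"}


def is_filtered(src_port, dst_port):
    """True if either endpoint uses a port reserved for a non-P2P service."""
    return src_port in NON_P2P_PORTS or dst_port in NON_P2P_PORTS


def preprocess(packets):
    """Drop packets on the listed ports and keep everything else."""
    return [p for p in packets if not is_filtered(p["src_port"], p["dst_port"])]


# Made-up packets: two are removed, two pass through.
packets = [
    {"src_port": 51000, "dst_port": 21},    # FTP, filtered
    {"src_port": 25, "dst_port": 51001},    # SMTP, filtered
    {"src_port": 51002, "dst_port": 80},    # HTTP, kept (port 80 is not filtered)
    {"src_port": 6881, "dst_port": 51003},  # kept for later analysis
]
kept = preprocess(packets)
print(len(kept))
```

Ports 80 and 443 are deliberately absent from the lookup table, which mirrors the limitation noted in the text: traffic on those ports has to be handled by the later stages.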
• Identifying P2P application hosts
The computers identified as running P2P applications in
this stage are shown in Table III: Computers B, C, and D
are the three hosts identified as executing P2P application
programs.
Table III. The results of identifying the P2P application hosts

Host        Ratio  TCPFlag  UDPFlag  #IP
Computer A  0.34   1        1        6
Computer B  0.91   1        1        32
Computer C  0.98   1        1        21
Computer D  0.92   1        0        23
• Identifying P2P application traffic flows
In this step, the traffic flows of P2P applications were
identified according to their packet sizes, yielding 776,
499, and 193 flows on Computers B, C, and D, respectively.
• Classifying P2P applications
Using the trained decision-tree model, the traffic flows of
the P2P applications (BitTorrent, eDonkey, Foxy, and
GoGoBox) and the P2P botnet virus (Waledac) were
classified as shown in Table IV.


Table IV. Classification of P2P applications and botnet viruses

Application  Computer B  Computer C  Computer D
BitTorrent   0           499         0
eDonkey      519         0           0
Foxy         257         0           0
GoGoBox      0           0           151
Waledac      0           0           42
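As a quick consistency check, the per-computer totals in Table IV should equal the numbers of P2P traffic flows identified in the previous step (776, 499, and 193). The short Python sketch below performs that arithmetic; the variable names are chosen for illustration only.

```python
# Flow counts per application and computer, copied from Table IV.
table_iv = {
    "BitTorrent": {"B": 0, "C": 499, "D": 0},
    "eDonkey": {"B": 519, "C": 0, "D": 0},
    "Foxy": {"B": 257, "C": 0, "D": 0},
    "GoGoBox": {"B": 0, "C": 0, "D": 151},
    "Waledac": {"B": 0, "C": 0, "D": 42},
}

# P2P flows identified per computer in the preceding step.
flows_identified = {"B": 776, "C": 499, "D": 193}

# Sum each computer's column across all applications.
totals = {
    host: sum(row[host] for row in table_iv.values())
    for host in flows_identified
}

for host in sorted(totals):
    print(host, totals[host], totals[host] == flows_identified[host])
```

The totals match for all three computers: 519 + 257 = 776 on Computer B, 499 on Computer C, and 151 + 42 = 193 on Computer D.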
• Detecting abnormal P2P traffic flows
After the traffic flows of the known P2P botnet virus had
been classified, the remaining traffic flows of each
application were divided into two groups using a K-means
clustering algorithm. The distance between the two group
centers was calculated (as shown in Table V) and compared
with the standard value T. The traffic flows were
considered suspicious, i.e., possibly produced by a P2P
botnet virus, when the distance exceeded T.
Table V. The distance between group centers for detecting abnormal traffic flows

Application Program  Distance between group centers  Standard Value  Over
BitTorrent           149.73                          129.61          Yes
eDonkey              252.43                          253.64          No
Foxy                 60.18                           60.55           No
GoGoBox              116.21                          116.23          No
The above results show that the traffic of the computer
running BitTorrent contains abnormal flows. After
clustering, the first group contains 464 traffic flows and the
second group contains 35 traffic flows. Usually, the
traffic flows of a P2P botnet are smaller than normal P2P
traffic flows, so it is reasonable to infer that the traffic flows
in the smaller group were caused by an unknown P2P
botnet virus.
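The detection rule of this stage (split the flows of one application into two groups, then compare the distance between the group centers with the standard value T) can be sketched in plain Python. This is a minimal two-cluster K-means under stated assumptions: the feature vectors, the deterministic initialization, and the sample data are all hypothetical, since this section does not state which features or initialization the study used.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def two_means(points, max_iter=100):
    """K-means with K = 2, seeded with the first point and the point farthest from it."""
    c0 = points[0]
    c1 = max(points, key=lambda p: distance(p, c0))
    centers = [list(c0), list(c1)]
    groups = [[], []]
    for _ in range(max_iter):
        groups = [[], []]
        for p in points:
            nearest = 0 if distance(p, centers[0]) <= distance(p, centers[1]) else 1
            groups[nearest].append(p)
        new_centers = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:  # assignments are stable
            break
        centers = new_centers
    return centers, groups


def detect_abnormal(points, threshold):
    """Return the center distance and the smaller group if the distance exceeds T."""
    centers, groups = two_means(points)
    gap = distance(centers[0], centers[1])
    suspicious = min(groups, key=len) if gap > threshold else []
    return gap, suspicious


# Made-up two-dimensional flow features: four ordinary flows and two outliers.
flows = [[200.0, 10.0], [210.0, 12.0], [205.0, 11.0], [195.0, 9.0],
         [20.0, 50.0], [22.0, 52.0]]
T = 129.61  # BitTorrent standard value from Table V, used here only for illustration
gap, suspicious = detect_abnormal(flows, T)
print(round(gap, 2), len(suspicious))
```

Returning the smaller group follows the inference in the text that botnet flows form the minority cluster; when the distance does not exceed T, nothing is flagged.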
• Identifying zombie computers
By analyzing the abnormal traffic flows with the
information obtained from the above steps, Computer C was
identified as infected by an unknown P2P botnet virus, and
Computer D as infected by the known P2P botnet virus
Waledac. According to the rules of network management,
the network management personnel should be notified to
isolate these two computers immediately and then retrieve
the packets of the unknown P2P botnet virus as samples for
training the decision-tree model (Figure 21).

Figure 21. Adding the samples of the unknown P2P botnet virus to the trained decision-tree model
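The final step of locating zombie computers can be viewed as a lookup over the results of the earlier steps: a host is flagged if it produced flows classified as a known botnet virus, or flows of an application whose clustering result exceeded the standard value. The Python sketch below applies that reading to the figures in Tables IV and V; the function name and report format are hypothetical and not taken from the paper.

```python
# Botnet viruses the decision-tree model was trained on.
KNOWN_BOTNETS = {"Waledac"}


def locate_zombies(classified, abnormal_apps):
    """Map each suspicious host to the reasons it was flagged."""
    report = {}
    for app, per_host in classified.items():
        for host, flows in per_host.items():
            if flows == 0:
                continue
            if app in KNOWN_BOTNETS:
                report.setdefault(host, set()).add("known botnet: " + app)
            elif app in abnormal_apps:
                report.setdefault(host, set()).add(
                    "unknown botnet suspected in " + app + " flows")
    return report


# Flow counts from Table IV; BitTorrent is the only "Over = Yes" row in Table V.
classified = {
    "BitTorrent": {"B": 0, "C": 499, "D": 0},
    "eDonkey": {"B": 519, "C": 0, "D": 0},
    "Foxy": {"B": 257, "C": 0, "D": 0},
    "GoGoBox": {"B": 0, "C": 0, "D": 151},
    "Waledac": {"B": 0, "C": 0, "D": 42},
}
report = locate_zombies(classified, abnormal_apps={"BitTorrent"})
for host in sorted(report):
    print("Computer", host, sorted(report[host]))
```

With these inputs only Computers C and D are reported, which agrees with the outcome described in the text; Computer B runs P2P applications but shows no botnet indicators.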


5. Conclusion and Future Work


In recent years, research on botnets has become an
important issue in network security. Botnets fall into three
types according to their network architectures: IRC, HTTP,
and P2P. Most studies currently focus on IRC botnets, and
fewer address the other two types. This study proposed a
mechanism to identify the traffic flows of P2P botnets
quickly during the connection stage. The mechanism used
the RTI method to observe the traffic flows of normal P2P
applications and P2P botnets. The traffic flows were then
classified into several groups by a trained decision-tree
model, and the information obtained was used to identify
the abnormal traffic flows and locate the zombie computers.
The simulation results showed that the mechanism can
effectively identify known and unknown P2P botnet viruses
and then locate the infected computers according to the
traffic information.
Because different types of botnets may appear in the
future in addition to the three types discussed in this paper,
the proposed mechanism may serve as a general approach
for the analysis and identification of the traffic flows
produced by other types of botnets. It can also be applied to
detect unknown botnet viruses and use the samples to train
the decision-tree model, which can then be used to identify
and defend against a new botnet virus. Since this study was
conducted in a small network environment, the proposed
mechanism needs more experiments in a larger network
environment to verify and improve its reliability and
robustness.

