Вы находитесь на странице: 1из 8

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO.

4, JULY/AUGUST 2011 617

Short Papers ___________________________________________________________________________________________________

Dynamics of Malware Spread in 2 RELATIONSHIP TO PRIOR WORK


Decentralized Peer-to-Peer Networks Though the initial thrust in P2P research was measurement
oriented, subsequent works, [3], [4], [5], have proposed analytical
Krishna K. Ramachandran and models for the temporal evolution of information in the network.
Biplab Sikdar, Senior Member, IEEE The focus of these works is on transfer of regular files and they do
not apply to malware that spread actively. In addition, they are
specialized to BitTorrent like networks and cannot be extended for
Abstract—In this paper, we formulate an analytical model to characterize the P2P networks such as Gnutella or KaZaa.
spread of malware in decentralized, Gnutella type peer-to-peer (P2P) networks
The issue of worms in peer-to-peer networks is addressed in [6],
and study the dynamics associated with the spread of malware. Using a
[7] using a simulation study of P2P worms and possible mitigation
compartmental model, we derive the system parameters or network conditions
under which the P2P network may reach a malware free equilibrium. The model mechanisms. Epidemiological models to study malware spread in
also evaluates the effect of control strategies like node quarantine on stifling the P2P networks are presented in [8], [9]. These studies assume that a
spread of malware. The model is then extended to consider the impact of P2P vulnerable peer can be infected by any of the infected peers in the
networks on the malware spread in networks of smart cell phones. network. This assumption is invalid since the candidates for
infecting a peer are limited to those within T T L hops away from it
Index Terms—Malware propagation, peer-to-peer networks, Internet worms and and not the entire network. Another important omission is the
viruses. incorporation of user behavior. Typically, users in a P2P network
alternate between two states: the on state, where they are
Ç connected to other peers and partake in network activities and
1 INTRODUCTION the off state wherein they are disconnected from the network.
Peers going offline result in fewer candidates for infection thereby
THE use of peer-to-peer (P2P) networks as a vehicle to spread
lowering the intensity of malware spread.
malware offers some important advantages over worms that spread
An empirical model for malware spreading in BitTorrent is
by scanning for vulnerable hosts. This is primarily due to the
developed in [10] while models for the number of infected nodes
methodology employed by the peers to search for content. For
by dynamic hit list-based malware in BitTorrent networks is
instance, in decentralized P2P architectures such as Gnutella [1]
presented in [11], [12]. However, these models ignore node
where search is done by flooding the network, a peer forwards the
query to it’s immediate neighbors and the process is repeated until a dynamics such as online-offline transitions and are applicable
specified threshold time-to-live, T T L, is reached. Here T T L is the only to BitTorrent networks.
threshold representing the number of overlay links that a search In [13], [14], the authors use hypercubes as the graph model for
query travels. A relevant example here is the Mandragore worm [2], P2P networks and derive a limiting condition on the spectral
that affected Gnutella users. Having infected a host in the network, radius of the adjacency graph, for a virus/worm to be prevalent in
the worm cloaks itself for other Gnutella users. Every time a the network. The models do not account for the fact that once a
Gnutella user searches for media files in the infected computer, the peer is infected, any susceptible peer within a T T L hop radius
virus always appears as an answer to the request, leading the user to becomes a likely candidate for a virus attack.
believe that it is the file the user searched for. The design of the In the current work, we formulate a comprehensive model for
search technique has the following implications: first, the worms can malware spread in Gnutella type P2P networks that addresses the
spread much faster, since they do not have to probe for susceptible above shortcomings. We develop the model in two stages: first, we
hosts and second, the rate of failed connections is less. Thus, rapid quantify the average number of peers within T T L hops from any
proliferation of malware can pose a serious security threat to the given peer and in the second stage incorporate the neighborhood
functioning of P2P networks. information into the final model for malware spread.
Understanding the factors affecting the malware spread can help
facilitate network designs that are resilient to attacks, ensuring 3 MALWARE PROPAGATION MODEL FOR P2P
protection of the networking infrastructure. This paper addresses
NETWORKS
this issue and develops an analytic framework for modeling the
spread of malware in P2P networks while accounting for the This section presents our framework for modeling malware spread
in P2P networks. Our model’s focus is on the propagation of
architectural, topological, and user related factors. We also model
malware and not regular files.
the impact of malware control strategies like node quarantine.
The rest of the paper is organized as follows: Section 2 presents 3.1 Search Mechanism
the related work and the analytic framework is presented in The transfer of information in a P2P network is initiated with a
Section 3. We analyze the model and study the impact of search request for it. This paper assumes that the search
quarantine in Section 4. Simulation results validating our model mechanism employed is flooding, as in Gnutella networks. In this
are presented in Section 5 and Section 6 concludes the paper. scenario, a peer searching for a file forwards a query to all its
neighbors. A peer receiving the query first responds affirmatively
if in possession of the file and then checks the TTL of the query. If
. The authors are with the Department of Electrical, Computer, and Systems this value is greater than zero, it forwards the query outwards to
Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180.
its neighbors, else, the query is discarded. In our scenario, it
E-mail: krishna.k.ramachandran@gmail.com, bsikdar@ecse.rpi.edu.
suffices to distinguish any file in the network as being either
Manuscript received 14 Apr. 2010; accepted 10 Aug. 2010; published online malware or otherwise. This is because, as noted earlier, an infected
23 Nov. 2010.
For information on obtaining reprints of this article, please send e-mail to: peer replies affirmatively to all the queries that it receives with the
tdsc@computer.org, and reference IEEECS Log Number TDSC-2010-04-0062. malware being substituted for the file being searched for. Thus to
Digital Object Identifier no. 10.1109/TDSC.2010.69. model malware spread, it is imperative to determine the average
1545-5971/11/$26.00 ß 2011 IEEE Published by the IEEE Computer Society
618 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 4, JULY/AUGUST 2011

rate at which queries reach a node, which in turn depends on the TABLE 1
search neighborhood. Notation and P2P Model Parameters
We now use the generating function approach as in [15] to
quantify the search neighborhood. Define the generating function
for the probability mass function (pmf) of the vertex degree as
P
G0 ðxÞ ¼ 1 i
i¼0 pi x , where pi is the probability that a randomly
chosen vertex has degree i. Since the Gnutella network has a power
law degree distribution [16], we have pi ¼ Ci , where C and  are
constants. The heterogeneity of the connectivity distribution
inherent in power law distributions significantly affects the search
region of nodes with different degrees. Thus, we evaluate the
neighborhood size of a vertex as a function of its degree k.
The distribution of the degree of a vertex that we arrive at by
following an edge from a vertex is different from that of an arbitrary where z2 ¼ G000 ð1Þ and z1 ¼ G00 ð1Þ. Since the search neighborhood of
vertex in the graph. An edge arrives at a vertex with probability a peer extends up to T T L hops, the average neighborhood size is
proportional to the degree of the vertex. Thus, the probability that a given by
randomly chosen edge leads to a vertex with degree i is proportional "  #
to ipi . The pmf of the degree of the vertex can then be obtained from X
T TL
ðkÞ z1 z2 T T L
P ðkÞ
zav ¼ zi ¼ k 1 : ð3Þ
the pmf of an arbitrary vertex by normalizing it with i ipi and its z2  z1 z1
i¼1
probability generating function (pgf) is then
P
ip xi xG0 ðxÞ 3.2 Compartmental Model
Pi i ¼ 0 0 :
i ipi G0 ð1Þ We formulate our model as a compartmental model, with the peers
divided into compartments, each signifying it’s state at a time
As we follow a randomly chosen edge to reach a vertex and then instant. In addition, to account for power-law topologies, we
continue on each of the edges of that vertex, and so on, to reach all develop the compartmental model in terms of the node degree
the m-hop neighbors, the number of vertices arrived at from each
[17]. For each possible node degree k, the network is partitioned
vertex has the degree distribution above, less one power of x to
into four classes:
compensate for the edge we arrived on. The pgf of the number of
outgoing edges at each vertex is then .
ðkÞ
PS : Number of peers wishing to download a file.
ðkÞ
. PE : Number of peers currently downloading the
G00 ðxÞ
G1 ðxÞ ¼ : malware.
G00 ð1Þ ðkÞ
. PI : Number of peers with a copy of the malware.
ðkÞ
With N nodes in the network, the probability of any of these . PR : Number of peers who either have deleted the
outgoing edges connecting to the original vertex we started at or to malware or are no longer interested downloading any file.
any of its immediate neighbors falls as N 1 and can thus be Further, each class has two components: one comprising of
neglected as N ! 1. The number of 2-hop neighbors is the sum of peers of that class that are currently online, while the second
the neighbors of each 1-hop neighbor. Since the generating ðkÞ
represents the offline peers. For instance, PIon denotes the peers
function for sum of random variables is the product of the with degree k infected by the malware that are currently
individual generating functions, the pgf for the 2-hop neighbors is ðkÞ
online and PIoff , the offline infected peers. Note that since we
given by
consider networks with a finite number of nodes, the number
X of classes is finite, even with power-law topologies. We denote
pk ½G1 ðxÞk ¼ G0 ðG1 ðxÞÞ: ðkÞ
k by NP the total number of peers in the network and by NP
the total number of nodes with degree k, both online and
Similarly, the distribution of the m-hop neighbors is given by offline. Table 1 defines the parameters used in our model.
G0 ðG1 ðG1 ð   G1 ðxÞÞÞÞ, with m  1 iterations of the function G1
Our formulation is based on the principle of mass action, where
acting on itself. Now, given that a node has degree k, the pgf of
ðkÞ the behavior of each class is approximated by the mean number
its degree is given by G0 ðxÞ ¼ xk . Then, the pgf of the number of
in the class at any time instant. By employing the mean-field
m-hop neighbors of a node with degree k can be defined in terms
of the recursive convolution: approach, we make the following assumptions about the system:
8 . The number of members in a compartment is a differenti-
< xk for m ¼ 1
4
 ðG1 ðxÞÞÞÞ k for m  2: able function of time. This holds true in the event of large
GðkÞ
m ðxÞ ¼ ½G 1 ðG 1 ð  ð1Þ
: |fflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl} compartment sizes and since P2P networks comprise of tens
m1
of thousands of users, assuming this is quite reasonable.
Differentiating the pgf and substituting x ¼ 1 yields the average . By abstracting the P2P graph through differential equa-
tions, the emphasis is more on the numbers of each class,
number of m-hop neighbors. For example, the average number
rather than the particulars of each member of the
of one and two hop neighbors of a peer with degree k are
ðkÞ ðkÞ0 ðkÞ ðkÞ
respective classes.
d
g i v e n b y z1 ¼ G0 ð1Þ ¼ k a n d z2 ¼ dx G0 ðG1 ðxÞÞjx¼1 ¼ . The spread of files in the P2P network is deterministic, i.e.,
ðkÞ0 0 00 0
G0 ð1ÞG1 ð1Þ ¼ kG0 ð1Þ=G0 ð1Þ, respectively, (the expression for the behavior is completely determined by the rules
ðkÞ
z2 uses: G1 ð1Þ ¼ 1). The average number of m-hop neighbors is governing the model. In other words, the properties of a
then class are dictated by the number of members present.
. The size of the network does not vary over the time during
  m1 which the spread of malware is modeled.
dGðmÞ  ðkÞ0 m1 z2
zðkÞ
m ¼ ¼ G ð1Þ½G 0
ð1Þ ¼ k ; ð2Þ
dx x¼1 0 1
z1 We first determine the probability that a susceptible peer
with degree k is infected when it tries to download an arbitrary
IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 4, JULY/AUGUST 2011 619

ðkÞ
file. Following the discussion in Section 3.1, the probability that the online state. Thus, the rate at which the transitions from PSon
jp ðkÞ
ðkÞ ðkÞ
a neighbor of an arbitrary node has a degree j is given by zj ,
P into PEon occur is given by PSon ð1  ð1  pinf Þzav Þ. Now, member-
ðkÞ
with z ¼ i ipi . Now, when a query reaches a node with ship of class PSon increases if:
degree j, it is infected and responds positively to the query with
ðjÞ ðjÞ ðkÞ
probability PIon =NP . Then the probability that an arbitrary . An offline peer of class PS comes online: a transition from
ðkÞ
neighbor is infected, pinf , is given by class PSoff which occurs at rate on .
. A peer currently downloading terminates the process, say
X jpj PIðjÞ due to unsatisfactory download speeds: a transition from
pinf ¼ on
: ð4Þ ðkÞ ðkÞ
z N ðjÞ state PEon to PSon at rate r1 .
j P . A peer that previously had the file, either accidentally or
Now, a search initiated by a node with degree k, on an average, intentionally deletes the file, and wishes to download it
ðkÞ
reaches zðkÞ ðkÞ again: a transition from state PRon which occurs at rate r2 .
av peers. The probability that at least one of the zav peers ðkÞ
responds to the query and the susceptible node gets infected is The peers per unit time exiting class PSon total
ðkÞ
thus ð1  ð1  pinf Þzav Þ.

ðkÞ

ðkÞ
The dynamics of the spread of malware in peers with degree k off þ  1  ð1  pinf Þzav PSon
can then be represented in terms of the constituent classes by the ðkÞ ðkÞ ðkÞ
following deterministic system of equations: and those entering number r1 PEon þ r2 PRon þ on PSoff . Combining
ðkÞ
the two gives the rate of change of membership of class PSon as
ðkÞ   given in (5). Equations characterizing the rates of change for the
dPSon ðkÞ  zðkÞ ðkÞ
¼ PSon 1  1  pinf av þ r1 PEon remaining compartments can be derived in a similar fashion. Note
dt
that the transition rates among the various compartments are
ðkÞ ðkÞ ðkÞ
assumed to be known.
þ r2 PRon  off PSon þ on PSoff ð5Þ The model presented above represents an upper bound on the
number of infected nodes. This is because the model neglects the
ðkÞ   correlations in the neighborhoods of nodes that are within
dPEon ðkÞ  zðkÞ ðkÞ T T L hops of each other. Also, since malware sizes are typically
¼ PSon 1  1  pinf av  r1 PEon
dt small (less than a few kilobytes), the download times are expected
to be smaller than the on-off transition times of peers which are of
ðkÞ ðkÞ ðkÞ the order of hours. Thus, the mean-field approximations used in
 PEon  off PEon þ on PEoff ð6Þ
our analysis are acceptable.

ðkÞ
dPIon ðkÞ ðkÞ ðkÞ ðkÞ 4 MODEL ANALYSIS
¼ PEon  PIon  off PIon þ on PIoff ð7Þ
dt
In this section, we analyze the model presented in the previous
section and obtain the expressions governing the global stability of
ðkÞ
dPRon ðkÞ ðkÞ ðkÞ ðkÞ the malware free equilibrium (MFE).
¼ PIon  r2 PRon  off PRon þ on PRoff ð8Þ
dt
4.1 Malware Free Equilibrium
ðkÞ We now proceed with the derivation of the basic reproduction
dPSoff ðkÞ ðkÞ number, R0 , a metric that governs the global stability of the MFE.
¼ off PSon  on PSoff ð9Þ
dt Here, R0 quantifies the number of vulnerable peers whose security
is compromised by an infected host during it’s lifetime. It is an
ðkÞ established result in epidemiology that R0 < 1 ensures that the
dPEoff ðkÞ ðkÞ
¼ off PEon  on PEoff ð10Þ epidemic dies out fast and does not attain an endemic state [18].
dt Stability information of the MFE is important since this guarantees
that the system continues to be malware free even if newly infected
ðkÞ
dPIoff peers are introduced.
ðkÞ ðkÞ
¼ off PIon  on PIoff ð11Þ We follow the methodology presented in [19], [20], where “next
dt generation matrices” have been proposed to derive the basic
reproduction number. In this method, the flow of peers between
ðkÞ
dPRoff ðkÞ ðkÞ
the states are written in the form of two vectors F and V. The
¼ off PRon  on PRoff : ð12Þ ith element of F is the rate of appearance of new infections in
dt
compartment i and the ith element of V is defined as V i ¼ V  þ
i  Vi ,
Note that we have strived to arrive at a generic formulation of þ
where V i is the rate of transfer of peers into compartment i by all
the problem encompassing all possible scenarios. Different flavors other means and V  i is the rate of transfer of peers out of
of the model can be obtained by appropriately choosing the compartment i. These vectors are then differentiated with respect
ðkÞ
parameter values. For instance,  ¼ 1, PEoff ðtÞ ¼ 0, 8 t; k results in to the state variables, evaluated at the malware free equilibrium,
an SIR epidemic model. Also, the offline rates for the various and only the part corresponding to the infected classes are then
classes have been kept same in order to reduce the number of kept to form the matrices F and V , i.e.,
variable and ease of analysis. Different rates for each class can    
easily be accommodated in the model. @F i @V i
F ¼ ðx0 Þ ; V ¼ ðx0 Þ ; 1  i; j  m; ð13Þ
We now describe the rationale behind the equations of the @xj @xj
ðkÞ
model above. A transition out of class PSon occurs if either a peer where F i and V i are the ith entries of F and V, xi is the ith system
goes offline or initiates a search query that is successful. The state variable with x_ i ¼ F i ðxÞ  V i ðxÞ, ðx0 Þ is the malware free
former occurs at rate off while the latter is contingent on the rate  equilibrium and m is the number of infectious states. For
at which requests for file download are generated, multiplied by calculating F and V , the column vectors F and V may be
the probability that the query reaches at least one infected node in considered to consist of m rows, each corresponding to an
620 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 4, JULY/AUGUST 2011

infectious state. In our model, we have m ¼ 4K corresponding to


ðkÞ ðkÞ ðkÞ ðkÞ    
PEon , PEoff , PIon and PIoff . Here K is the largest node degree in the 0 G A 0
network, with K  NP . Ordering the infectious states as F ¼ ;V ¼ ; ð14Þ
0 0 C B
ð1Þ ðKÞ ð1Þ ðKÞ ð1Þ ðKÞ ð1Þ ðKÞ
PEon ; . . . ; PEon , PEoff ; . . . ; PEoff , PIon ; . . . ; PIon , PIoff ; . . . ; PIoff , from
(5-12) we have with 0 representing a 2K  2K zero matrix and
2 ð1Þ
3 2 ð1Þ 3
ð1Þ ð1Þ
PSon ð1  ð1  pinf Þzav Þ zav on 1p1
   z av on Kp1
0  0
6 7 6 zðon þoff Þ zðon þoff Þ 7
6 .. 7 6 .. .. .. .. . . .. 7
6 7 6
6 . 7 6 ðKÞ . . . . . .7 7
6 ðKÞ ðKÞ 7 6 zav on 1pK    zðKÞ
F ¼ 6 Son
6 P ð1  ð1  p Þzav Þ 7 G ¼ 6 zðon þoff Þ av on KpK
0  07 7; ð15Þ
inf 7; 6 zðon þoff Þ
6
6 
7
7 6 0  0 0  07 7
6 0 7 6 .. .. .. .. . . .. 7
6 7 4 . . . . . .5
4 
0 5
0  0 0  0

0
2 3
2 3 r1 þ  þ off  0 0  0
ð1Þ ð1Þ ð1Þ ð1Þ
r1 PEon þ PEon þ off PEon  on PEoff 6 .. .. .. .. .. .. 7
6 7 6 . . . . . . 7
6 .. 7 6 7
6 7 6 7
6 . 7 6 0  r1 þ  þ off 0  0 7
6 7 A¼6
6
7  M;
~
0 7
ðKÞ ðKÞ ðKÞ ðKÞ
6 r1 PEon þ PEon þ off PEon  on PEoff 7 6 0  0 on  7
6 7 6 7
6 7 6 .. .. .. 7
6 ð1Þ
on PEoff  off PEon
ð1Þ
7 4 . . ... ..
.
..
. . 5
6 7
6 7
6 .. 7 0  0 0  on
6 . 7
6 7 ð16Þ
6 ðKÞ
on PEoff  off PEon
ðKÞ 7
6 7
V¼6 7;
6 7
6 PIð1Þ ð1Þ ð1Þ ð1Þ
þ off PIon  on PIoff  PEon 7 2
 þ off  0 0  0
3
6 on 7
6 7 6
6 ... 7 6 .. .. .. .. .. .. 7
6
6
7
7 6 . . . . . . 77
6 P ðKÞ ðKÞ ðKÞ ðKÞ 7 6 7
6 Ion þ off PIon  on PIoff  PEon 7 6 0   þ off 0  0 7
6 7 B¼6
6
7  M;
~ ð17Þ
6
6
ð1Þ
on PIoff  off PIon
ð1Þ 7
7 6 0  0 on  0 7 7
6 7 6 7
6 7 6 .. .. .. .. .. .. 7
6 .. 7 4 . . . . . . 5
4 . 5
ðKÞ ðKÞ 0  0 0    on
on PIoff  off PIon
2 3
with  ðkÞ
0 representing a K-row zero vector. Note that only state Eon   0 0  0
in the set of (5-12) has inflow of new infections and thus only its 6. .. .. .. .. .. 7
6 .. . . . . .7
terms in F have a nonzero entry. Now, at the malware free 6 7
6 7
equilibrium, we have 60   0  07
C¼6
60
7; ð18Þ
6  0 0  07 7
. 6. .. .. .. .. .. 7
6. 7
4. . . . . .5
ðkÞ ðkÞ
ðkÞ ðkÞ ðkÞ ðkÞ
dPSon dPSoff dPEon dPEoff dPIon dPIoff 0  0 0  0
¼ ¼ ¼ ¼ ¼
dt dt dt dt dt dt
ðkÞ ðkÞ 2 3
dPRon dPRoff 0  0 on  0
¼ ¼ ¼0 6 .. .. .. .. .. .. 7
dt dt 6
6 . . . . . . 7
7
6 7
6 0  0 0  on 7
.
ðkÞ ðkÞ ðkÞ
PIon ¼ PIoff ¼ PEon ¼ PEoff ¼ 0
ðkÞ
M¼6
~
6 off
7: ð19Þ
6  0 0  0 7 7
for all 1  k  K. Substituting the above values in (5) and (9), we 6 7
6 .. .. .. .. .. .. 7
ðkÞ ðkÞ 4 . . . . . . 5
get r2 PRon ¼ 0 ) PRon ¼ 0. Again, using this result in (12) yields
ðkÞ
PRoff ¼ 0. Note that the total number of peers with degree k, given 0  off 0  0
ðkÞ ðkÞ ðkÞ ðkÞ ðkÞ ðkÞ ðkÞ ðkÞ ðkÞ
by NP ¼ PSon þ PSoff þ PIon þ PIoff þ PEon þ PEoff þ PRon þ PRoff , is Note that A, B, C, G, and M ~ are all 2K  2Kmatrices. The basic
ðkÞ ðkÞ ðkÞ
a constant. Thus, at the MFE we have NP ¼ PSon þ PSoff , and using reproduction number, R0 , is then the largest absolute eigenvalue
the relation from (9), the peer distribution for degree k at the MFE (spectral radius), ðÞ, of the matrix F V 1 , i.e., R0 ¼ ðF V 1 Þ. Using
ðkÞ ðkÞ elementary matrix algebra and rearranging the terms, it can be easily
evaluates to the vector: fP^Son ; P^Soff ; 0; 0; 0; 0; 0; 0g, where
verified that the product F V 1 can be broken down into GB1 CA1 ,
with the constituent matrices as enumerated above. Thus,
ðkÞ ðkÞ
ðkÞ on NP ðkÞ off NP
P^Son ¼ ; P^Soff ¼ :
on þ off on þ off
R0 ¼ ðGB1 CA1 Þ: ð20Þ
ð1Þ ðKÞ ð1Þ
Differentiating F and V with respect to PEon ; . . . ; PEon , PEoff ; . . . ; 4.2 Illustrative Example
ðKÞ ð1Þ ðKÞ ð1Þ ðKÞ
PEoff , PIon ; . . . ; PIon , PIoff ; . . . ; PIoff and evaluating at the malware We now use a simple scenario to illustrate the effectiveness of the
ðkÞ ðkÞ
free equilibrium fP^Son ; P^Soff ; 0; 0; 0; 0; 0; 0g for all 1  k  K, we model at isolating the impact of system parameters on the
have dynamics of malware propagation. Specifically, we show how
IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 4, JULY/AUGUST 2011 621

ðkÞ
the impact of user behavior on R0 can be evaluated by the model. The quarantined peers comprise a new compartment PQ and
The simplified model makes the following assumptions: when rid of malware, enter the recovered state at rate #. This
introduces the following changes to the system of (5-12):
. Instead of modeling the network by grouping nodes
according to their degree, we use a single compartment ðkÞ ðkÞ
. Additional terms to the classes PIon and PRon reflecting the
(i.e., PSon , PSoff , PEon , PEoff , PIon , PIoff , PRon , and PRoff ) to departure of quarantined peers and addition of recovered
include all nodes, irrespective of their degree. This peers, respectively.
ðkÞ
model is more appropriate for random graph networks, . An additional equation describing the evolution of PQ .
but is used here for illustrative purposes since unlike the Thus (7) and (8) are, respectively, modified to
model presented in Section 3.2, leads to closed form
ðkÞ
solutions that are easier to visualize. Thus in (5-12), we dPIon ðkÞ ðkÞ ðkÞ ðkÞ ðkÞ
drop the superscript ðkÞ and use pinf ¼ PPINon and zðkÞ ¼ PEon  PIon  off PIon þ on PIoff  PIon ð29Þ
av ¼ dt
zav ¼ z1 z2zz1
1
½ðz2 =z1 ÞT T L  1.
. Peers do not spend time in the exposed state, i.e., transition ðkÞ
occurs directly from PSon to PIon . dPRon ðkÞ ðkÞ ðkÞ ðkÞ ðkÞ
¼ PIon  r2 PRon  off PRon þ on PRoff þ #PQ ð30Þ
. Only susceptible peers go offline, i.e., PIoff ¼ PRoff ¼ 0. dt
This essentially reduces the systems of (5-12) to ðkÞ
and the dynamics of PQ are described by
   
dPSon PI zav ðkÞ
¼ r2 PR  PSon 1  1   off PSon þ on PSoff ð21Þ dPQ ðkÞ
dt NP ¼ PI ðkÞ  #PQ : ð31Þ
dt on

    ðkÞ
dPI PI zav The addition of class PQ does not change the equilibrium
¼ PSon 1  1   PI ð22Þ
dt NP distribution of peers at the malware free equilibrium. The only
change is the addition of an extra infectious state, i.e., m ¼ 5K.
dPR ð1Þ ðKÞ ð1Þ ðKÞ
¼ PI  r2 PR ð23Þ Accordingly, ordering the states as PEon ; . . . ; PEon , PEoff ; . . . ; PEoff ,
dt ð1Þ ðKÞ ð1Þ ðKÞ ð1Þ ðKÞ
PIon ; . . . ; PIon , PIoff ; . . . ; PIoff , PQ ; . . . ; PQ , the relevant matrices
for computing R0 are modified as
dPSoff
¼ off PSon  on PSoff : ð24Þ 2 3 2 3
dt 0 G ~
0 A 0 ~
0
Using the methodology described above, the basic reproduction F ¼4 0 0 0 5; V ¼ 4 C
~ B 0 5;
 ~ ð32Þ
T T
number can be calculated as ~
0 ~
0 ^
0 ~
0 D E
T
zav on where ~
0 is a 2K  K zero matrix, ~ 0 is the transpose of ~
0, G, A, C,
R0 ¼ : ð25Þ ~ are given in (15), (16), (18), and (19), and
and M
ðon þ off Þ
2 3
Now, consider the basic reproduction number (say R00 ) for a model  þ  þ off  0 0  0
that neglects online-offline transitions, i.e., a peer is always on and 6 .. .. .. .. .. .. 7
6 . . . . . . 7
in one of the following three states: susceptible, infected or 6 7
6 7
immune. It can be seen that in this case: 6 0   þ  þ off 0  0 7
¼6
B 7  M;
~
6 0  0 on  0 7
6 7
zav 6 7
R00 ¼ : ð26Þ 6
4
.. .. .. .. .. .. 7
 . . . . . . 5
The ratio of (26) and (25) gives us 0  0 0    on
ð33Þ
R00 ðon þ off Þ
¼ : ð27Þ
R0 on 2 3
  0 0  0
Indeed, if one assumes that a peer strictly alternates between 6 . .. .. .. .. .. 7
D¼6
4 .. . . . .
7
. 5; ð34Þ
online and offline behavior, the probability that a peer is online at
any given time can be derived as 0   0  0

on 2 3
pon ¼ : ð28Þ #  0
ðon þ off Þ . .. .. 5
E ¼ 4 .. . . : ð34Þ
Thus, if we assume pon ¼ 0:5, then (27) tells us that models not 0  #
incorporating peer behavior, such as in [9], end up overestimating
the epidemic threshold metric by a factor of two. Note that D is a K  2K matrix and E is a K  K matrix. We then
have R0 ¼ ðF V 1 Þ. Again, to illustrate the impact of quarantining
4.3 Quarantine infected nodes, we work with the simplified scenario introduced in
As a form of damage control, the intensity of malware spread can Section 4.2 and evaluate the dependency of R0 on the quarantine
be limited by quarantining infected nodes. This section quantifies rate. The equations for the model, (21-24) now become
the impact of the quarantine rate on the basic reproduction
   
ratio R0 . Quarantine is introduced in the system as follows: we dPSon PI zav
assume that an infected node is taken off the network with ¼r2 PR  PSon 1  1   off PSon
dt NP ð36Þ
probability . We also assume that this operation does not result þ on PSoff
in the P2P network being split into disconnected components.
622 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 4, JULY/AUGUST 2011

3500
Analysis
Simulation
3000

2500

Infected Peers
2000

1500

1000

500

0
0 20 40 60 80 100
Time

3500
Analysis
Simulation
3000

2500

Infected Peers
2000

1500

1000

500

0
0 20 40 60 80 100
Time

Fig. 2. Influence of offline duration on malware intensity for the system in (5-12).
(a) on ¼ 0:27,  ¼ 1:0. (b) on ¼ 0:75,  ¼ 1:0.
Fig. 1. Impact of  on malware intensity for the system in (5-12).
    peer always responds positively to any query with the malware
dPI PI zav
¼ PSon 1  1   PI  PI ð37Þ cloaked as the file being searched for. Thus, the only way to
dt NP
prevent the node from infecting others is to take it off the network.

dPQ
¼ PI  #PQ ð38Þ 5 RESULTS
dt
In this section, we validate our model using simulations and also
dPR demonstrate its capability to illustrate the effect of various system
¼ PI  r2 PR þ #PQ ð39Þ parameters on malware dynamics. The simulations were con-
dt
ducted using a custom built simulator. Results are reported for a
dPSoff 10,000 node network with a power-law graph topology with
¼ off PSon  on PSoff : ð40Þ  ¼ 3:4. The initial network state for all simulations consisted of
dt 4,950 randomly selected nodes in the susceptible online state,
Now, following the procedure outlined in Section 4.1 5,000 randomly selected nodes in the susceptible offline state, and
" 50 randomly selected nodes in the infected online state. Other
h h iz i #  
PI av ð þ ÞPI parameters that stayed constant in all simulations (unless other-
F ¼ PSon 1  1  NP ;V ¼ ð41Þ wise noted) were on ¼ 0:1, off ¼ 0:2,  ¼ 0:5,  ¼ 0:3, r1 ¼ 0:1,
0 #PQ  PI
r2 ¼ 0:1, and # ¼ 0:1. The results for each parameter setting are
averaged over 20 runs and the 90 percent confidence interval was
   
zav pon 0 ð þ Þ 0 within 10 percent of the mean.
F ¼ and V ¼ Figs. 1a and 1b substantiate our analytical result that requires
0 0  #
  ð42Þ the basic reproduction number to be greater than 1 for an epidemic
1 #zav pon 0 to prevail. We see that if R0 < 1, the number of infected peers
F V 1 ¼ ;
#ð þ Þ 0 0 drops down to zero (Fig. 1a), else it reaches endemic proportions
(Fig. 1b). From (20), we see that R0 is directly proportional to on .
and therefore This implies that nodes staying online for long periods as
zav pon compared to their offline durations result in a higher intensity of
R0 ¼ : ð43Þ malware presence. Simulations concur with the above observation
ð þ Þ and are shown in Figs. 2a and 2b. The analytic model tends to
The malware spread does not reach epidemic proportions overestimate the steady-state number of infected nodes when
provided R0 < 1 and hence, the required rate for quarantining R0 > 1. This is because our model does not take into account the
correlation in the neighborhoods of nodes that are within TTL hops
infected peers need is  > zav pon  . Such a measure takes the
of each other. The sensitivity of malware intensity to on (varied
node off the P2P network and thus it would not be able to from 0.0 to 1.0 in steps of 0.1) is shown in Fig. 3a for  ¼ 0:02 and
participate in any further file transfers until the malware has been the intensity of the epidemic increases monotonically with on .
completely removed. This is indeed necessary since an infected Simulations results have been omitted from Fig. 3a to avoid clutter.
IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 4, JULY/AUGUST 2011 623

1000
ACKNOWLEDGMENTS
900 This work was supported in part by the US National Science
800 Foundation (NSF) grant 0347623.
700
Infected Peers

600

500
REFERENCES
[1] Clip2, “The Gnutella Protocol Specification v0.4,” http://www.clip2.com/
400
GnutellaProtocol04.pdf, Mar. 2001.
300 [2] E. Damiani, D. di Vimercati, S. Paraboschi, P. Samarati, and F. Violante, “A
200 Reputation-Based Approach for Choosing Reliable Resources in Peer-to-
100
λon Peer Networks,” Proc. ACM Conf. Computer and Comm. Security (CCS),
pp. 207-216, Nov. 2002.
0
0 20 40 60 80 100 [3] X. Yang and G. de Veciana, “Service Capacity in Peer-to-Peer Networks,”
Time Proc. IEEE INFOCOM ’04, pp. 1-11, Mar. 2004.
[4] D. Qiu and R. Srikant, “Modeling and Performance Analysis of BitTorrent-
Like Peer-to-Peer Networks,” Proc. ACM SIGCOMM, Aug. 2004.
[5] J. Mundinger, R. Weber, and G. Weiss, “Optimal Scheduling of Peer-to-Peer
File Dissemination,” J. Scheduling, vol. 11, pp. 105-120, 2007.
[6] A. Bose and K. Shin, “On Capturing Malware Dynamics in Mobile Power-
60 Law Networks,” Proc. ACM Int’l Conf. Security and Privacy in Comm.
Networks (SecureComm), pp. 1-10, Sept. 2008.
50
[7] L. Zhou, L. Zhang, F. McSherry, N. Immorlica, M. Costa, and S. Chien, “A
First Look at Peer-to-Peer Worms: Threats and Defenses,” Int’l Workshop
η Peer-To-Peer Systems, Feb. 2005.
Infected Peers

40
[8] F. Wang, Y. Dong, J. Song, and J. Gu, “On the Performance of Passive
30
Worms over Unstructured P2P Networks,” Proc. Int’l Conf. Intelligent
Networks and Intelligent Systems (ICINIS), pp. 164-167, Nov. 2009.
20
[9] R. Thommes and M. Coates, “Epidemiological Models of Peer-to-Peer
Viruses and Pollution,” Proc. IEEE INFOCOM ’06, Apr. 2006.
10
[10] J. Schafer and K. Malinka, “Security in Peer-to-Peer Networks: Empiric
Model of File Diffusion in BitTorrent,” Proc. IEEE Int’l Conf. Internet
Monitoring and Protection (ICIMP ’09), pp. 39-44, May 2009.
0
0 20 40 60 80 100 [11] J. Luo, B. Xiao, G. Liu, Q. Xiao, and S. Zhou, “Modeling and Analysis of
Time
Self-Stopping BT Worms Using Dynamic Hit List in P2P Networks,” Proc.
IEEE Int’l Symp. Parallel and Distributed Processing (IPDPS ’09), May 2009.
[12] W. Yu, S. Chellappan, X. Wang, and D. Xuan, “Peer-to-Peer System-Based
Fig. 3. Simulation results for the system in (5-12) (top) and (29-31) (bottom). Active Worm Attacks: Modeling, Analysis and Defense,” Computer Comm.,
(a) Effect of on on malware intensity. (b) Effect of quarantine on malware intensity. vol. 31, no. 17, pp. 4005-4017, Nov. 2008.
[13] A. Ganesh, L. Massoulie, and D. Towsley, “The Effect of Network Topology
on the Spread of Epidemics,” Proc. IEEE INFOCOM, 2005.
100 [14] Y. Wang, D. Chakrabarti, C. Wang, and C. Faloutsos, “Epidemic Spreading
Analysis, η=0.0 in Real Networks: An Eigenvalue Viewpoint,” Proc. IEEE Int’l Symp. Reliable
90 Simulation, η=0.0
Analysis, η=1.0
Distributed Systems (SRDS), 2003.
80 Simulation, η=1.0 [15] M. Newman, S. Strogatz, and D. Watts, “Random Graphs with Arbitrary
70
Degree Distribution and Their Applications,” Physical Rev. E, vol. 64, no. 2,
pp. 026118(1-17), July 2001.
Infected Peers

60
[16] D. Stutzbach and R. Rejaie, “Characterizing the Two-Tier Gnutella
50 Topology,” Proc. ACM Int’l Conf. Measurement and Modeling of Computer
Systems (SIGMETRICS), pp. 402-403, June 2005.
40
[17] R. Pastor-Satorras and A. Vespignani, “Epidemic Dynamics in Scale-Free
30 Networks,” Physical Rev. E, vol. 65, no. 3, p. 035108(1-4), Mar. 2002.
20 [18] O. Diekmann and J. Heesterbeek, Mathematical Epidemiology of Infectious
Diseases: Model Building, Analysis and Interpretation. Wiley, 1999.
10
[19] P. van den Driessche and J. Watmough, “Reproduction Numbers and Sub-
0 Threshold Endemic Equilibria for Compartmental Models of Disease
0 20 40 60 80 100
Time Transmission,” Math. Biosciences, vol. 180, pp. 29-48, 2002.
[20] J. Arnio, J. Davis, D. Hartley, R. Jordan, J. Miller, and P. van den Driessche,
“A Multi-Species Epidemic Model with Spatial Dynamics,” Math. Medicine
Fig. 4. Effect of quarantine on the system in (29-31) for  ¼ 0:02. and Biology, vol. 22, pp. 129-142, Mar. 2005.

The effectiveness of quarantine in controlling the spread of


malware is shown in Fig. 4 which shows the infected population in . For more information on this or any other computing topic, please visit our
the network with and without quarantine. Also, (43) depicts an Digital Library at www.computer.org/publications/dlib.
inverse relationship between R0 and the quarantine rate .
Analytic results for increasing values of  (from 0.0 to 1.0 in steps
of 0.1) for  ¼ 0:02 for the power-law topology are presented in
Fig. 3b which shows that the malware intensity is inversely
proportional to .

6 CONCLUSION
In this paper, we developed an analytic model to understand the
dynamics of malware spread in P2P networks. The need for an
analytic framework incorporating user characteristics (e.g., offline
to online transitional behavior) and communication patterns (e.g.,
the average neighborhood size) was put forth by quantifying
their influence on the basic reproduction ratio. It was shown that
models that do not incorporate the above features run the risk of
grossly overestimating R0 and thus falsely report the presence of
an epidemic.
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

Вам также может понравиться