
2010 International Conference on Advances in Social Networks Analysis and Mining

Virus Propagation Modeling in Facebook


W. Fan and K. H. Yeung
Manuscript received January 27, 2010. This work was supported by the Hong Kong Government General Research Fund (grant No. CityU-123608). W. Fan is with the Department of Electronic Engineering, City University of Hong Kong (phone: 852-9279-6694; e-mail: weifan2@student.cityu.edu.hk). K. H. Yeung is with the Department of Electronic Engineering, City University of Hong Kong (e-mail: eeayeung@cityu.edu.hk).

978-0-7695-4138-9/10 $26.00 © 2010 IEEE. DOI 10.1109/ASONAM.2010.22

Abstract: Online social network services have attracted more and more users in recent years, so security in social networks has become a critical problem. In this paper, we propose a virus model based on the application network of Facebook, the most popular of these social network service providers. We also model virus propagation with an email virus model and compare the behavior of virus spreading in Facebook and in an email network. We find that while Facebook provides a platform for application developers, it also provides the same opportunity for virus spreading. Moreover, a virus will spread faster in the Facebook network if users spend more time on it for entertainment.

I. INTRODUCTION

Computer viruses can spread through the Internet in many ways, such as email, instant messengers (IM), and P2P file sharing. As online social network services (SNS) become popular, their users are turning into targets of virus writers. People can communicate and share files with their friends on these websites, and they can also take part in activities or join groups online. These characteristics give hackers the opportunity to attack these users.

Virus spreading in an SNS resembles that in an email network or an instant messenger network. In all of them, a virus can spread by sending or sharing files that contain malicious code. If a user of these networks gets infected, the infected account automatically sends the same email or file to the contacts in the user's account, so the virus can spread quickly. Several models have been proposed to simulate virus propagation in those networks. C. Zou et al. describe their email virus propagation model in [1]. This model accounts for users' email-checking time intervals and the probability that users open attachments. They found that as users' email-checking time becomes more variable, the virus spreads faster. In [2], T. Komninos et al. propose a worm propagation model for email, IM and P2P networks. They consider that users' behavior changes as time passes, so the probability that a user opens an attachment is not a fixed value but decreases over time. The model of virus propagation in P2P networks in [3] assumes that the probability that a user downloads an infected file is a function of the ratio of infected files to total files, and that this ratio can be affected by users' downloading and executing behavior.

However, the models of virus propagation in email or IM networks are not suitable for SNS networks. Besides sending messages as with email, users of SNS networks can upload files to their accounts. The activities of a user appear in his/her friends' news feeds, so all the friends can read the news when they are online. Also, unlike email or IM networks, some people use SNS for entertainment and spend hours on it every day. These features help a virus propagate. As the behavior of SNS users can be more complex than that in other networks, it is necessary to construct a new model for virus propagation in SNS networks. Recently, it has been reported that some SNS websites, such as Facebook and MySpace, were attacked by hackers [4]. In this paper, we choose Facebook and analyze its characteristics to model virus propagation in it.

We propose two models of virus propagation in Facebook. The first is based on the Facebook application platform. Hackers may use this platform to post applications that carry viruses. As will be reported later, a malicious application spreads through the network faster than normal applications with the same initial conditions. The second model is based on sending messages to friends. This is a traditional method, just like sending email with malicious attachments. Hackers can also post pictures or links that contain Trojans. For simplicity, we assume that these methods are similar and describe them all as sending messages. Our simulation results show that because some people use Facebook as an entertainment tool, the virus spreads faster.

II. FACEBOOK USER NETWORK TOPOLOGY

In this paper, we represent the users of Facebook as a network with N nodes. Users are nodes in the network, and each node is assigned a number i, i = 1, 2, ..., N. An edge between i and j means that these two users are friends. In Facebook, user i becomes a friend of user j when each appears in the other's friend list, so this network is undirected. We define the node degree as the number of friends a user has. Some results have shown that email networks have a scale-free topology [5]. Recently, researchers have also studied the topology of online social networks such as MySpace and orkut [6][7]. They found that these networks all have power-law degree distributions. Their degree distribution can be described

as $P(k) \sim k^{-\gamma}$, where $P(k)$ is the probability that a node connects to $k$ nodes and $\gamma > 0$. Most nodes of these networks have small degree, but a few nodes have many connections. This structure has been demonstrated to be vulnerable, and the well-connected nodes are crucial in epidemic spreading [8].

In our simulations, we assume that the node degree of the Facebook user network follows a power-law distribution, since Facebook is also an online social network. We construct the network with the Barabasi-Albert (BA) scale-free network model [9]. In this model, nodes are added to the network continuously, and $\gamma = 3$. Edges are added with preferential attachment, so nodes with greater degree get more connections. In the following simulations, both of our models are studied on the BA scale-free network.

III. A MODEL BASED ON FACEBOOK APPLICATION PLATFORM

One of Facebook's successes is its application platform. Using this platform, companies and individuals can develop third-party applications, and Facebook users can add these applications to their accounts. More than 95% of users have used at least one application built on Facebook Platform, and 140 new applications are added every day [10]. However, it has been reported that some new applications carry viruses [4]. Unlike a normal application, when a user installs this kind of application, his/her account is infected and fake messages are forwarded to all his/her friends to persuade them to install the same application. This increases the probability that a user installs it. Considering the great number of daily installations, it is necessary to construct a virus propagation model based on these third-party applications.

It has been shown that the installation of applications has a preferential characteristic [11]: a user who has installed more applications has a higher probability of installing new applications. The distribution of the number of installations of applications is shown in Fig. 1. The data is obtained from Adonomics [12].

Fig. 1. The distribution of number of installations of each application.

Gjoka et al. developed a model to simulate the user coverage of Facebook applications [11]. Given the list of applications, the number of installations per application, and the number of users, this model generates a graph that shows the power-law distribution of user coverage. This model helps us reproduce the existing coverage, but it cannot reflect the behavior of a user who encounters an application invitation. In this paper, we propose a Facebook virus propagation model based on third-party applications, which not only follows the spreading law of normal applications but also captures the characteristics of virus spreading.

Our Facebook user network has $N_{user}$ user nodes, and each node is assigned a number $i$, $i = 1, 2, \ldots, N_{user}$. We assume that the number of available applications is $N_{app}$ and that each one is denoted by $k$, $k = 1, 2, \ldots, N_{app}$. Each application has a number of installations that shows how many users have installed it. As there are new installations every day, the number of installations per application is not a fixed value. The number of installations of application $k$ at time step $t$ is therefore $Install_k(t)$, $k = 1, 2, \ldots, N_{app}$. From the Adonomics data we obtain the initial number of installations per application before the virus begins attacking at $t = t_0$. We can thus create a list of applications and assign $Install_k(t_0)$ to each application. The distribution of $Install_k(t_0)$ is similar to the curve in Fig. 1. Next, we model the behavior of applications as described below.

A. Construct the initial user coverage:

With the list of existing applications and $Install_k(t_0)$, $k = 1, 2, \ldots, N_{app}$, we can construct the initial user coverage using the model in [11]. In the beginning, we have $Install_k(t_0)$, but no user in our network has installed any application. Then, at every step, each of the $Install_k(t_0)$ installations of all $N_{app}$ applications is assigned with a probability to one of the users. The probability that one installation is installed by user $i$ is

$$P_{user}(i,t) = \frac{Apps_i(t)^{\rho} + init_{user}}{\sum_{j=1}^{N_{user}} \left( Apps_j(t)^{\rho} + init_{user} \right)} \qquad (1)$$

where $Apps_i(t)$ is the number of applications that user $i$ has installed at time step $t$. The parameter $\rho$ reflects the effect of preferential installation. $init_{user}$ sets the initial probability $P_{user}(i,t)$ of a user who has not installed any application. That is, if $Apps_i(t) = 0$, the initial probability of user $i$ is

$$\frac{init_{user}}{\sum_{j=1}^{N_{user}} \left( Apps_j(t)^{\rho} + init_{user} \right)}.$$

This step is repeated until all $Install_k(t_0)$ installations of all $N_{app}$ applications are exhausted, which completes the initialization.
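The initialization in Section III.A can be illustrated with a minimal Python sketch of Eq. (1). The function name `build_initial_coverage`, the default parameter values, and the choice to give zero weight to users who already hold the application are assumptions made for illustration; they are not taken from the paper or from [11].

```python
import random


def build_initial_coverage(installs, n_users, rho=1.0, init_user=1.0, seed=0):
    """Assign every installation of every application to a user via Eq. (1).

    installs[k] plays the role of Install_k(t0). The result is a list of sets:
    apps_of[i] holds the applications installed by user i.
    """
    rng = random.Random(seed)
    apps_of = [set() for _ in range(n_users)]
    for k, count in enumerate(installs):
        if count > n_users:
            raise ValueError("an application cannot have more installations than users")
        for _ in range(count):
            # Eq. (1): the weight of user i is Apps_i(t)^rho + init_user.
            # Assumption: users who already hold application k get weight 0,
            # so one user never receives the same application twice.
            weights = [
                0.0 if k in apps_of[i] else len(apps_of[i]) ** rho + init_user
                for i in range(n_users)
            ]
            user = rng.choices(range(n_users), weights=weights, k=1)[0]
            apps_of[user].add(k)
    return apps_of
```

For example, `build_initial_coverage([5, 3, 1], n_users=20)` distributes nine installations over twenty users. The cost grows with the number of installations times the number of users, so this form is only suitable for small networks.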

B. Virus propagation follows the steps below:

(a) Select the users who are infected in the beginning: The virus spreading starts from $I_0$ infected users at time step $t_0$. Randomly pick $I_0$ infected users from the network. The total number of applications $N_{app}$ is now increased by 1, and the index of the malicious application is $N_{app}$. We set $I(t_0) = I_0$, where $I(t)$ is the number of infected users at the $t$-th time step. The infected users send invitation messages to their friends.

(b) Maintain the installation distribution: The statistics of Facebook show that there are many new installations every day, but the curve shown in Fig. 1 does not change significantly. The distribution of installations is similar from time to time, so we should maintain the preferential characteristic of installations. From Adonomics we know that millions of installations take place every day. Because our network is small, we scale down the number of installations per day and assume that the number of new installations per day is $m$. We select one application from the application list, where the probability that application $k$ is selected is

$$P_{app}(k,t) = \frac{Install_k(t) + init_{app}}{\sum_{j=1}^{N_{app}} \left( Install_j(t) + init_{app} \right)} \qquad (2)$$

Here $init_{app}$ defines the initial probability $P_{app}(k,t)$ of an application without any installation. The selected application is then installed by a user $i$ chosen with probability $P_{user}(i,t)$. If user $i$ has already installed this application, another user is picked. We select an application and assign it to a user $m$ times in this step, so we have $m$ new installations. If the malicious application is installed, $I(t)$ is updated and the newly infected user sends invitations to his/her friends.

(c) Users deal with the invitations: Every user who has received $c$ invitations at this time step installs the malicious application with a probability $P_{virus}$ determined by $Install_{N_{app}}(t)$, $Apps_i(t)$, $N_{user}$, and $N_{app}$. $I(t)$ is updated if necessary. Given the behavior of the virus, it is reasonable that a user who receives more invitation messages has a higher risk of being infected.

(d) Repeat steps (b) and (c) for the next time step $t$.

In this simulation, we record $I(t)$ at every time step $t$ to show the virus spreading process in the network and the final number of infected users. Fig. 2 plots the behavior of the malicious application as the solid line. The dashed line shows the behavior of the application that attracts the most users. The number of installations of the malicious application increases rapidly and comes close to that of the top application, because step (c) helps the virus spread. After the number of infected users reaches a certain value, however, its growth mainly comes from step (b), so the growth rate is the same as that of the top application after $t = 50$. We also record the behavior of the application that has the same number of installations as the malicious application at $t = t_0 = 1$. Its installations do not increase much, which implies that the number of installations of an application with the same initial condition does not change noticeably.

Users' behavior can affect their friends' decisions, and in the BA scale-free network we can change the average degree, which is the average number of friends of a user. Fig. 3 plots the installations of the malicious application in networks with $\langle k \rangle = 6$ and $\langle k \rangle = 14$, where $\langle k \rangle$ is the average degree of the network. The figure shows that a network with greater $\langle k \rangle$ has more infected users and that the virus spreads faster in it.

Fig. 2. The behavior of three applications over time. In this simulation, $N = 50000$, $N_{app} = 100$ before the virus spreads, and $I_0 = 10$.
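Step (b) of the procedure can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `daily_installations`, the default parameter values, and the in-place data structures are hypothetical. Giving zero weight to users who already hold the chosen application has the same effect as re-drawing a user until one without the application is found. Step (c) is not included.

```python
import random


def daily_installations(apps_of, installs, m, malicious, infected,
                        rho=1.0, init_user=1.0, init_app=1.0, rng=None):
    """Perform m new installations for one time step (step (b)).

    apps_of[i] is the set of applications held by user i, installs[k] is
    Install_k(t), and infected is the set of infected users. All three are
    updated in place. Returns the users newly infected in this step.
    """
    rng = rng or random.Random()
    n_users = len(apps_of)
    n_apps = len(installs)
    newly_infected = []
    for _ in range(m):
        # Eq. (2): the weight of application k is Install_k(t) + init_app.
        # Assumption: an application already held by every user is skipped.
        app_w = [0.0 if installs[k] >= n_users else installs[k] + init_app
                 for k in range(n_apps)]
        if sum(app_w) == 0:
            break
        k = rng.choices(range(n_apps), weights=app_w, k=1)[0]
        # Eq. (1), restricted to users who do not hold application k yet.
        user_w = [0.0 if k in apps_of[i] else len(apps_of[i]) ** rho + init_user
                  for i in range(n_users)]
        i = rng.choices(range(n_users), weights=user_w, k=1)[0]
        apps_of[i].add(k)
        installs[k] += 1
        if k == malicious and i not in infected:
            infected.add(i)
            newly_infected.append(i)
    return newly_infected
```

The returned users are the ones who would send invitations to their friends before step (c) is evaluated.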

Fig. 3. The behavior of malicious application in networks with different average degrees.

In our model, infected users send invitations to their friends. This affects the behavior of users significantly, as the results show. Users who are friends on Facebook may also be friends in real life, so they easily trust the invitations they receive. The virus can therefore spread rapidly even if only a few users install it in the beginning.

IV. A MODEL BASED ON SENDING MESSAGE

The second model is similar to email virus propagation. An email virus spreads by sending users mails that contain attachments. If users open the attachment, they are infected. In Facebook, however, users cannot add attachments to messages, so hackers try to lead users to third-party websites that urge them to download the virus [8]. The spreading process is as follows. Users log in to Facebook from time to time. When a user is online and receives a message with a link to a malicious website, he/she may delete the message without clicking the link, or click it and become infected. If a user is infected, the virus sends the same message to all friends of this user.

This process appears to be the same as that of the email virus [1]. Both depend on user interaction, and users check their accounts at dynamic time intervals. However, there are differences. In normal cases, people using an email application only check whether there is new mail and then log out, but people spend more time on Facebook. Every day more than 3 billion minutes are spent on Facebook, which has 200 million active users [10]. That means the average user spends more than ten minutes on it per day. If Facebook users get new messages while they are online, they can check their mailbox immediately. For those who are online for hours every day, the virus can spread faster. As a result, besides the time between two log-in attempts of a user and the probability that a user clicks the malicious URL, we need the online time as another factor that affects virus propagation.

We again represent our model as a BA scale-free network with N nodes.
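For reference, a BA-style network of the kind both models run on can be generated with a short Python sketch. The seed graph (a complete graph on m + 1 nodes) and the function name `barabasi_albert` are assumptions, since the paper does not describe its construction in detail. Each new node attaches to m existing nodes, so the average degree approaches 2m; m = 3 and m = 7 would correspond to average degrees of about 6 and 14.

```python
import random


def barabasi_albert(n, m, seed=0):
    """Return adjacency sets of an undirected BA-style graph with n nodes.

    Assumption: the graph starts from a complete graph on m + 1 nodes. Every
    later node attaches to m distinct existing nodes, each chosen with
    probability proportional to its current degree.
    """
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    endpoints = []  # each node appears once per incident edge
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
            endpoints += [u, v]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            # Drawing uniformly from the endpoint list is a degree-proportional draw.
            targets.add(rng.choice(endpoints))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints += [new, t]
    return adj
```

Calling `barabasi_albert(10000, 3)` gives a network of the size used in the simulations of this section, under the assumptions above.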
Users of Facebook are indexed by $i$, $i = 1, 2, \ldots, N$. In our model, a node has one of three statuses: susceptible, intermediate, or infected. In the beginning, all nodes are susceptible. If a node receives a message with a malicious link in its inbox while it is offline, the node becomes intermediate. An intermediate node returns to susceptible if it ignores the message, or becomes infected with a click probability. The three user interaction factors are as follows:

(1) The Facebook log-in time $T_{login}(i)$ of node $i$, $i = 1, 2, \ldots, N$, follows an exponential distribution. Its mean $E[T_{login}(i)]$ is an independent Gaussian random variable, $E[T_{login}] \sim N(\mu_{Tl}, \sigma_{Tl}^2)$. In our simulation, $E[T_{login}] \sim N(40, 400)$.

(2) The mean of the Facebook online time $T_{online}(i)$ of node $i$, $i = 1, 2, \ldots, N$, is an independent Gaussian random variable, $T_{online} \sim N(\mu_{To}, \sigma_{To}^2)$. It is obvious that $T_{online}(i) < T_{login}(i)$. In our simulation, $T_{online} \sim N(1, 100)$.

(3) The probability $P_{click}(i,t)$ that user $i$ clicks a malicious link is also an independent Gaussian random variable. We assume $P_{click}(t) \sim N(\mu_p(t), \sigma_p^2)$, in which $\mu_p(t)$ is monotonically decreasing with time, because more users become aware of the scam as the number of infected users increases. We have

$$\mu_p(t) = \mu_0 \left( 1 - \frac{N_{infect}(t)}{N} \right),$$

where $\mu_0$ is a constant and $N_{infect}(t)$ is the number of infected users at time step $t$. In our simulation, $\mu_0 = 0.5$ and $\sigma_p^2 = 0.09$.

Virus propagation follows the steps below:

(a) The propagation begins at $t = 1$. We randomly select $N_{infect}(0)$ users who have malicious mails in their mailboxes in the beginning. These mails contain malicious links. If a user clicks the link, he/she is infected and sends the same mails to his/her friends.

(b) At every time step, we check all users who log in to Facebook or are online at this step. If they have malicious mails in their mailbox, they click the links with probability $P_{click}(i)$. After user $i$ logs off, we generate a new $T_{login}(i)$ and $T_{online}(i)$ for this user's next log-in. All our simulations use the non-reinfection case, that is, infected users do not send the virus to their friends again if they click the malicious link again.

(c) Repeat step (b) for the next time step $t$.

A. Comparison of Facebook network and email network

In this simulation, we record the number of infected users $N_{infect}(t)$ at each time step. Fig. 4 plots $N_{infect}(t)$ for the Facebook network with a constant $\mu_p(t) = \mu_0$ as the solid line. The dashed line is the number of infected users in the email network, which does not consider the user online time. Comparing these two lines, we find that the virus spreads faster because some users spend a lot of time on Facebook.

Fig. 4. The behavior of number of infected users. In this simulation, $N = 10000$, $N_{infect}(0) = 10$, and $\mu_p(t) = \mu_0$. The data is averaged over 20 simulations.
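The message-based model (factors (1) to (3) and steps (a) to (c)) can be sketched as a small discrete-time simulation in Python. Several details below are assumptions made only to obtain a runnable sketch: the function name `simulate_messages`, clipping Gaussian draws to valid ranges, drawing the online duration and the click probability afresh at each log-in and each read, delivering messages at the end of a time step, and modelling the email case as a zero online duration. The Gaussian parameters follow the values stated in the text, reading the second argument of N(., .) as a variance.

```python
import random


def simulate_messages(adj, n_seed=10, steps=200, mu0=0.5, sigma_p=0.3,
                      use_online_time=True, decaying_click=False, seed=0):
    """Return the number of infected users after each time step.

    adj[i] is the set of friends of user i. use_online_time=False gives the
    email-style baseline; decaying_click=True uses mu_p(t) = mu0 * (1 - N_infect/N).
    """
    rng = random.Random(seed)
    n = len(adj)
    # Factor (1): per-user mean log-in interval ~ N(40, 400), clipped to >= 1.
    mean_login = [max(1.0, rng.gauss(40, 20)) for _ in range(n)]
    next_login = [rng.expovariate(1.0 / mean_login[i]) for i in range(n)]
    online_until = [-1.0] * n
    has_msg = [False] * n
    infected = [False] * n
    for i in rng.sample(range(n), n_seed):  # step (a): seed malicious mails
        has_msg[i] = True
    n_inf = 0
    history = []
    for t in range(1, steps + 1):
        mu = mu0 * (1 - n_inf / n) if decaying_click else mu0
        outbox = []
        for i in range(n):
            if t >= next_login[i]:  # user i logs in at this step
                gap = max(1.0, rng.expovariate(1.0 / mean_login[i]))
                stay = 0.0
                if use_online_time:
                    # Factor (2): online time ~ N(1, 100), kept below the gap.
                    stay = min(max(0.0, rng.gauss(1, 10)), gap - 1)
                online_until[i] = t + stay
                next_login[i] = t + gap
            if t <= online_until[i] and has_msg[i] and not infected[i]:
                has_msg[i] = False  # the message is either clicked or deleted
                # Factor (3): click probability ~ N(mu, sigma_p^2), clipped to [0, 1].
                p = min(1.0, max(0.0, rng.gauss(mu, sigma_p)))
                if rng.random() < p:
                    infected[i] = True
                    n_inf += 1
                    outbox.extend(adj[i])
        for j in outbox:  # messages become visible from the next step on
            if not infected[j]:
                has_msg[j] = True
        history.append(n_inf)
    return history
```

Running the function once with `use_online_time=True` and once with `use_online_time=False` on the same network gives the two curves that a comparison like Fig. 4 is built from; averaging over several seeds smooths the result.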

However, users may become aware of the virus as more and more users are infected. The probability that they click the links then decreases, and the number of infected users behaves differently. We plot this case in Fig. 5, in which $\mu_p(t)$ changes over time. The numbers of infected users are smaller in both the Facebook and email networks than those in Fig. 4. In Fig. 5, the solid line still rises faster than the dashed line. Our results indicate that, under the same conditions, a virus spreads faster in the Facebook network than in the original email network. If the probability that a user clicks the malicious link is not constant, however, the number of infected users is smaller and the spreading rate is slower.

Fig. 5. The behavior of number of infected users. In this simulation, $N = 10000$, $N_{infect}(0) = 10$, and $\mu_p(t) = \mu_0 \left( 1 - \frac{N_{infect}(t)}{N} \right)$. The data is averaged over 20 simulations.

B. Comparison of Facebook networks with different user average degree

We also compare the virus behavior in networks with different average degrees. Fig. 6 plots the virus propagation process in networks with $\langle k \rangle = 6$ and $\langle k \rangle = 14$. We reach the same conclusion: a network with greater $\langle k \rangle$ has more infected users.

Fig. 6. The virus behavior in networks with different average degrees. In this simulation, $N = 10000$, $N_{infect}(0) = 10$, and $\mu_p(t) = \mu_0 \left( 1 - \frac{N_{infect}(t)}{N} \right)$. The data is averaged over 20 simulations.

V. CONCLUSION

In this paper, we proposed two models for virus propagation in the Facebook network. In the model based on the Facebook application platform, the more users install the virus, the more popular it becomes and the more installations it attracts. The simulation results show that even if the malicious application attracts only a few users in the beginning, it can still spread rapidly, because users may trust their Facebook friends and install the malicious application. In the second model, which is similar to email virus propagation, the probability that a user is infected becomes smaller as more and more users get infected. We found that the virus spreads faster in Facebook than in an email network, because people use Facebook for entertainment and spend more time on it. Finally, as the behavior of users can affect their friends, the virus spreads faster if users have more friends in the network.

REFERENCES

[1] C. C. Zou, D. Towsley, and W. Gong, "Email virus propagation modeling and analysis," Umass ECE Dept., Tech. Rep. TR-03-CSE-04, May 2003.
[2] T. Komninos, Y. C. Stamatiou, and G. Vavitsas, "A worm propagation model based on people's email acquaintance profiles," in WINE 2006, Patras, Greece.
[3] R. W. Thommes and M. J. Coates, "Modeling Virus Propagation in Peer to Peer Networks," Information, Communications and Signal Processing, 2005 Fifth International Conference, pp. 981-985.
[4] http://www.kaspersky.com/news?id=207575670
[5] H. Ebel, L.-I. Mielsch, and S. Bornholdt, "Scale-free topology of e-mail networks," Physical Review E, vol. 66, 035103(R), 2002.
[6] A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and S. Bhattacharjee, "Measurement and Analysis of Online Social Networks," in IMC '07: Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, 2007.
[7] R. Kumar, J. Novak, and A. Tomkins, "Structure and evolution of online social networks," in KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, 2006.
[8] R. Pastor-Satorras and A. Vespignani, "Epidemic spreading in scale-free networks," Physical Review Letters, vol. 86(14), pp. 3200-3203, April 2, 2001.
[9] A. L. Barabasi and R. A. Albert, "Emergence of scaling in random networks," Science, vol. 286, 1999, pp. 509-512.
[10] http://www.facebook.com/press/info.php?statistics#/press/info.php?statistics
[11] M. Gjoka, M. Sirivianos, A. Markopoulou, and X. W. Yang, "Poking Facebook: Characterization of OSN Applications," ACM SIGCOMM Workshop on Social Networks (WOSN '08), August 2008.
[12] Adonomics. http://www.adonomics.com, 2009.
