
DECLARATION

WE HEREBY DECLARE THAT THIS PROJECT WAS TRULY UNDERTAKEN BY US

UNDER SUPERVISION AND HAS NOT IN PART OR WHOLE BEEN PRESENTED FOR

ANOTHER PROJECT.

GYAMFI ATTA KWAME (1551907) .............. ..............

NAME AND INDEX NUMBER SIGNATURE DATE

ABORABORAH NATHANIEL (1545407) .............. ..............

NAME AND INDEX NUMBER SIGNATURE DATE

BOAHEN FRANK (1549607) .............. ..............

NAME AND INDEX NUMBER SIGNATURE DATE

YERIBATUAH PETER (1558007) .............. ..............

NAME AND INDEX NUMBER SIGNATURE DATE

CERTIFIED BY: MR. K. F. DARKWAH .............. ..............

SUPERVISOR SIGNATURE DATE

CERTIFIED BY: DR. S.K. AMPONSAH .............. ..............

HEAD OF DEPARTMENT SIGNATURE DATE

ACKNOWLEDGEMENT

We thank Mr. K. F. Darkwah for his diligence, creativity and invaluable efforts towards the completion of this project. We also thank the student computer users of KNUST and our families.

DEDICATION

We dedicate this project to our families, especially Mom and Dad.

ABSTRACT

Differential equations are an effective mathematical tool and have proven very useful in research on real-life problems involving dynamical systems, such as the spread of computer viruses and diseases and population growth in aquaculture. This project investigates the spread of computer viruses using the SIS model, differential equations and regression analysis.

In the SIS model, a susceptible computer that becomes infected with a computer virus becomes susceptible again on recovery. Thus no recovered computer system is granted immunity from infection. Data were collected from student computer users on KNUST campus.

The model shows the number of susceptible computers increasing as the number of infected computers decreases, until both become fairly stable. The reproduction ratio, R0 < 1, implies that an outbreak of computer viruses on KNUST campus is unlikely in the future.

It is our belief that the research findings and the recommendations made in this study, if adopted, will go a long way to reduce the spread of computer viruses on KNUST campus.

TABLE OF CONTENTS

CHAPTER ONE
INTRODUCTION
1.1.0 MALICIOUS SOFTWARE PROGRAMS
    1.1.1 Virus
    1.1.2 Worms
    1.1.3 Trojans
1.2.0 TYPES OF COMPUTER VIRUSES
    1.2.1 File Infectors
    1.2.2 Boot-Sector Virus
    1.2.3 Macro Virus
    1.2.4 Cavity or Spacefiller Virus
    1.2.5 Resident Virus
    1.2.6 Overwrite Virus
    1.2.7 Compressing Virus
    1.2.8 Malicious Mobile Code (MMC)
1.3.0 BACKGROUND OF STUDY
    1.3.1 Historic Background of Computer Virus
    1.3.2 Profile of Students and Use of Computers on KNUST Campus
1.4.0 PROBLEM STATEMENT
1.5.0 OBJECTIVES
1.6.0 METHODOLOGY
    1.6.1 Model
    1.6.2 Data Type, Source and Period of Collection
    1.6.3 Software Used
1.7.0 JUSTIFICATION
1.8.0 ORGANIZATION OF PROJECT
1.9.0 SUMMARY OF CHAPTER

CHAPTER TWO
LITERATURE REVIEW
2.0.0 INTRODUCTION
2.1.0 COMPUTER VIRUS EPIDEMIOLOGY
2.2.0 COMPUTER VIRUS PREVALENCE

CHAPTER THREE
METHODOLOGY
3.0.0 INTRODUCTION
3.1.0 DEFINITION OF VARIABLES AND PARAMETERS
3.2.0 SOME BASIC ASSUMPTIONS ABOUT THE SIS MODEL
3.3.0 FORMULATION OF THE MODEL
3.4.0 MODEL DESCRIPTION
3.5.0 EXACT SOLUTION
3.6.0 ESTIMATION OF PARAMETERS

CHAPTER FOUR
DATA COLLECTION, ANALYSIS AND RESULT
4.0.0 INTRODUCTION
4.1.0 SOURCE OF DATA
4.2.0 ESTIMATION OF PARAMETERS (γ, β), AND REPRODUCTIVE RATIO (R0)
4.3.0 RESULTS
4.4.0 QUALITATIVE ANALYSIS OF THE SIS MODEL
4.5.0 DISCUSSIONS

CHAPTER FIVE
CONCLUSION AND RECOMMENDATIONS
5.0.0 CONCLUSION
5.1.0 RECOMMENDATIONS

REFERENCES
APPENDIX B
LIST OF TABLES

Table 1: ANOVA Table
Table 2: Regression Data
Table 3: Regression Statistics

LIST OF FIGURES

Figure 1: Scatter Plot of the Regression Data
Figure 2: F-Distribution
Figure 3: SIS Model Graph
Figure 4: Phase Plane
CHAPTER ONE

INTRODUCTION

The term computer virus is often misunderstood, and many users who do understand it may not understand how computer systems are protected. Authors of computer viruses intend to gain unauthorized access to relevant information on a target computer system; such an intrusion is usually referred to as a computer virus attack. Users are therefore required to install anti-malicious software programs. To ensure protection from such attacks, the installed anti-malicious software program needs to be updated regularly so that it can monitor malicious intrusions from questionable sources. Although Trojan horses, worms and viruses are not the same, the term computer virus is generally used to describe any malicious code or program that can damage a computer system.

1.1.0 Malicious Software Programs

A malicious software program, also known as malware, is a piece of software written with the intention of crashing a computer or making it malfunction. When executed, the software allows its authors to gain access to valuable information on the compromised computer. One of the dangers of a malicious software program is that, after it has been downloaded onto a computer system, the user might not be aware of its presence. Such programs enter a computer system through attachments in e-mail messages, attachments of images, screensavers and greeting cards, and audio and video files downloaded from the internet. A malicious software program is usually branded as a computer virus, a worm or a Trojan horse.


1.1.1 Virus

Cohen (1985) originally defined a 'computer virus' as a program that can infect other programs by modifying them to include a possibly evolved copy of itself. Today, computer viruses come to users' minds whenever malware is under discussion. A technical definition of a computer virus is a sequence of instructions that copies itself into other programs in such a way that executing the program also executes that sequence of instructions. Some viruses do little but replicate; others can cause severe harm or adversely affect the programs and performance of a computer system.

1.1.2 Worms

A worm is a program very similar to a virus in design. It has the ability to self-replicate even

without human action and can lead to negative effects on a system. It takes advantage of file

transport features on a system, which allow it to travel to other parts of a computer

system unaided. This makes a worm capable of travelling across a network. The end result in

most cases is that the worm consumes too much system memory (or network bandwidth),

causing Web servers, network servers and individual computers to stop responding. Examples of

worms include: PSWBugbear.B, Lovgate.F, Trile.C, Sobig.D, Mapson.

1.1.3 Trojans

Another unsavory breed of malicious code is the Trojan, or Trojan horse, which unlike a virus does not reproduce by infecting other files, nor does it self-replicate like a worm. At first glance a Trojan horse appears to be useful software, but it will actually damage a system once installed or run. When a Trojan is activated on a computer system, the results may vary. Some Trojans are designed to be more annoying than malicious (for example, changing your desktop or adding silly active desktop icons), while others can cause serious damage by deleting files and destroying information on your system. Trojans are also known to create a backdoor on your computer that gives malicious users access to your system, possibly allowing confidential or personal information to be compromised.

1.2.0 Types of Computer Viruses

Several studies of computer viruses at the microscopic level have enabled researchers to group and classify viruses into types. They are mainly grouped according to their mechanism and target of infection and according to their mode of spread.

Depending on their targets of attack, computer viruses may be classified into three main types, where the targets are electronic objects that can be infected by the virus, e.g. files, programs and applications, file-sharing media and disks.

1.2.1 File Infectors

A file infector virus infects programs or executable files with .exe or .com extensions. When a user runs an infected application, the virus code is executed first and then installs itself independently in the computer's memory. This allows the virus to copy itself into subsequent applications that the user runs. As with other types of viruses, the user may be unaware of its existence. Resident viruses, directory viruses, direct action viruses, companion viruses and polymorphic viruses are some examples of file infectors.


1.2.2 Boot-Sector Virus

As its name implies, a boot-sector virus infects and resides in a special part of a diskette or hard disk that is read into memory and executed when a computer first starts. Once loaded, a boot-sector virus can infect any diskette that is placed in the drive. In the 1990s boot viruses outnumbered file infectors because zip and floppy disks were commonly used to start a computer. Nonetheless, it is important to protect your system against boot-sector viruses, since they account for about 5% of known viruses today (Abdelazim and Wahba, 2010). The only known remedy after infection is to reformat the whole computer system. Direct action viruses come under this classification. Other examples include polyboot.b, antiexe, Michelangelo and the FAT virus.

1.2.3 Macro Virus

A macro virus infects files that contain macros and are created by certain applications. Macro viruses are independent of operating systems and infect files that are usually regarded as data rather than as programs. Many spreadsheet, database and word-processing programs can execute scripts (prescribed sequences of actions) embedded in a document. Such scripts, or macros, are used to automate actions ranging from typing long words to carrying out complicated sequences of calculations. A macro virus takes advantage of this kind of execution to infect other data files. Examples: Relax, Melissa.A, Bablas, O97M/Y2K.

1.2.4 Cavity or Spacefiller Virus

This virus attempts to install itself in empty space within a program without damaging it. An advantage of this is that the virus does not increase the length of the program and can avoid the need for some stealth techniques. The Lehigh virus was an early example of a cavity virus. Because of the difficulty of writing this type of virus and the limited number of possible hosts, cavity viruses are rare (Daoud, Jebril and Zaqaibeh, 2008).

1.2.5 Resident Virus

This type of virus dwells in the Random Access Memory (RAM) of a computer. From there it can overcome and interrupt all of the operations executed by the system, corrupting files and programs that are opened, closed, copied, renamed, etc. Examples of resident viruses are CMJ, MrKlunky and Meve.

1.2.6 Overwrite Virus

This type of virus overwrites files with its own copy, deleting the contents of an infected file. Such viruses spread quickly through e-mail and Internet Relay Chat (IRC), popularly referred to as chat rooms. This may be a very primitive technique, but it is certainly the easiest approach of all (Daoud, Jebril and Zaqaibeh, 2008). Overwriting viruses cannot be disinfected from a system. The only way to clean a file infected by an overwrite virus is to delete the file completely from disk, thus losing the original content. Examples of this virus include Way, Trj.Reboot and Trivial.88.D.

1.2.7 Compressing Virus

A special virus infection technique compresses the content of the host program. Sometimes this technique is used to hide the increase in the host program's size after infection, by packing the host program sufficiently with a binary packing algorithm. This type of virus employs encryption during execution and hence is classified as an encrypted virus (Daoud, Jebril and Zaqaibeh, 2008).


1.2.8 Malicious Mobile Code (MMC)

Mobile code is a lightweight program that is downloaded from a remote system and executed locally with minimal or no user intervention. Java applets, JavaScript scripts, Visual Basic Scripts (VB Scripts), ActiveX controls and Microsoft remote control administrator are some of the most popular examples of mobile code that you may encounter while browsing the Web or reading HTML-formatted e-mail (Daoud, Jebril and Zaqaibeh, 2008).

1.3.0 Background of Study

1.3.1 Historic Background of Computer Virus

There are many opinions about the history of the first computer virus, but a documented history does exist. The term computer virus was coined in 1985 by Fred Cohen, an American electrical engineer and computer scientist. He introduced the term after describing a self-replicating computer program that he designed to acquire privileges on a VAX-11/750 running UNIX. However, several self-replicating programs had already been developed before Cohen's work.

In 1949, the Hungarian-American mathematician John von Neumann, at the Institute for Advanced Study in Princeton, New Jersey, proposed that it was theoretically possible for a computer program to replicate (Neumann, 1966). This theory was tested in the 1950s at Bell Laboratories, where a game was developed in which players created tiny computer programs that attacked, erased and tried to propagate on an opponent's system.


In the 1980s, several experiments were conducted on many operating systems to understand this proven theory. One such experiment was conducted by Tom Duff in 1987. He experimented on UNIX systems with a small virus capable of copying itself into executable files. He disproved a common fallacy that computer viruses are intrinsically machine dependent and cannot spread to systems of varying architectures. At that time, viruses mainly spread through the exchange of floppy disks, as the internet and computer networks were not yet widely used.

Another form of malicious code, the Trojan horse, was first introduced in 1985. A Trojan named EGABTR was the first to be produced, as a game called NUKE-LA. A host of increasingly complex viruses followed.

As computer networks and the internet became more popular, viruses quickly evolved to spread through the Internet by various means such as downloading files, e-mailing and visiting pornographic websites.

1.3.2 Profile of Students and Use of Computers on KNUST Campus

The computing environment of KNUST may facilitate the spread of an infectious or self-replicating program. A sizable number of students perform specific tasks with the aid of a computer in the course of their studies. These tasks range from typing and printing documents to surfing the internet for relevant information. Although each student in each department is required to take an introductory computing course in the first year of study, little or no emphasis is placed on the operation and spread of malicious programs that could frustrate their computing life.


Again, a significant number of students own a personal computer or some of its peripherals. One peripheral heavily used by both students and lecturers is the USB pen drive. It is the most popular means of sharing data between users on campus.

To facilitate access to information on the internet, the school has provided wireless hotspots (access points) at strategic locations throughout the campus, where students can connect to the internet with their laptops and personal computers. Some of these locations are the halls of residence, non-residential facilities and lecture theatres. Students without a personal computer or laptop can visit the ICT centers at various departments or at the main library to surf the internet for information. Students use these facilities to download software and to visit sites that could compromise their systems.

Nonetheless, the school authorities have restricted access to known unsecured websites that could introduce computer viruses onto the campus.

1.4.0 Problem Statement

With the growth of the Internet, computer viruses are able to propagate much faster and more aggressively. Now that there are many sophisticated ways of connecting to the internet regardless of location, users may visit sites containing infectious programs, which in turn increases virus spread on KNUST campus. The growing number of ways of sharing data between computer systems and advances in technology have also increased the spread of computer viruses. Simply put, there are many conditions under which computer viruses spread, all of which are present on the KNUST campus, and very little is known about the threat of a possible epidemic. It has therefore become necessary to investigate the modes through which computer viruses spread using mathematical equations.

1.5.0 Objectives

We aim to analyze the spread of computer viruses on computer systems on KNUST campus. At the end of the project we should be able to:

(i) state the factors contributing to the spread of computer viruses.

(ii) determine whether there will be an epidemic on KNUST campus.

(iii) determine whether computer virus extinction is possible on KNUST campus.

1.6.0 Methodology

1.6.1 Model

In the course of our study, the SIS epidemiological model is used to achieve the stated objectives. The parameters of the model are estimated by fitting the data to a simple linear regression model.
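As a rough illustration of what fitting a simple linear regression model involves, the sketch below computes the ordinary least-squares intercept and slope for a set of points. The function name `fit_line` and the sample data are hypothetical and chosen only for illustration; they are not the survey data, and the way the fitted coefficients map onto the SIS parameters is not shown here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical illustration data (not the survey data): points on y = 1 + 2x.
days = [0, 1, 2, 3, 4]
values = [1.0, 3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_line(days, values)
```

Because the sample points lie exactly on a straight line, the fit recovers that line; real data would scatter around the fitted line.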

1.6.2 Data Type, Source and Period of Collection

In order to obtain data for analysis, a case study is conducted on KNUST campus over a 30-day period. Questionnaires are designed and distributed to sixty student computer users at KNUST, and the antivirus scan log history of each user is recorded.

The questionnaire captures data on the various infections, the source of infection, the time of infection, the action taken by the antivirus, how frequently users connect to the internet, and others.



1.6.3 Software Used

The Excel spreadsheet application, MATLAB and R are used to obtain descriptive statistics, which help us to arrive at valid conclusions.

Information for the project as a whole is drawn from publications and articles on the internet.

1.7.0 Justification

Although the estimates are somewhat speculative, computer viruses have cost computer users billions of cedis over the years. Every year many people spend large sums on antivirus products and services. Consequently, methods to analyze, track, model and protect against viruses are of considerable interest and importance.

1.8.0 Organization of Project

Chapter One of the study consists mainly of the general introduction to the study, the objectives, the problem statement and research questions, and the scope and methodology.

Chapter Two reviews the relevant literature on the subject matter, which provides the framework for the data analysis.

Chapter Three presents how the research is conducted and gives a detailed procedure for formulating the SIS compartmental model. It also sets out a framework for estimating the model parameters and the basic reproduction ratio, R0, from the data collected, in order to determine any possible epidemic.

Chapter Four presents the quantitative and qualitative analyses of the results. This includes data analysis, computations, presentations and discussion of results.

Finally, Chapter Five presents a general overview, recommendations, problems encountered and the conclusions drawn from the project.

1.9.0 Summary of Chapter

Chapter One of the study consists mainly of the general introduction to the study, the background of the study, the objectives, the problem statement and research questions, and the scope and methodology. It sets out the procedure that guides the conduct of the research.

CHAPTER TWO

LITERATURE REVIEW

2.0.0 Introduction

Today anyone who uses the term 'computer virus' when referring to a malicious software program alludes to the work of the computer scientist and electrical engineer Fred Cohen. Cohen (1985) coined the term 'computer virus' while working on his PhD thesis. Before then, software applications designed with the intrinsic property of self-replication were not referred to as computer viruses. The usefulness of Cohen's work extends beyond the introduction and definition of the term; further examinations and findings were made.

For instance, Cohen (1985) concluded that the paths along which information flows foster the spread and propagation of a virus in a closed region. He called this the transitive closure of shared information. In simple terms, if A can infect B and B can infect C, a virus that originates with A can propagate to C. His work showed that the systems with potential for protection from a viral attack are of the following three kinds:

(i) systems with limited transitivity of sharing information

(ii) systems with no sharing

(iii) systems without general interpretation of information.
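The idea of transitive closure of shared information can be sketched as a reachability search over a sharing graph. The example below is only an illustration under stated assumptions: the function name `reachable_from` and the four-system graph are hypothetical, and an edge from one system to another is assumed to mean that the first can infect the second.

```python
from collections import deque


def reachable_from(sharing, source):
    """Return every system a virus starting at `source` could reach."""
    seen = {source}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        # Follow every sharing path that leaves the current system.
        for neighbour in sharing.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Hypothetical sharing graph: A shares with B, B shares with C, D is isolated.
sharing = {"A": ["B"], "B": ["C"], "C": [], "D": []}
infected = reachable_from(sharing, "A")
```

In this toy graph a virus that starts at A reaches C through B, while the isolated system D, which shares with no one, is never reached. This mirrors the observation that limiting or removing sharing limits propagation.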


2.1.0 Computer Virus Epidemiology

Mishra and Ansari (2008) formulated five mathematical models of the interaction between a computer virus and an antivirus software program inside a computer system with an immune system. They calculated the basic reproductive ratio in the absence and in the presence of the immune system and analyzed the criterion for the spread of the computer virus. They also analyzed the immune response needed to clear the infection. They studied the effect of new or updated antivirus software on viruses that are quarantined or not completely removed by the lower version of the antivirus software installed in the system. Reactivation of computer viruses in the latent class was formulated mathematically and a basic reproductive ratio was obtained. Finally, a mathematical model was developed to understand the recent attack of the malicious objects Backdoor.Haxdoor.S and Trojan.Schoeberl.E and a removal tool called FixSchoeb-Haxdoor.

The study of viruses at the microscopic level has helped researchers to understand the actions executed by malware and hence to provide ways of detecting its presence on an infected system. Its analogy in biological research is the quest of microbiologists to obtain new vaccines and medicines against a new disease pathogen (Chen, 2006). Chen asserted that very little effort is spent on treating worms and viruses at the macroscopic level.

According to Tabak (2004), Bernoulli made a major contribution to epidemiology by proving mathematically that variolation (inoculation with a live virus obtained from a mild case of smallpox) was beneficial. He formulated differential equations to show that variolation could reduce the death rate from smallpox.


Kermack and McKendrick (1926) published papers on epidemic models and obtained the epidemic threshold result: the density of susceptibles must exceed a critical value for an epidemic outbreak to occur.
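A threshold of this kind can be illustrated with a short calculation. The sketch below assumes a mass-action infection term and uses the symbols β (infection rate) and γ (removal rate), which are notational assumptions made here for illustration rather than the notation of the original papers.

```latex
% Assumed notation: S = susceptibles, I = infectives,
% \beta = infection (contact) rate, \gamma = removal rate.
\[
  \frac{dI}{dt} = \beta S I - \gamma I = (\beta S - \gamma)\, I ,
\]
\[
  \frac{dI}{dt} > 0 \iff S > \frac{\gamma}{\beta} \quad (I > 0).
\]
```

Under these assumptions the number of infectives can grow only while the number of susceptibles exceeds the critical value γ/β.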

According to Hethcote (2000), mathematical epidemiology seems to have grown exponentially

starting in the middle of the 20th century. The study of computer virus at the macroscopic level

is mostly dedicated to the spread of viruses in computing environment. In mathematical biology

literature of infectious disease, statistical analyses are made on epidemiological data in order to

find information and policies aimed at lowering and preventing epidemic outbreaks. A

tremendous variety of models have now been formulated, mathematically analyzed, and applied

to infectious diseases.

Murray (1988) was the first to suggest a relationship between biological epidemiology and the spread of computer viruses. Although he did not propose any specific model, he pointed out analogies to some public health epidemiological defense strategies. His work was intended to give an understanding of viruses and the issues they raise.

Gleissner (1989) introduced a model to treat the spread of computer viruses mathematically. A

recurrence formula was given which allows a closed expression to be derived for the probability

that, starting from an initial state, a given viral state will be reached after executing some number

of programs. He showed that the infection process does not terminate before all programs which

are visible for any program in the initial state are infected. Gleissner further showed that the

transitive closure of information could occur at an exponential rate. However, the usefulness of

these results was limited because no allowance was made for the fact that individual users of the

system might detect and remove viruses or alert other users of their presence.


Kephart et al. (1993) investigated susceptible-infected-susceptible (SIS) models of computer virus spread. They extended a standard epidemiological model by placing it on a directed random graph and studied its behaviour through deterministic approximation, stochastic approximation and simulation. This enabled them to determine the conditions under which epidemics are likely to occur and, where they do, to express the expected number of infections as a function of time. They concluded that an imperfect defense against computer viruses could still be highly effective in preventing their widespread proliferation, provided that the infection rate does not exceed a well-defined critical epidemic threshold.
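To give a feel for what simulating an SIS process on a directed random graph can look like, the sketch below uses a simple discrete-time scheme. It is not the model of Kephart et al.; the function names, the per-step infection and cure probabilities, and the graph size are all hypothetical choices made for illustration.

```python
import random


def random_directed_graph(n, p, rng):
    """Each ordered pair (u, v), u != v, gets an edge u -> v with probability p."""
    return {u: [v for v in range(n) if v != u and rng.random() < p]
            for u in range(n)}


def simulate_sis(graph, infected, infect_prob, cure_prob, steps, rng):
    """Discrete-time SIS: return the number of infected nodes after each step."""
    infected = set(infected)
    history = [len(infected)]
    for _ in range(steps):
        newly_infected = set()
        # Each infected node may infect each susceptible out-neighbour.
        for u in infected:
            for v in graph[u]:
                if v not in infected and rng.random() < infect_prob:
                    newly_infected.add(v)
        # Each infected node may be cured and become susceptible again.
        cured = {u for u in infected if rng.random() < cure_prob}
        infected = (infected - cured) | newly_infected
        history.append(len(infected))
    return history


# Hypothetical parameters chosen for illustration only.
rng = random.Random(0)
graph = random_directed_graph(50, 0.1, rng)
history = simulate_sis(graph, infected=[0], infect_prob=0.2,
                       cure_prob=0.3, steps=100, rng=rng)
```

Running the simulation for different infection and cure probabilities is one way to explore, informally, how the infection either dies out or persists depending on their relative sizes.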

2.2.0 Computer Virus Prevalence

Tippett et al. (1991) predicted that the number of new viruses per day would increase exponentially worldwide by the year 2000 and hence that there was a likely threat of an epidemic.

In contrast, Kephart et al. (1993) observed that although the rate of appearance of new viruses in the collections of anti-virus workers had been increasing gradually for several years, at a roughly linear rate, nothing at all about viruses was 'increasing exponentially' worldwide. They concluded that there could be an epidemic in a closed region that allows sharing, but not at a worldwide level.

CHAPTER THREE

METHODOLOGY

3.0.0 Introduction

For a disease to persist indefinitely there must be a supply of new susceptibles, either through recovery without immunity or through births. The SIS model can be used to explore the spread of computer viruses because no recovered computer is granted immunity from infection.

In this chapter we describe the formulation of the SIS model, define the variables and parameters used, and state some of the underlying assumptions of the model.

3.1.0 Definition of Variables and Parameters


$t$ = Time

$N$ = Total computer population throughout the period

$I(t)$ = Total number of infected computers at time $t$, also known as the infective class

$S(t)$ = Total number of susceptible computers at time $t$ (when a computer recovers from infection it goes back to the susceptible group)

$\gamma$ = Proportion of infected computers that recover from infection in each period

$\beta$ = Proportion of all contacts that result in an infection

$R_0$ = The basic reproduction ratio (sometimes called the basic reproductive rate)

3.2.0 Some Basic Assumptions about the SIS Model

(i) The total number of computers, $N$, shall remain constant throughout the period: $N(t) = N$.

(ii) Every member of the population can be assigned to one of the two compartments (susceptible and infected) and they are mutually exclusive. This means that at any moment in time, $N(t) = S(t) + I(t)$.

(iii) Only contacts between an infected computer and a susceptible computer shall be considered.

(iv) The number of computers that recover in each period is constant.

3.3.0 Formulation of the Model

The SIS model is formulated using a system of differential equations. In this model, the total number of computers is partitioned into two disjoint compartments. These compartments are the infected compartment and the susceptible compartment, denoted $I$ and $S$ respectively. An individual computer is said to be infected if it is in $I$ and susceptible if it is in $S$.

The number of computers that make up the total population at time $t$ is given by

$$ N(t) = S(t) + I(t) \tag{1} $$

For the SIS model, infected computers return to the susceptible compartment, $S$, on recovery within a given period of time. Therefore computers in the infective compartment potentially move from being infected to susceptible periodically.


3.4.0 Model Description

In this section, we fully describe the SIS model by the following system of ordinary differential equations.

$$ \frac{dS(t)}{dt} = \kappa I(t) - \beta S(t) I(t) \tag{2} $$

$$ \frac{dI(t)}{dt} = -\kappa I(t) + \beta S(t) I(t) \tag{3} $$

The above system ensures that any change in the size of the infected compartment is matched by an equal and opposite change in the size of the susceptible compartment, so the population $N$ stays constant. Thus $S(t) + I(t) = S(t+1) + I(t+1) = N$ under constant population. This makes the second assumption hold.
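The conservation property can be checked numerically. The sketch below is an illustration only: it integrates the two rate equations with forward Euler steps (a numerical choice made here, not a method used in this project), and the function name `simulate_sis`, the step size and the parameter values (the estimates reported in chapter four) are assumptions of the example.

```python
def simulate_sis(s0, i0, beta, kappa, dt=0.001, t_end=1.0):
    """Integrate dS/dt = kappa*I - beta*S*I and dI/dt = -kappa*I + beta*S*I
    with forward Euler steps (an assumption of this sketch)."""
    s, i = float(s0), float(i0)
    steps = int(round(t_end / dt))
    for _ in range(steps):
        # Net number of computers moving from S to I during one step
        flow = (beta * s * i - kappa * i) * dt
        s -= flow
        i += flow
    return s, i


# Illustrative values: N = 53 with 11 infected at t = 0,
# beta = 0.26691 and kappa = 13.27087 as estimated in chapter four.
s1, i1 = simulate_sis(42, 11, beta=0.26691, kappa=13.27087)
print(f"S(1) = {s1:.3f}, I(1) = {i1:.3f}, S + I = {s1 + i1:.3f}")
```

Because whatever leaves one compartment enters the other at every step, the printed sum stays at 53 whatever step size is chosen.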

3.4.1 Contacts between the Compartments

A susceptible computer becomes infected with a computer virus when it makes contact with a compromised USB pen drive. Another means by which a susceptible computer moves to the infected compartment is by visiting an unsecured website on the internet. Also, a computer in a network of infected computers can be infected and thus move to the infected compartment.

However, not all contacts of this kind result in an infection. Thus if at any time $t$ each infected computer makes $\gamma$ contacts with susceptible computers, then the possible number of new infections will be given by $I(t+1) = \gamma I(t)$. Suppose only $\alpha$ percent of contacts result in an infection. Thus each infected computer produces $\beta = \alpha\gamma$ new infections in each time period $t$.

3.4.2 Recovery Parameter, $\kappa$

Based on the fourth assumption, a recovery parameter $\kappa$ can be defined. This parameter measures the percentage of the infected population that recovers from an infection at time $t$. Thus if the time to recover corresponds to a time interval $\Psi$, then $\kappa = 1/\Psi$. This means that $(1/\Psi)$th of the infected population should recover each period on average.

3.4.3 Threshold Parameter

The epidemic threshold, $R_0$, for the SIS model can be determined by the expression

$$ R_0 = \frac{\beta}{\kappa} \tag{4} $$

It is also known as the Basic Reproduction Ratio (BRR).

It is defined as the number of new or secondary infections produced when one infected host computer is introduced into the computer population. This parameter is useful because it helps determine whether or not a computer virus will continue to spread through the computer population.

If $R_0 > 1$, then the level of spread of the computer virus increases and hence the number of susceptible computers decreases. When this happens, there is a possible threat of a computer virus epidemic.

On the other hand, if $R_0 < 1$, then the computer virus rapidly dies out and the number of susceptible computers increases.

3.5.0 Exact Solution

According to Sae-jie et al (2010), the analytical solution of the SIS model for all values of the parameters is still unknown, although numerical solutions can be obtained for any given parameter values. Therefore qualitative analysis tends to be a useful tool. However, Khan, Sadiq and Shabbir (2010) have derived an exact solution to the SIS model.



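The exact solution of the system can be obtained by reducing the two equations to a single Bernoulli differential equation, which becomes linear after a change of variable. From equation (1),

$$ S(t) = N - I(t) \tag{5} $$

Putting (5) into (3) gives

$$ \frac{dI}{dt} - (\beta N - \kappa) I(t) = -\beta I^2(t) \tag{6} $$

Dividing (6) by $I^2(t)$, we obtain

$$ I^{-2}\frac{dI}{dt} - (\beta N - \kappa) I^{-1} = -\beta \tag{7} $$

(7) is a Bernoulli ordinary differential equation, hence it is solved by letting $y = I^{-1}$ such that

$$ \frac{dy}{dt} = -I^{-2}\frac{dI}{dt} \tag{8} $$

Putting (8) into (7) gives

$$ \frac{dy}{dt} + (\beta N - \kappa) y = \beta \tag{9} $$

The integrating factor of (9) is $e^{(\beta N - \kappa)t}$. Multiplying (9) by the integrating factor and integrating gives

$$ y\, e^{(\beta N - \kappa)t} = \frac{\beta}{\beta N - \kappa}\, e^{(\beta N - \kappa)t} + C $$

where $C$ is an integration constant. Hence

$$ y = \frac{\beta}{\beta N - \kappa} + C e^{-(\beta N - \kappa)t} \tag{10} $$

Substituting $y = I^{-1}$ back into (10), the solution of the differential equation can be written as

$$ I(t) = \frac{1}{\dfrac{\beta}{\beta N - \kappa} + C e^{-(\beta N - \kappa)t}} \tag{11} $$

At $t = 0$ let $I(0) = I_0$; then

$$ C = \frac{1}{I_0} - \frac{\beta}{\beta N - \kappa} $$

Now let $\gamma = \beta N - \kappa$ (here $\gamma$ is the net growth rate, not the contact number of Section 3.4.1); then the final solution is

$$ I(t) = \frac{\gamma I_0}{\beta I_0 + (\gamma - \beta I_0)\, e^{-\gamma t}} \tag{12} $$

$$ S(t) = N - \frac{\gamma I_0}{\beta I_0 + (\gamma - \beta I_0)\, e^{-\gamma t}} \tag{13} $$

where $N = S(t) + I(t)$.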
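As a quick check of the closed-form solution, the following sketch evaluates $I(t)$ and $S(t)$ directly from $\gamma = \beta N - \kappa$. The function name `sis_exact` is hypothetical, and the numerical inputs ($N = 53$, $I_0 = 11$, $\beta = 0.26691$, $\kappa = 13.27087$) are the values reported in chapter four, used here only for illustration.

```python
import math


def sis_exact(t, i0, n, beta, kappa):
    """Closed-form I(t) and S(t) for the SIS model; requires beta*n != kappa."""
    gamma = beta * n - kappa
    infected = gamma * i0 / (beta * i0 + (gamma - beta * i0) * math.exp(-gamma * t))
    return infected, n - infected


N, I0 = 53, 11
BETA, KAPPA = 0.26691, 13.27087  # estimates reported in chapter four

for t in (0, 1, 2, 29):
    i_t, s_t = sis_exact(t, I0, N, BETA, KAPPA)
    print(t, round(i_t, 4), round(s_t, 4))
```

With these inputs the values at $t = 1$, $t = 2$ and $t = 29$ should come out close to the Appendix B entries (about 4.6353, 3.7348 and 3.2796 infected computers).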

3.6.0 Estimation of Parameters

3.6.1 $\beta$ and $\kappa$ Parameters

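In this section, the method of estimating the parameters $\kappa$ and $\beta$ is presented.

Dividing (3) by $I(t)$ gives

$$ \frac{1}{I(t)}\frac{dI(t)}{dt} = -\kappa + \beta S(t) \tag{14} $$

In order to estimate the parameters $\kappa$ and $\beta$, we use the method of simple linear regression. The difference equation version of the SIS Model given in (15) and (16) is used to estimate the parameters.

$$ \frac{I(t) - I(t-1)}{\Delta} = -\kappa I(t) + \beta S(t) I(t) \tag{15} $$

$$ \frac{S(t) - S(t-1)}{\Delta} = \kappa I(t) - \beta S(t) I(t) \tag{16} $$

where $\Delta$ is the change in time. Since the left-hand side of (14) is the derivative of $\ln I(t)$, from (15) we calculate

$$ y = \frac{\ln I(t) - \ln I(t-1)}{\Delta} \quad \text{and} \quad x = S(t) $$

The $x$ and $y$ variables are used as the predictor and response variables of a simple linear regression model.

$$ \frac{\ln I(t) - \ln I(t-1)}{\Delta} = -\kappa + \beta S(t) \tag{17} $$

The regression model is stated below

$$ y = \beta_0 + \beta_1 x + \varepsilon \tag{18} $$

Now comparing (17) to (18), $\kappa = -\beta_0$ and $\beta = \beta_1$.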
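The estimation procedure can be sketched in a few lines. The code below builds the predictor and response from the infected counts of Appendix A and fits the line by ordinary least squares. Several choices are assumptions of this sketch rather than statements about how the project's R analysis was run: $\Delta = 1$, each log-difference is paired with the susceptible count at the start of the period (the pairing that Table 2 appears to use), $S(t)$ is taken as $N - I(t)$, and the response is set to 0 whenever a count of zero makes the logarithm undefined. The helper names are hypothetical, and the printed estimates need not coincide with those reported in Table 3.

```python
import math


def build_regression_data(infected, n):
    """Return (x, y) with x = n - I(t) and y = ln I(t+1) - ln I(t), taking delta = 1."""
    xs, ys = [], []
    for prev, curr in zip(infected, infected[1:]):
        xs.append(n - prev)
        # ln 0 is undefined; using 0 as the response here is an assumption
        if prev > 0 and curr > 0:
            ys.append(math.log(curr) - math.log(prev))
        else:
            ys.append(0.0)
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares estimates (b0, b1) for y = b0 + b1*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Infected counts I(0), ..., I(29) from Appendix A
infected = [11, 1, 1, 3, 1, 3, 2, 0, 1, 5, 4, 2, 4, 0, 4,
            5, 2, 4, 6, 4, 7, 6, 3, 8, 3, 7, 0, 1, 2, 3]
xs, ys = build_regression_data(infected, n=53)
b0, b1 = fit_line(xs, ys)
kappa_hat, beta_hat = -b0, b1  # kappa = -beta_0 and beta = beta_1
print(f"kappa = {kappa_hat:.5f}, beta = {beta_hat:.5f}")
```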

3.6.2 Reproduction Ratio, R0

The basic reproduction ratio, $R_0$, is estimated using the formula

$$ R_0 = \frac{\beta}{\kappa} $$


3.6.3 Analysis of Variance

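Table 1 shows how the regression parameters $\beta_0$ and $\beta_1$ will be calculated. $F_0$ is the F-statistic.

Table 1: ANOVA TABLE

SOURCE OF VARIATION    PARAMETERS            DEGREE OF FREEDOM    SUM OF SQUARES    MEAN SQUARES    F0
REGRESSION             $\beta_0$, $\beta_1$  p - 1                SSR               MSR             F0 = MSR/MSE
ERROR                  $\sigma^2$            n - p                SSE               MSE
TOTAL                                        n - 1                SST

Where:

p = Number of parameters to be estimated.

n = Total number of observations.

Sum of Squares of Regression (SSR) $= \hat{\beta}_1^2 S_{xx}$, where $S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2$

Sum of Squares of Errors (SSE) $= S_{yy} - \hat{\beta}_1 S_{xy}$, where $S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2$ and $S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)$

Sum of Squares Total (SST) = SSE + SSR

Mean Square of Regression (MSR) $= \dfrac{SSR}{p - 1}$

Mean Square of Error (MSE) $= \dfrac{SSE}{n - p}$

$F_0 = \dfrac{MSR}{MSE}$

Sample Correlation Coefficient, $r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$, $-1 \le r \le 1$

Coefficient of Determination, $r^2 = \dfrac{S_{xy}^2}{S_{xx} S_{yy}}$, $0 \le r^2 \le 1$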
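The quantities in the ANOVA table can be computed directly from the sums $S_{xx}$, $S_{yy}$ and $S_{xy}$. The sketch below does so for a simple linear regression with $p = 2$; the function name `anova_simple_regression` and the five data points are made up purely to exercise the formulas and are not taken from the survey.

```python
import math


def anova_simple_regression(xs, ys):
    """ANOVA quantities for y = b0 + b1*x, using the Sxx, Syy, Sxy formulas."""
    n, p = len(xs), 2  # p = number of estimated parameters (b0 and b1)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    b1 = sxy / sxx
    ssr = b1 ** 2 * sxx
    sse = syy - b1 * sxy
    msr, mse = ssr / (p - 1), sse / (n - p)
    return {
        "SSR": ssr, "SSE": sse, "SST": ssr + sse,
        "MSR": msr, "MSE": mse, "F0": msr / mse,
        "r": sxy / math.sqrt(sxx * syy),
    }


# Made-up toy data, only to exercise the formulas
result = anova_simple_regression([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
for name, value in result.items():
    print(f"{name} = {value:.4f}")
```

Note that SST computed as SSR + SSE equals $S_{yy}$, which gives a convenient arithmetic check on any table built this way.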


CHAPTER FOUR

DATA COLLECTION, ANALYSIS AND RESULT

4.0.0 Introduction

A statistical estimation of the parameters $\kappa$ and $\beta$ will be given by fitting the data to a simple linear regression model. Finally, we use Excel spreadsheet, MATLAB and R software to give a comprehensive statistical and qualitative analysis of the data obtained.

4.1.0 Source of Data

During the case study, questionnaires were distributed to sixty computer users on KNUST campus. The log history of each computer's antivirus software was recorded. The questionnaire captured data on various infections, sources of infection, time of infection, action taken by the scanner and users' frequent connectivity to the internet.

There were challenges encountered during the survey. Although we distributed sixty questionnaires to sixty computer users on KNUST campus, only 53 were accurately completed.

The period, t, the number of infected computers, I(t), and the number of susceptible computers, S(t), from the data collected are shown in Appendix A.

4.2.0 Estimation of Parameters ($\kappa$, $\beta$), And Reproductive Ratio (R0)

The simple linear regression model is given by

$$ y = \beta_0 + \beta_1 x + \varepsilon \tag{1} $$

where $y = [\ln I(t) - \ln I(t-1)]/\Delta$ is the response variable and $x = S(t)$ represents the predictor variable, as derived in the previous chapter.

Table 2 shows the calculated values of the predictor ($x$) and response ($y$) variables using R software (version 2.9.2). The calculations were performed on a system running Windows 7 Ultimate edition with the hardware specifications:

(i) Processor: Intel(R) Pentium(R) Dual CPU T3400 @ 2.16GHz 2.17GHz

(ii) Installed Memory (RAM): 2.00 GB (1.87 GB usable)

(iii) System type: 32-bit Operating System

Table 2: Calculated values of the predictor and response variables

Predictor (x)    Response (y)


42 -2.3979
52 0
52 1.098612
50 -1.09861
52 1.098612
50 -0.40547
51 0
53 0
52 1.609438
48 -0.22314
49 -0.69315
51 0.693147
49 0
53 0
49 0.223144
48 -0.91629
51 0.693147
49 0.405465
47 -0.40547
49 0.559616
46 -0.15415
47 -0.69315
50 0.980829
45 -0.98083
50 0.847298
46 0
53 0
52 0.693147
51 0.405465

[Figure 1: Scatter plot of the predictor and response values in Table 2]

The scatter plot of Table 2 in Figure 1 above suggests that simple linear regression modeling is appropriate for estimating the parameters of the SIS Model. Data from Table 2 is fitted to the linear regression equation (1) and the values of the parameters $\beta_0$ and $\beta_1$ are estimated with the aid of the R software.

The result is displayed in Table 3 below.

Table 3: ANOVA TABLE

SOURCE OF VARIATION    PARAMETERS                       DEGREE OF FREEDOM    SUM OF SQUARES    MEAN SQUARES    F0
REGRESSION             $\hat{\beta}_1$ = 0.26691        1                    14.04721          14.04721        29.3014
                       $\hat{\beta}_0$ = -13.27087
ERROR                  $\hat{\sigma}^2$ = 0.47941       27                   12.94420          0.47941
TOTAL                                                   28                   26.99141

Hence the estimated regression model is

$$ \hat{y} = -13.27087 + 0.26691x \tag{2} $$

Hypotheses:

H0: $\beta_1 = 0$

H1: $\beta_1 \neq 0$

The critical value of the F-statistic at the $\alpha = 0.05$ significance level is given by

$F_{0.05(1,27)} = 4.21$

Conclusion:

We reject H0 since $F_0 = 29.3014 > F_{0.05(1,27)} = 4.21$

Thus $\beta_1 \neq 0$, which indicates that there is a linear relationship between the response and the predictor variables and that the predictor variable contributes significantly to the model.

[Figure 2: F-distribution]

The rejection and acceptance regions of the F-distribution are shown in Figure 2, indicating whether to accept or reject H0.

The Sample Correlation Coefficient, $r = 0.72141$, indicates a fairly strong positive linear association between the predictor and the response variables.

The Coefficient of Determination, $r^2 = 0.52043$, implies that the proportion of variability in the response variable explained by the linear regression model is 52.043%.
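The reported test statistic and goodness-of-fit figures follow from the sums of squares in the ANOVA table. The short check below recomputes them; the sums of squares, the degrees of freedom and the critical value 4.21 are the ones quoted in the text, while the variable names are chosen for this sketch.

```python
import math

ssr, sse = 14.04721, 12.94420  # sums of squares reported in the ANOVA table
n, p = 29, 2                   # 29 observations, 2 estimated parameters

msr = ssr / (p - 1)
mse = sse / (n - p)
f0 = msr / mse
r_squared = ssr / (ssr + sse)
r = math.sqrt(r_squared)       # positive root, since the fitted slope is positive

F_CRITICAL = 4.21              # F(0.05; 1, 27) as quoted in the text
print(f"MSE = {mse:.5f}, F0 = {f0:.4f}, reject H0: {f0 > F_CRITICAL}")
print(f"r^2 = {r_squared:.5f}, r = {r:.5f}")
```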

From chapter 3, the regression equation is derived from (3) below,

$$ \frac{\ln I(t) - \ln I(t-1)}{\Delta} = -\kappa + \beta S(t) \tag{3} $$

We compare (2) and (3) to get $\kappa = 13.27087$ and $\beta = 0.26691$.

Now the basic reproduction ratio, $R_0$, is calculated as

$$ R_0 = \frac{\beta}{\kappa} = \frac{0.26691}{13.27087} $$

Hence $R_0 = 0.020112$

4.3.0 Results
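The exact solution to the SIS Model derived in chapter three is given as

$$ I(t) = \frac{\gamma I_0}{\beta I_0 + (\gamma - \beta I_0)\, e^{-\gamma t}} \tag{4} $$

$$ S(t) = N - \frac{\gamma I_0}{\beta I_0 + (\gamma - \beta I_0)\, e^{-\gamma t}} \tag{5} $$

where $\gamma = \beta N - \kappa$ and the variables have their usual meaning as defined in chapter three of this work.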

The estimated values of $\beta$ and $\kappa$ are substituted into (4) and (5) to obtain the average number of Infected Computers, $I(t)$, and the average number of Susceptible Computers, $S(t)$, as in Appendix B.

In Appendix B, rows under the columns with heading SIS Modeling Result represent the modeling results for $I(t)$ and $S(t)$ in (4) and (5) respectively, whereas rows under the columns with heading Data represent the data on the number of infected computers and susceptible computers collected during the survey.

Figure 3 is the graph of the number of infected computers and susceptible computers. SIS Infected and SIS Susceptible are line plots of the exact solutions in (4) and (5) respectively. Infected Data and Susceptible Data stand for line plots of the number of infected computers and susceptible computers collected during the case study.

[Figure 3: Graph of the SIS model results and the survey data]

4.4.0 Qualitative Analysis of the SIS Model

The general SIS Model is

$$ \frac{dS}{dt} = \kappa I - \beta S I \tag{6} $$

$$ \frac{dI}{dt} = -\kappa I + \beta S I \tag{7} $$

4.4.1 Equilibrium Points

In order to investigate the properties of the dynamics of the model, we determine the equilibrium points by requiring that all the derivatives of the population compartments vanish.

At equilibrium $\frac{dS(t)}{dt} = 0$ and $\frac{dI(t)}{dt} = 0$, that is, $I(t)\,(\beta S(t) - \kappa) = 0$. Apart from the virus-free state $I(t) = 0$, this gives the equilibrium point $(S(t), I(t)) = \left(\frac{\kappa}{\beta},\; N - \frac{\kappa}{\beta}\right)$.

4.4.2 Eigenvalues And Eigenvectors

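The Jacobian of (6) and (7) with respect to $(S, I)$ is

$$ J(S, I) = \begin{pmatrix} -\beta I & \kappa - \beta S \\ \beta I & \beta S - \kappa \end{pmatrix} $$

From $I = N - S$, the equilibrium point has $I = N - \frac{\kappa}{\beta}$ since $S = \frac{\kappa}{\beta}$. Evaluating $J(S, I)$ at this point gives

$$ J = \begin{pmatrix} \kappa - \beta N & 0 \\ \beta N - \kappa & 0 \end{pmatrix} $$

with eigenvalues given by $\lambda_1 = \kappa - \beta N$ and $\lambda_2 = 0$. The condition for asymptotic stability of this point is $\lambda_1 < 0$. At $N = 53$, $\lambda_1 = -0.87536$.

The corresponding eigenvectors $v_1$ and $v_2$ are given as

$$ v_1 = \begin{pmatrix} 1 \\ -1 \end{pmatrix} \quad \text{and} \quad v_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix} $$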
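The eigenvalues can be verified numerically. The sketch below forms the Jacobian of the system $dS/dt = \kappa I - \beta S I$, $dI/dt = -\kappa I + \beta S I$ with respect to $(S, I)$, evaluates it at the equilibrium $S = \kappa/\beta$, $I = N - \kappa/\beta$, and obtains the eigenvalues from the trace and determinant. The function names and the variable ordering $(S, I)$ are choices made for this illustration.

```python
import math


def jacobian(s, i, beta, kappa):
    """Jacobian of (dS/dt, dI/dt) with respect to (S, I)."""
    return [[-beta * i, kappa - beta * s],
            [beta * i, beta * s - kappa]]


def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix from its characteristic polynomial (real roots assumed)."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = math.sqrt(trace * trace - 4 * det)
    return (trace - root) / 2, (trace + root) / 2


N, BETA, KAPPA = 53, 0.26691, 13.27087  # values estimated in this chapter
s_star = KAPPA / BETA
i_star = N - s_star
lam1, lam2 = eigenvalues_2x2(jacobian(s_star, i_star, BETA, KAPPA))
print(f"lambda_1 = {lam1:.5f}, lambda_2 = {lam2:.5f}")
```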

4.4.3 Deductions

In spite of one of the eigenvalues being zero, the analysis of the stability can be conclusive because the remaining eigenvalue, $\lambda_1$, is nonzero. The negative eigenvalue suggests that the solution of the system is asymptotically stable.

In Figure 4, the two axes represent the infected and susceptible compartments. The portrait is obtained using MATLAB.

[Figure 4: Phase portrait]

4.5.0 Discussions

From the calculations, the parameters are estimated to give $\kappa = 13.27087$ and $\beta = 0.26691$. This means that, on average, 13.27087 infected computers recover each day and each infected computer potentially contacts and infects an average of 0.26691 computers per day.

Given the initial number of infected computers, $I_0 = 11$, the estimated values of $\kappa$ and $\beta$ are substituted into (4) and (5) to obtain the average number of Infected Computers, $I(t)$, and the average number of Susceptible Computers, $S(t)$, at every period $t$, as tabulated in Appendix B. It is noticed that the number of infected computers reduces to 6.19% of the computer population at $t = 29$.

Now, leaving the infection and recovery parameters the same but changing the initial number of infected computers first to 20 and then to 8, it can be observed that the number of infected computers steadily decreases to the steady level of about 3.2796. This is the same as the steady level obtained when the initial number of infected computers is 11. This implies that it does not matter what number of infected computers we start with; the steady state is determined by the infection and recovery parameters of the model. We conclude that the steady state infection level is independent of the initial state.

At the steady state level, if the value of the recovery parameter, $\kappa$, is increased to about 14.14623 computers per day, then the number of susceptible computers goes to 53 and the number of infected computers goes to 0. Thus the computer virus will die out.

Now looking at the effect of $\beta$ at the steady state, we increase $\beta$ to 1 and later to 1.5 from the original value 0.26691. We notice that as we increase $\beta$, the average number of infected computers rises to 40 and 44 respectively. Thus the computer virus spreads more quickly. As we keep increasing the number of contacts, the number of infected computers in the computer population continues to increase.
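These steady-state figures can be reproduced from the model itself. Setting $dI/dt = 0$ with $I \neq 0$ gives $S = \kappa/\beta$, so the steady number of infected computers is $N - \kappa/\beta$, which reaches zero when $\kappa = \beta N$. The sketch below evaluates this for the three values of $\beta$ discussed above; the function name `endemic_infected` is hypothetical.

```python
def endemic_infected(n, beta, kappa):
    """Steady-state number of infected computers, n - kappa/beta (0 if negative)."""
    return max(0.0, n - kappa / beta)


N, KAPPA = 53, 13.27087

# Steady infection level for the estimated beta and the two larger values
for beta in (0.26691, 1.0, 1.5):
    print(beta, round(endemic_infected(N, beta, KAPPA), 4))

# Recovery parameter at which the steady infection level reaches zero
kappa_threshold = 0.26691 * N
print(round(kappa_threshold, 5))
```

The threshold printed in the last line corresponds to the value of about 14.14623 quoted for the recovery parameter.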

We deduce that as the number of infected computers decreases, there is a corresponding increase in the number of susceptible computers as $t$ increases, which is illustrated in Figure 3. This can be attributed to the fact that the basic reproduction ratio, R0 = 0.020112, is less than 1.

This implies that in the long run there might not be an outbreak of computer virus among computer users on KNUST campus. However, at any given time, the sum of infected computers and susceptible computers is equal to the number of computers in the population.

Although from the table in Appendix A, the average number of susceptible computers

outnumbers the average number of infected computers; this does not mean that the spread of

computer virus on KNUST campus will decline to extinction.

This can be attributed to the fact that student computer users on KNUST campus frequently

share electronic data among themselves through the use of USB pen drives and external hard

drive. Furthermore, these users on campus access the internet daily, probably to download media

files from unsecured websites.


CHAPTER FIVE

CONCLUSION AND RECOMMENDATIONS

5.0.0 Conclusion

We draw the following conclusions from the project in relation to the set targets. The spread of computer virus depends on the contacts that result in infections and on the recovery rate from the computer virus. There will not be an epidemic of computer virus since the R0 computed from the data collected is less than 1. There can only be extinction of computer virus on KNUST campus if, on average, 14 computers out of the 53 recover from infections daily. In reality, extinction of computer viruses is quite impossible since students frequently share files among themselves through pen drives and access unsecured websites on the internet.

5.1.0 Recommendations

Based on our findings, we recommend that in order to control the spread of computer virus, the following guidelines be observed by all computer users.

(i) All users on KNUST campus should install antivirus software on their computers.

(ii) Users should update their antivirus software regularly.

(iii) Users should scan their computers immediately after each update.

REFERENCES

1. Cohen, F.: Computer Viruses, Theory and Experiments, PhD thesis, University of Southern

California (1985)

2. Abdelazim and Wahba (2010). System Dynamic Model for Computer Virus Prevalence.

3. Daoud, Jebril and Zaqaibeh (2008). Computer Virus Strategies and Detection Methods.

4. John von Neumann (1966). Theory of Self-Reproducing Automata.

5. MS Encarta 09, Virus [computer], History

6. Bimal Kumar Mishra and Gholam Mursalin Ansari (2008). Mathematical Models on Interaction between Computer Virus and Antivirus Software inside a Computer System, Birla Institute of Technology.

7. N. T. J. BAILEY, The Mathematical Theory of Infectious Diseases, 2nd ed., Hafner,

New York, 1975.

8. W. O. KERMACK AND A. G.MCKENDRICK, Contributions to the mathematical

theory of epidemics, part 1,Proc. Roy. Soc. London Ser. A, 115 (1927), pp. 700-721.

9. A. G. MCKENDRICK, Applications of mathematics to medical problems, Proc. Edinburgh Math. Soc., 44 (1926), pp. 98-130

10. W. Murray (1988), The application of epidemiology to computer viruses. Computers &

Security Volume 7, issue 2, pages 139-150

12. Winfried Gleissner (1989). A mathematical theory for the spread of computer viruses.

Computers & Security Volume 8, issue 1,pages 35-41.

13. J. O. Kephart and S. R. White (1991) Directed-graph epidemiological models of

computer viruses. 1991 IEEE Computer Society Symposium on Research in Security and

Privacy, 343-359.

14. J. O. Kephart, S. R. White, and D. M. Chess (1993) Computers and epidemiology. IEEE

Spectrum 30, 20-26.

15. P.S. Tippett, "The Kinetics of Computer Virus Replication: A Theory and Preliminary

Survey," Safe Computing: Proceedings of the Fourth Annual Computer Virus and Security

Conference, New York, New York, March 14-15, 1991, pp. 66-87.

16. Computer Virus: A Global Perspective, Steve R. White, Jeffrey O. Kephart and David M.

Chess

17. Mathematical Modeling of Epidemics, Emma Harris, 2008.

18. Introduction to the Modeling of Epidemics - SIS Models, Troy Tassier, 2005.

20. A note on exact solution of SIR and SIS epidemic models by G. Shabbir, H. Khan and M. A. Sadiq (2010)

21. The History of Mathematics, John Tabak, PhD. 2004.

APPENDIX A:

DATA FROM KNUST, KUMASI

Period (t) Infected Computers, I(t) Susceptible Computers, S(t)


0 11 42
1 1 52
2 1 52
3 3 50
4 1 52
5 3 50
6 2 51
7 0 53
8 1 51
9 5 48
10 4 49
11 2 51
12 4 49
13 0 53
14 4 49
15 5 48
16 2 51
17 4 49
18 6 47
19 4 49
20 7 46
21 6 47
22 3 50
23 8 45
24 3 50
25 7 46
26 0 53
27 1 52
28 2 51
29 3 50



APPENDIX B:
SIS MODELING RESULT COMPARED TO THE RAW DATA

Columns 2-3: Data. Columns 4-5: SIS Modeling Result.

Period (t)   Infected Computers, I(t)   Susceptible Computers, S(t)   Infected Computers, I(t)   Susceptible Computers, S(t)   Total (N)
0 11 42 11 42 53
1 1 52 4.635296884 48.36470312 53
2 1 52 3.734789001 49.265211 53
3 3 50 3.455081423 49.54491858 53
4 1 52 3.350516527 49.64948347 53
5 3 50 3.308788049 49.69121195 53
6 2 51 3.291704519 49.70829548 53
7 0 53 3.284637567 49.71536243 53
8 1 51 3.281701633 49.71829837 53
9 5 48 3.280479743 49.71952026 53
10 4 49 3.279970835 49.72002917 53
11 2 51 3.279758814 49.72024119 53
12 4 49 3.27967047 49.72032953 53
13 0 53 3.279633657 49.72036634 53
14 4 49 3.279618317 49.72038168 53
15 5 48 3.279611925 49.72038807 53
16 2 51 3.279609261 49.72039074 53
17 4 49 3.279608151 49.72039185 53
18 6 47 3.279607689 49.72039231 53
19 4 49 3.279607496 49.7203925 53
20 7 46 3.279607416 49.72039258 53
21 6 47 3.279607382 49.72039262 53
22 3 50 3.279607368 49.72039263 53
23 8 45 3.279607362 49.72039264 53
24 3 50 3.27960736 49.72039264 53
25 7 46 3.279607359 49.72039264 53
26 0 53 3.279607359 49.72039264 53
27 1 52 3.279607358 49.72039264 53
28 2 51 3.279607358 49.72039264 53
29 3 50 3.279607358 49.72039264 53