WE HEREBY DECLARE THAT THIS PROJECT WAS TRULY UNDERTAKEN BY US
UNDER SUPERVISION AND HAS NOT IN PART OR WHOLE BEEN PRESENTED FOR
ANOTHER PROJECT.
NAME AND INDEX NUMBER                 SIGNATURE        DATE
GYAMFI ATTA KWAME (1551907)           ............     ............
ABORABORAH NATHANIEL (1545407)        ............     ............
BOAHEN FRANK (1549607)                ............     ............
YERIBATUAH PETER (1558007)            ............     ............
CERTIFIED BY: MR. K. F. DARKWAH       ............     ............
SUPERVISOR                            SIGNATURE        DATE
CERTIFIED BY: DR. S.K. AMPONSAH       ............     ............
HEAD OF DEPARTMENT                    SIGNATURE        DATE

We thank Mr. K. F. Darkwah for his diligence, creativity and invaluable effort towards the
completion of this project. We also thank the student computer users of KNUST and our families.
We dedicate this project to our families, especially Mom and Dad.
Differential equations are an effective mathematical tool and have proven to be a very useful
research tool for real-life problems involving dynamical systems, such as the spread of computer
viruses and diseases, population growth in aquaculture, etc. This project investigates the spread
of computer viruses using the SIS model, differential equations and regression analysis.
The SIS model shows that once a susceptible computer becomes infected with a computer virus,
it becomes susceptible again on recovery. Thus no recovered computer system is granted
immunity from infection. Data was taken from student computer users on KNUST campus.
The model yielded a result showing that the number of susceptible computers increases as the
number of infected computers decreases, until both become fairly stable. The reproduction
ratio R0 < 1 implies that in future there might not be an outbreak of computer virus on KNUST
campus.
It is our belief that the research findings and the recommendations made in this study, if
adopted, will go a long way to reduce the spread of computer viruses on KNUST campus.
INTRODUCTION
1.1.0 MALICIOUS SOFTWARE PROGRAMS
1.2.0 TYPES OF COMPUTER VIRUSES
1.3.0 BACKGROUND OF STUDY
1.4.0 PROBLEM STATEMENT
1.5.0 OBJECTIVES
1.6.0 METHODOLOGY
1.7.0 JUSTIFICATION
1.8.0 ORGANIZATION OF PROJECT
1.9.0 SUMMARY OF CHAPTER
LITERATURE REVIEW
2.0.0 INTRODUCTION
2.1.0 COMPUTER VIRUS EPIDEMIOLOGY
2.2.0 COMPUTER VIRUS PREVALENCE
METHODOLOGY
3.0.0 INTRODUCTION
3.1.0 DEFINITION OF VARIABLES AND PARAMETERS
3.2.0 SOME BASIC ASSUMPTIONS ABOUT THE SIS MODEL
3.3.0 FORMULATION OF THE MODEL
3.4.0 MODEL DESCRIPTION
3.5.0 EXACT SOLUTION
3.6.0 ESTIMATION OF PARAMETERS
DATA COLLECTION, ANALYSIS AND RESULT
4.0.0 INTRODUCTION
4.1.0 SOURCE OF DATA
4.2.0 ESTIMATION OF PARAMETERS (β, γ), AND REPRODUCTIVE RATIO (R0)
4.3.0 RESULTS
4.4.0 QUALITATIVE ANALYSIS OF THE SIS MODEL
4.5.0 DISCUSSIONS
CONCLUSION AND RECOMMENDATIONS
5.0.0 CONCLUSION
5.1.0 RECOMMENDATIONS
REFERENCES
APPENDIX B

Table 1: ANOVA Table
Table 2: Regression Data
Table 3: Regression Statistics

Figure 1: Scatter Plot of the Regression Data
Figure 2: F-Distribution
Figure 3: SIS Model Graph
Figure 4: Phase Plane

INTRODUCTION
The term computer virus is often misunderstood, and many users who do understand it may still
not understand how computer systems are protected. Authors of computer viruses usually intend
to gain unauthorized access to relevant information on a target computer system; such an
intrusion is usually referred to as a computer virus attack. To ensure protection from such
attacks, users are required to install anti-malicious software programs and to update them
regularly so that they can monitor malicious intrusions from questionable sources. Although
Trojan horses, worms and viruses are not the same, the term computer virus is generally used to
describe any malicious code or program that can damage a computer system.
1.1.0 Malicious Software Programs
A malicious software program, also known as malware, is a piece of software written with the
intention of crashing a computer or making it malfunction. When executed, the software allows
its authors to gain access to valuable information on the compromised computer. One of the
dangers of malware is that, after it has been downloaded onto a computer system, the user might
not be aware of its presence. Malware enters a computer system through attachments in e-mail
messages, images, screensavers and greeting cards, and through audio and video files downloaded
from the internet. A malicious software program is usually classified as a computer virus, a
worm or a Trojan horse.
1.1.1 Virus
Cohen (1985) originally defined a 'computer virus' to be a program that can infect other
programs by modifying them to include a possibly evolved copy of itself. Today computer
viruses come to the minds of users whenever malware is under discussion. A technical definition
of a computer virus is a sequence of instructions that copies itself into other programs in
such a way that executing the program also executes that sequence of instructions. Some viruses
do little but replicate; others can cause severe harm or adversely affect the programs and
performance of a computer system.
1.1.2 Worms
A worm is a program very similar to a virus in design. It has the ability to self-replicate even
without human action and can have negative effects on a system. It takes advantage of the
file-transport features of a system, which allow it to travel unaided to other parts of a
computer system and across a network. The end result in most cases is that the worm consumes too
much system memory (or network bandwidth), causing Web servers, network servers and individual
computers to stop responding. Examples of worms include PSWBugbear.B, Lovgate.F, Trile.C,
Sobig.D and Mapson.
1.1.3 Trojans
Another unsavory breed of malicious code is the Trojan, or Trojan horse, which unlike a virus
does not reproduce by infecting other files, nor does it self-replicate like a worm. A Trojan
horse at first glance appears to be useful software but actually damages a system once it is
installed or run. When a Trojan is activated on a computer system, the results may vary. Some
Trojans are designed to be more annoying than malicious (for example, changing the desktop or
adding silly active desktop icons), while others cause serious damage by deleting files and
destroying information on the system. Trojans are also known to create a backdoor on a
computer that gives malicious users access to the system, possibly allowing confidential or
personal information to be compromised.
1.2.0 Types of Computer Viruses
Several studies on computer viruses at the microscopic level have enabled researchers to group
and classify viruses into types. They are mainly grouped according to their mechanism and target
of infection and according to their mode of spread.
Depending on their targets of attack, computer viruses may be classified into three main
types, where a target is any electronic object that can be infected by the virus, e.g. files,
programs and applications, file-sharing media, disks, etc.
1.2.1 File Infectors
A file infector virus infects programs or executable files with .exe or .com extensions. When a
user runs an infected application, the virus code is executed first and installs itself
independently in the computer's memory. This allows the virus to copy itself into subsequent
applications that the user runs. As with other types of viruses, the user may be unaware of its
existence. Resident viruses, directory viruses, direct action viruses, companion viruses and
polymorphic viruses are some examples of file infectors.
1.2.2 Boot-Sector Virus
As its name implies, a boot-sector virus infects and resides in a special part of a diskette or
hard disk that is read into memory and executed when a computer first starts. Once loaded, a
boot-sector virus can infect any diskette that is placed in the drive. In the 1990s boot-sector
viruses outnumbered file infectors because zip and floppy disks were widely used to start
computers. It remains important to protect a system against boot-sector viruses, since they
account for about 5% of known viruses today (Abdelazim and Wahba, 2010). The only known remedy
for an infected system is to reformat the whole computer system. Direct action viruses come
under this classification. Other examples include polyboot.b, antiexe, Michelangelo, FAT virus,
etc.
1.2.3 Macro Virus
A macro virus infects files created by applications that support macros. Macro viruses are
independent of operating systems and infect files that are usually regarded as data rather than
as programs. Many spreadsheet, database and word-processing programs can execute scripts
(prescribed sequences of actions) embedded in a document. Such scripts, or macros, are used to
automate actions ranging from typing long words to carrying out complicated sequences of
calculation. A macro virus takes advantage of this kind of execution to infect other data
files. Examples: Relax, Melissa.A, Bablas, O97M/Y2K.
1.2.4 Cavity or Spacefiller Virus
This virus attempts to install itself in the empty space inside a program without damaging the
program. An advantage of this is that the virus does not increase the length of the program and
can avoid the need for some stealth techniques. The Lehigh virus was an early example of a
cavity virus. Because of the difficulty of writing this type of virus and the limited number of
possible hosts, cavity viruses are rare (Daoud, Jebril and Zaqaibeh, 2008).
1.2.5 Resident Virus
This type of virus dwells in the Random Access Memory (RAM) of a computer. From there it can
intercept and interrupt all of the operations executed by the system, corrupting files and
programs that are opened, closed, copied, renamed, etc. Examples of resident viruses are CMJ,
MrKlunky and Meve.
1.2.6 Overwrite Virus
This type of virus overwrites files with its own copy, destroying the contents of the infected
file. Overwrite viruses spread quickly through e-mail and Internet Relay Chat (IRC), popularly
referred to as chat rooms. This may be a very primitive technique, but it is certainly the
easiest approach of all (Daoud, Jebril and Zaqaibeh, 2008). Overwriting viruses cannot be
disinfected from a system. The only way to clean a file infected by an overwrite virus is to
delete the file completely from disk, thus losing the original content. Examples of this virus
include Way, Trj.Reboot and Trivial.88.D.
1.2.7 Compressing Virus
A special virus infection technique uses the approach of compressing the content of the host
program. Sometimes this technique is used to hide the host program's size increase after the
infection by packing the host program sufficiently with a binary packing algorithm. This type of
virus employs encryption during execution and hence is classified as an encrypted virus (Daoud,
Jebril and Zaqaibeh, 2008).
1.2.8 Malicious Mobile Code (MMC)
Mobile code is a lightweight program that is downloaded from a remote system and executed
locally with minimal or no user intervention. Java applets, JavaScript scripts, Visual Basic
Scripts (VB Scripts), ActiveX controls and Microsoft remote control administrator are some of
the most popular examples of mobile code that you may encounter while browsing the Web or
reading HTML-formatted e-mail (Daoud, Jebril and Zaqaibeh, 2008).
1.3.0 Background of Study
1.3.1 Historic Background of Computer Virus
There are many opinions about when the first computer virus was produced, but this does not
mean that there is no documented history of computer viruses.
The term computer virus was coined by the American electrical engineer and computer scientist
Fred Cohen in 1985. He came up with the term after describing a self-replicating computer
program that he designed to enable him to acquire privileges on a VAX-11/750 running
UNIX. However, several self-replicating programs had already been developed before Cohen
accomplished this.
In 1949 the Hungarian-American mathematician John von Neumann, at the Institute for Advanced
Study in Princeton, New Jersey, proposed that it was theoretically possible for a computer
program to replicate (Neumann, 1966). This theory was tested in the 1950s at Bell Laboratories,
where a game was developed in which players created tiny computer programs that attacked,
erased and tried to propagate on an opponent's system.
In the 1980s, several experiments were conducted on many operating systems to understand this
proven theory. One such experiment was conducted by Tom Duff in 1987. He experimented on UNIX
systems with a small virus capable of copying itself into executable files. He disproved a
common fallacy that computer viruses are intrinsically machine dependent and cannot spread to
systems of varying architectures. At that time, viruses mainly spread through the exchange of
floppy disks, as the internet and computer networks were not yet popular.
Another kind of malicious code, the Trojan horse, was first introduced in 1985. A Trojan named
EGABTR was the first to be produced, together with a game called NUKE-LA. A host of increasingly
complex viruses followed.
As computer networks and the internet became more popular, viruses quickly evolved to spread
through the Internet by various means such as file downloads, e-mail and visits to pornographic
websites.
1.3.2 Profile of Students and Use of Computers on KNUST Campus
The computing environment of KNUST may facilitate the spread of an infectious or
self-replicating program. A sizable number of students perform specific tasks with the aid of a
computer in the course of their studies. These tasks range from typing and printing documents to
surfing the internet for relevant information. Although each student in each department is
required to take an introductory computer course in the first year of study, little emphasis,
if any, is placed on the operation and spread of malicious programs that could frustrate their
computing life.
Again, a significant number of students own a personal computer or one of its peripherals. One
peripheral heavily used by both students and lecturers is the USB pen drive. It is the most
popular means of sharing data between users on campus.
To facilitate access to the diverse information on the internet, the school has provided
wireless hotspots (access points) at strategic locations throughout the campus, where students
are able to connect to the internet with their laptops and personal computers. Some of these
locations are the various halls of residence, non-residential facilities and lecture theatres.
While every student has the right to connect in this way, students without a personal computer
or laptop can visit ICT centers at various departments or at the main library to surf the
internet for information. Students use these facilities to download software and to visit sites
that could compromise their systems.
Nonetheless, the school authorities have restricted access to known unsecured websites that
could introduce computer viruses onto the campus.
1.4.0 Problem Statement
With the growth of the Internet, computer viruses are able to propagate much faster and more
aggressively. Now that there are many sophisticated ways of connecting to the internet
regardless of location, users may visit sites containing infectious programs, which in turn
increases the spread of viruses on KNUST campus. The upsurge in ways of sharing data between
computer systems and advances in technology have also increased the spread of computer viruses.
Simply put, computer viruses spread under many different conditions, all of which are present on
the KNUST campus, yet very little is known about the threat of a possible epidemic. It has
therefore become necessary to investigate the modes through which computer viruses spread using
mathematical equations.
1.5.0 Objectives
We aim to analyze the spread of computer viruses on computer systems on KNUST campus. At the
end of the project we should be able to:
(i) state the factors contributing to the spread of computer viruses.
(ii) determine whether there will be an epidemic on KNUST campus.
(iii) determine whether computer virus extinction is possible on KNUST campus.
1.6.0 Methodology
1.6.1 Model
In the course of our study, the SIS epidemiological model is used in order to achieve the
stated objectives. The parameters in the model are estimated by fitting the data to a simple
linear regression model.
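The regression step can be illustrated with a minimal least-squares sketch. This section does not state which quantities are regressed against each other, so the data below (day number against a count of infected computers) and the function name `fit_line` are hypothetical and serve only to show the mechanics of a simple linear regression.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data, not taken from the study.
days = [1, 2, 3, 4, 5]
infected = [12, 10, 9, 7, 6]

intercept, slope = fit_line(days, infected)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

A negative slope in such a fit would indicate a falling number of infections over the observation window; how the fitted coefficients map onto the SIS parameters depends on the regression set-up described in the methodology chapter.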
1.6.2 Data Type, Source and Period of Collection
In order to obtain data for analysis, a case study is conducted on KNUST campus over a 30-day
period. Questionnaires are designed and distributed to sixty student computer users at KNUST,
and the antivirus scan log history of each user is recorded during this period.
The questionnaire captures data on the various infections, source of infection, time of
infection, action taken by the antivirus, how frequently users connect to the internet, and
others.

1.6.3 Software Used
The Excel spreadsheet application, MATLAB and R are used to obtain descriptive statistical
values which help us to arrive at valid conclusions.
Background information for the entire project is obtained from publications and articles on
the internet.
1.7.0 Justification
Although the estimates are somewhat speculative, computer viruses have cost computer users
billions of cedis over the years. Every year many people spend a great deal on antivirus
products and services. Consequently, methods to analyze, track, model and protect against
viruses are of considerable interest and importance.
1.8.0 Organization of Project
Chapter One of the study consists mainly of the general introduction to the study, objectives,
problem statement and research questions, scope and methodology.
Chapter Two reviews the relevant literature on the subject matter, which provides the framework
for the data analysis.
Chapter Three presents how the research is conducted and gives a detailed procedure for
formulating the SIS compartmental model. It also gives a framework for estimating the model
parameters and the basic reproduction ratio, R0, from the data collected in order to determine
any possible epidemic.
Chapter Four presents the quantitative and qualitative analyses of the results. This includes
data analysis, computations, presentation and discussion of results.
Finally, Chapter Five gives a general overview, recommendations, problems encountered and the
conclusions drawn from the project.
1.9.0 Summary of Chapter
Chapter One of the study consists mainly of the general introduction to the study, background
of study, objectives, problem statement and research questions, scope and methodology. It sets
out the procedure that guides the conduct of the research.
LITERATURE REVIEW
2.0.0 Introduction
Today anyone who uses the term 'computer virus' when referring to a malicious software
program alludes to the work of the computer scientist and electrical engineer Fred Cohen.
Cohen (1985) coined the term 'computer virus' while working on his PhD thesis. Before then,
software designed with the intrinsic property of self-replication was not referred to as a
computer virus. The usefulness of Cohen's work extends beyond the introduction and definition
of the term 'computer virus'; further examinations and findings were made.
For instance, Cohen (1985) concluded that the paths along which information flows foster the
spread and propagation of a virus in a closed region. He called this the transitive closure of
shared information. In simple terms, if A can infect B and B can infect C, a virus that
originates with A can propagate to C. His work showed that the systems with potential for
protection from a viral attack are of the following three kinds:
(i) systems with limited transitivity of sharing information
(ii) systems with no sharing
(iii) systems without general interpretation of information.
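The idea of transitive closure of shared information can be illustrated with a short sketch. The sharing graph, the system names and the function `reachable_from` below are hypothetical; the sketch only shows how a virus starting at one system can reach every system connected to it through a chain of sharing.

```python
from collections import deque


def reachable_from(sharing, source):
    """Return every system a virus starting at `source` could eventually reach.

    `sharing` maps each system to the systems it passes information to.
    """
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in sharing.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(source)  # report only the other systems that are exposed
    return seen


# Hypothetical example: A shares with B, B shares with C, D shares with no one.
sharing = {"A": ["B"], "B": ["C"], "C": [], "D": []}
print(sorted(reachable_from(sharing, "A")))  # prints ['B', 'C']
print(sorted(reachable_from(sharing, "D")))  # prints []
```

In this toy graph, system D illustrates the "no sharing" case: nothing is reachable from it, so a virus originating there cannot propagate.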
2.1.0 Computer Virus Epidemiology
Mishra and Ansari (2008) formulated five mathematical models of the interaction between a
computer virus and an antivirus software program inside a computer system with an immune
system. They calculated the basic reproductive ratio in the absence and presence of the
immune system and analyzed the criterion for the spread of the computer virus. An analysis was
made of the immune response needed to clear the infection. They studied the effect of new or
updated antivirus software on viruses that had been quarantined, but not completely removed, by
an older version of the antivirus software installed on the system. The reactivation of computer
viruses in the latent class was formulated mathematically and a basic reproductive ratio was
obtained. Finally, a mathematical model was developed to understand the recent attack of the
malicious objects Backdoor.Haxdoor.S and Trojan.Schoeberl.E and a removal tool called
FixSchoeb-Haxdoor.
The study of viruses at the microscopic level has helped to understand the actions executed by
malware and hence to provide ways of detecting its presence on an infected system. Its analogy
in biological research is the quest of microbiologists to obtain new vaccines and medicines
against a new disease pathogen. Chen (2006) asserted that very little effort is spent on
treating worms and viruses at the macroscopic level.
According to Tabak (2004), Bernoulli made a major contribution to epidemiology by
mathematically proving that variolation (inoculation with a live virus obtained from a mild case
of smallpox) was beneficial. He formulated differential equations to show that variolation
could reduce the death rate from smallpox.
Kermack and McKendrick (1926) published papers on epidemic models and obtained the
epidemic threshold result that the density of susceptibles must exceed a critical value in
order for an epidemic outbreak to occur.
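The threshold result can be seen from a one-line calculation on the infective equation of a standard mass-action epidemic model. The notation below is an assumption made for illustration: S and I are the densities of susceptibles and infectives, \beta is an infection parameter and \gamma is a recovery (removal) parameter.

```latex
% Assumed notation: S = density of susceptibles, I = density of infectives,
% \beta = infection parameter, \gamma = recovery (removal) parameter.
\[
\frac{dI}{dt} = \beta S I - \gamma I = (\beta S - \gamma)\, I
\]
% For I > 0 the number of infectives grows only when the bracket is positive:
\[
\frac{dI}{dt} > 0 \iff S > \frac{\gamma}{\beta}
\]
```

Under these assumptions the critical value of the susceptible density is \gamma / \beta: below it, each infective is removed faster than it infects others and no outbreak develops.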
According to Hethcote (2000), mathematical epidemiology seems to have grown exponentially
starting in the middle of the 20th century. The study of computer virus at the macroscopic level
is mostly dedicated to the spread of viruses in computing environment. In mathematical biology
literature of infectious disease, statistical analyses are made on epidemiological data in order to
find information and policies aimed at lowering and preventing epidemic outbreaks. A
tremendous variety of models have now been formulated, mathematically analyzed, and applied
to infectious diseases.
Murray (1988) was the first to suggest a relationship between biological epidemiology and the
spread of computer viruses. Although he did not propose any specific model, he pointed out
analogies to some public health epidemiological defense strategies. His work was intended to
give an understanding of viruses and the issues they raise.
Gleissner (1989) introduced a model to treat the spread of computer viruses mathematically. A
recurrence formula was given which allows a closed expression to be derived for the probability
that, starting from an initial state, a given viral state will be reached after executing some number
of programs. He showed that the infection process does not terminate before all programs which
are visible for any program in the initial state are infected. Gleissner further showed that the
transitive closure of information could occur at an exponential rate. However, the usefulness of
these results was limited because no allowance was made for the fact that individual users of the
system might detect and remove viruses or alert other users of their presence.
Kephart et al (1993) investigated susceptible-infected-susceptible (SIS) models for computer
virus spread. They extended a standard epidemiological model by placing it on a directed random
graph and studied its behaviour via deterministic approximation, stochastic approximation and
simulation. This enabled them to determine the conditions under which epidemics are likely to
occur and, in cases where they do, to express the expected number of infections as a function
of time. They concluded that an imperfect defense against computer viruses could still be highly
effective in preventing their widespread proliferation, provided that the infection rate does
not exceed a well-defined critical epidemic threshold.
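A rough feel for an SIS process on a directed random graph can be had from the simulation sketch below. It is not the model of Kephart et al; the function name, the per-step probabilities `infect_prob` and `cure_prob`, the edge probability and the single initially infected node are all assumptions chosen for illustration.

```python
import random


def simulate_sis_on_graph(n, edge_prob, infect_prob, cure_prob, steps, seed=0):
    """Discrete-time SIS simulation on a directed random graph.

    Returns the number of infected nodes at step 0, 1, ..., steps.
    """
    rng = random.Random(seed)
    # An edge u -> v means that computer u can pass the virus to computer v.
    out_edges = {
        u: [v for v in range(n) if v != u and rng.random() < edge_prob]
        for u in range(n)
    }
    infected = {0}  # assume a single infected computer at the start
    history = [len(infected)]
    for _ in range(steps):
        newly_infected = set()
        for u in sorted(infected):
            for v in out_edges[u]:
                if v not in infected and rng.random() < infect_prob:
                    newly_infected.add(v)
        # Cured computers become susceptible again (no immunity).
        cured = {u for u in sorted(infected) if rng.random() < cure_prob}
        infected = (infected - cured) | newly_infected
        history.append(len(infected))
    return history


# Hypothetical parameter values, for illustration only.
history = simulate_sis_on_graph(
    n=60, edge_prob=0.05, infect_prob=0.1, cure_prob=0.3, steps=30
)
print(history)
```

Varying `infect_prob` relative to `cure_prob` in such a sketch shows the qualitative behaviour described above: for low infection probabilities the infection tends to die out, while for high ones it tends to persist.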
2.2.0 Computer Virus Prevalence
Tippett et al (1991) predicted that the number of new viruses per day would increase
exponentially worldwide by the year 2000 and hence that there was a likely threat of an
epidemic.
To counter this, Kephart et al (1993) observed that although the rate of appearance of new
viruses in the collections of anti-virus workers had been increasing gradually for several
years, at roughly a linear rate, nothing at all about viruses was 'increasing exponentially'
worldwide. They concluded that there could be an epidemic in a closed region that allows
sharing, but not at a worldwide level.
METHODOLOGY
3.0.0 Introduction
In order for a disease to persist indefinitely there must be a supply of fresh or new
susceptibles, either through recovery without immunity or through births. The SIS model can be
used to explore the spread of computer viruses because no recovered computer can be granted
immunity from infection.
In this chapter we describe the formulation of the SIS model, define the variables and
parameters to be used and state some of the underlying assumptions made in the model.
3.1.0 Definition of Variables and Parameters

$t$ = Time
$N$ = Total computer population throughout the period
$I(t)$ = Total number of infected computers at time $t$, also known as the infective class
$S(t)$ = Total number of susceptible computers at time $t$ (when a computer recovers from
infection it goes back to the susceptible group)
$\gamma$ = Percentage of the infected computers that recover from infection each period
$\beta$ = Proportion of all contacts which result in an infection
$R_0$ = The basic reproduction ratio (sometimes called the basic reproductive rate)
3.2.0 Some Basic Assumptions about the SIS Model
(i) The total number of computers, $N$, shall remain constant throughout the period:
$N(t) = N$.
(ii) Every member of the population can be assigned to exactly one of the two compartments
(susceptible and infected), which are mutually exclusive. This means that at any
moment in time, $N(t) = S(t) + I(t)$.
(iii) Only contacts between an infected computer and a susceptible computer shall be
considered.
(iv) The number of computers that recover in each period is constant.
3.3.0 Formulation of the Model
The SIS model is formulated using a system of differential equations. In this model,
the total number of computers is partitioned into two disjoint compartments. These
compartments are the infected compartment and the susceptible compartment, denoted $I$ and
$S$ respectively. An individual computer is said to be infected if it is in $I$ and susceptible if it is in $S$.
The number of computers that make up the total population at time $t$ is given by
$$N(t) = I(t) + S(t) \tag{1}$$
For the SIS model, infected computers return to the susceptible compartment, on recovery
within a given period of time. Therefore computers in the infective compartment potentially
move from being infected to susceptible periodically.
3.4.0 Model Description
In this section, we fully describe the SIS model by the following system of ordinary differential
equations.
$$\frac{dI(t)}{dt} = \beta S(t) I(t) - \kappa I(t) \tag{2}$$
$$\frac{dS(t)}{dt} = -\beta S(t) I(t) + \kappa I(t) \tag{3}$$

The above system ensures that the change in the size of the infected compartment is always equal
and opposite to the change in the size of the susceptible compartment when we have a constant population, $N$.
Thus $I(t) + S(t) = I(t+1) + S(t+1) = N$ under constant population. This makes the second
assumption hold.
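The conservation property above can be checked numerically. The following is a minimal sketch, assuming the mass-action form of (2) and (3) and the parameter values estimated later in Chapter 4; the function name `simulate_sis`, the step size and the fourth-order Runge-Kutta scheme are illustrative choices and not part of the original study, which works with the exact solution.

```python
def simulate_sis(beta, kappa, s0, i0, dt=0.01, steps=1000):
    """Integrate dI/dt = beta*S*I - kappa*I and dS/dt = -(dI/dt) with classical RK4."""

    def deriv(s, i):
        flow = beta * s * i - kappa * i  # net flow from susceptible to infected
        return -flow, flow               # (dS/dt, dI/dt)

    s, i = float(s0), float(i0)
    path = [(0.0, s, i)]
    for n in range(1, steps + 1):
        k1s, k1i = deriv(s, i)
        k2s, k2i = deriv(s + 0.5 * dt * k1s, i + 0.5 * dt * k1i)
        k3s, k3i = deriv(s + 0.5 * dt * k2s, i + 0.5 * dt * k2i)
        k4s, k4i = deriv(s + dt * k3s, i + dt * k3i)
        s += dt * (k1s + 2 * k2s + 2 * k3s + k4s) / 6
        i += dt * (k1i + 2 * k2i + 2 * k3i + k4i) / 6
        path.append((n * dt, s, i))
    return path


# Illustrative run: 11 infected and 42 susceptible computers at t = 0.
path = simulate_sis(beta=0.26691, kappa=13.27087, s0=42, i0=11)
t_end, s_end, i_end = path[-1]
print(t_end, s_end, i_end, s_end + i_end)
```

Because the two derivatives are equal and opposite at every stage, the printed total stays at 53 up to rounding error.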
3.4.1 Contacts between the Compartments
A susceptible computer becomes infected with a computer virus when it makes contact with a
compromised USB pen drive. Another means by which a susceptible computer moves to the
infected compartment is by visiting an unsecured website on the internet. Also, a computer in a
network of infected computers can be infected and thus move to the infected compartment.
However, not all contacts of this kind result in an infection. Thus if at any time $t$
each infected
computer makes $\gamma$ contacts with susceptible computers, then the possible number of
new infections will be given by $I(t+1) = \gamma I(t)$. Suppose only a fraction $\rho$ of contacts results in an
infection. Thus each infected computer produces $\beta = \rho\gamma$ new infections in each time period $t$.
3.4.2 Recovery Parameter, $\kappa$
Based on the fourth assumption, a recovery parameter $\kappa$ can be defined. This parameter measures
the percentage of the infected population that recovers from an infection at time $t$. Thus if the time to
recover corresponds to a time interval $\Omega$, then $\kappa = \frac{1}{\Omega}$. This means that $\frac{1}{\Omega}$th of the infected population
should recover each period on average.
3.4.3 Threshold Parameter
The epidemic threshold, $R_0$, for the SIS model can be determined by the expression
$$R_0 = \frac{\beta}{\kappa} \tag{4}$$
It is also known as the Basic Reproduction Ratio (BRR).
It is defined as the number of new or secondary infections produced when an infected host computer is introduced
into the computer population. This parameter is useful because it helps determine whether or not
a computer virus will continue to spread through the computer population.
If $R_0 > 1$ then the level of spread of computer virus increases and hence the number of
susceptible computers decreases. When this happens, there is a possible threat of a
computer virus epidemic.
On the other hand, if $R_0 < 1$, then the computer virus rapidly dies out and the number of
susceptible computers increases.
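The threshold rule can be written as a short sketch. The function names `basic_reproduction_ratio` and `classify` are hypothetical, and the numerical inputs are the estimates reported later in Chapter 4; the code only restates expression (4) and the two cases described above.

```python
def basic_reproduction_ratio(beta, kappa):
    """R0 = beta / kappa, as in expression (4)."""
    if kappa <= 0:
        raise ValueError("kappa must be positive")
    return beta / kappa


def classify(r0):
    """Apply the threshold rule stated in the text."""
    if r0 > 1:
        return "spreads"
    if r0 < 1:
        return "dies out"
    return "at threshold"


r0 = basic_reproduction_ratio(beta=0.26691, kappa=13.27087)
print(round(r0, 6), classify(r0))
```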
3.5.0 Exact Solution
According to Sae-jie et al (2010), the analytical solution of the SIS model for all values of
parameters is still unknown, although numerical solutions can be obtained for any given
parameter values. Therefore qualitative analysis tends to be a useful tool. However, Khan, Sadiq and
Shabbir (2010) have derived an exact solution to the SIS model.

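The exact solution of the system can be obtained by converting the two equations into a Bernoulli
differential equation and solving the resulting linear equation. From equation (1)
$$S(t) = N - I(t) \tag{5}$$
Putting (5) into (2) gives
$$\frac{dI}{dt} = (\beta N - \kappa) I(t) - \beta I(t)^2 \tag{6}$$
Dividing (6) by $I(t)^2$, we obtain
$$\frac{1}{I^2}\frac{dI}{dt} = \frac{\beta N - \kappa}{I} - \beta \tag{7}$$
(7) is a Bernoulli ordinary differential equation, hence it is solved by letting $z = \frac{1}{I}$ such that
$$\frac{dz}{dt} = -\frac{1}{I^2}\frac{dI}{dt} \tag{8}$$
Putting (8) into (7) gives
$$\frac{dz}{dt} + (\beta N - \kappa) z = \beta \tag{9}$$
The integrating factor of (9) is $e^{(\beta N - \kappa)t}$. Multiplying (9) by the integrating factor and integrating gives
$$z e^{(\beta N - \kappa)t} = \frac{\beta}{\beta N - \kappa} e^{(\beta N - \kappa)t} + C \tag{10}$$
where $C$ is an integration constant.
Substituting $z = \frac{1}{I}$ back into (10), the solution of the differential equation can be written as
$$\frac{1}{I(t)} = \frac{\beta}{\beta N - \kappa} + C e^{-(\beta N - \kappa)t} \tag{11}$$
At $t = 0$ let $I(0) = I_0$; then $C = \frac{1}{I_0} - \frac{\beta}{\beta N - \kappa}$.
Now let $\gamma = \beta N - \kappa$; then the final solution is
$$I(t) = \frac{\gamma I_0 e^{\gamma t}}{\gamma + \beta I_0 (e^{\gamma t} - 1)} \tag{12}$$
$$S(t) = N - \frac{\gamma I_0 e^{\gamma t}}{\gamma + \beta I_0 (e^{\gamma t} - 1)} \tag{13}$$
where $N = I(t) + S(t)$.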
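The closed form can be evaluated directly. The sketch below assumes the solution $I(t) = \gamma I_0 e^{\gamma t} / (\gamma + \beta I_0 (e^{\gamma t} - 1))$ with $\gamma = \beta N - \kappa \neq 0$ and $S(t) = N - I(t)$; the function name `sis_exact` is hypothetical, and the parameter values are the estimates reported in Chapter 4.

```python
import math


def sis_exact(t, beta, kappa, n_total, i0):
    """Return (I(t), S(t)) from the closed-form SIS solution; requires gamma != 0."""
    gamma = beta * n_total - kappa
    if gamma == 0:
        raise ValueError("closed form used here requires beta*N != kappa")
    growth = math.exp(gamma * t)
    infected = gamma * i0 * growth / (gamma + beta * i0 * (growth - 1.0))
    return infected, n_total - infected


# Values for the first few periods, starting from 11 infected computers.
for period in range(4):
    infected, susceptible = sis_exact(period, 0.26691, 13.27087, 53, 11)
    print(period, round(infected, 4), round(susceptible, 4))
```

With these inputs the value at $t = 1$ agrees with the corresponding entry of Appendix B (about 4.6353 infected computers), which gives some confidence that the formula is transcribed consistently.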
3.6.0 Estimation of Parameters
3.6.1 $\beta$ and $\kappa$ Parameters
In this section, the method of estimating the parameters $\kappa$ and $\beta$ is presented.
Dividing (2) by $I(t) \neq 0$ gives
$$\frac{1}{I(t)}\frac{dI(t)}{dt} = -\kappa + \beta S(t) \tag{14}$$

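In order to estimate the parameters $\kappa$ and $\beta$, we use the method of simple linear regression.
The difference equation version of the SIS Model given in (15) and (16) is used to estimate the
parameters.
$$\frac{I(t) - I(t-1)}{\Delta t} = \beta S(t) I(t) - \kappa I(t) \tag{15}$$
$$\frac{S(t) - S(t-1)}{\Delta t} = -\beta S(t) I(t) + \kappa I(t) \tag{16}$$
where $\Delta t$ is the change in time. From (15), we calculate $\frac{\ln I(t) - \ln I(t-1)}{\Delta t}$. Then setting
$y = \frac{\ln I(t) - \ln I(t-1)}{\Delta t}$ and $x = S(t)$, the $x$ and $y$
variables are used as the predictor and response
variables of a simple linear regression model.
$$\frac{\ln I(t) - \ln I(t-1)}{\Delta t} = -\kappa + \beta S(t) \tag{17}$$
The regression model is stated below
$$y = \beta_0 + \beta_1 x + \varepsilon \tag{18}$$
Now comparing (17) to (18), $\kappa = -\beta_0$ and $\beta = \beta_1$.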
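The estimation step can be sketched as an ordinary least squares fit followed by the identification $\kappa = -\beta_0$ and $\beta = \beta_1$. The function names `fit_line` and `sis_parameters` are hypothetical, and the data below are synthetic values chosen only to show the mechanics; no claim is made that this reproduces the estimates reported in Chapter 4.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sis_parameters(x, y):
    """Map the fitted intercept and slope to (kappa, beta)."""
    b0, b1 = fit_line(x, y)
    return -b0, b1


# Synthetic example: responses generated exactly from kappa = 2 and beta = 0.05.
x_demo = [40.0, 45.0, 50.0, 55.0]
y_demo = [-2.0 + 0.05 * xi for xi in x_demo]
kappa_hat, beta_hat = sis_parameters(x_demo, y_demo)
print(kappa_hat, beta_hat)
```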
3.6.2 Reproduction Ratio, R0
The basic reproduction ratio, $R_0$, is estimated using the formula
$$R_0 = \frac{\beta}{\kappa}$$

3.6.3 Analysis of Variance
Table 1 shows how the parameters $\beta_0$ and $\beta_1$ will be calculated. $F_0$ is the F-statistic.
Table 1: ANOVA TABLE
SOURCE OF VARIATION    PARAMETERS             DEGREE OF FREEDOM    SUM OF SQUARES    MEAN SQUARES    F0
REGRESSION             $\beta_0$, $\beta_1$   $p - 1$              SSR               MSR             $F_0 = \frac{MSR}{MSE}$
ERROR                                         $n - p$              SSE               MSE
TOTAL                                         $n - 1$              SST
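Where:
$p$ = Number of parameters to be estimated.
$n$ = Total number of observations.

Sum of Squares of Regression (SSR) = $\frac{S_{xy}^2}{S_{xx}}$, where $S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$

Sum of Squares of Errors (SSE) = $S_{yy} - \frac{S_{xy}^2}{S_{xx}}$, where $S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$ and
$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$

Sum of Squares Total (SST) = SSE + SSR

Mean Square of Regression (MSR) = $\frac{SSR}{p - 1}$

Mean Square of Error (MSE) = $\frac{SSE}{n - p}$

$F_0 = \frac{MSR}{MSE}$

Sample Correlation Coefficient, $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$, $-1 \leq r \leq 1$

Coefficient of Determination, $r^2 = \frac{S_{xy}^2}{S_{xx} S_{yy}}$, $0 \leq r^2 \leq 1$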
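The quantities in the ANOVA table can be computed in a few lines. This is a sketch under the assumption that SSR is taken as $S_{xy}^2 / S_{xx}$ and SSE as $S_{yy} - SSR$, with $p = 2$ parameters for a straight line; the function name `anova_simple` and the small data set are illustrative only.

```python
import math


def anova_simple(x, y, p=2):
    """ANOVA quantities for simple linear regression (p = number of parameters)."""
    n = len(x)
    sxx = sum(v * v for v in x) - sum(x) ** 2 / n
    syy = sum(v * v for v in y) - sum(y) ** 2 / n
    sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    ssr = sxy ** 2 / sxx          # sum of squares of regression
    sse = syy - ssr               # sum of squares of errors
    msr = ssr / (p - 1)
    mse = sse / (n - p)
    return {
        "SSR": ssr,
        "SSE": sse,
        "SST": ssr + sse,
        "MSR": msr,
        "MSE": mse,
        "F0": msr / mse,
        "r": sxy / math.sqrt(sxx * syy),
        "r2": sxy ** 2 / (sxx * syy),
    }


# Small illustrative data set (not from the survey).
table = anova_simple([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(table)
```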
CHAPTER FOUR
DATA COLLECTION, ANALYSIS AND RESULT
4.0.0 Introduction
A statistical estimate of the parameters $\kappa$ and $\beta$ is obtained by fitting the data to a simple
linear regression model. Finally, we use Excel spreadsheet, MATLAB and R software to give a
comprehensive statistical and qualitative analysis of the data obtained.
4.1.0 Source of Data
During the case study, questionnaires were distributed to sixty computer users on KNUST
campus. The log history of each computer's antivirus software was recorded. The questionnaire
captured data on various infections, sources of infection, time of infection, action taken by the
scanner and users' frequent connectivity to the internet.
There were challenges encountered during the survey. Although we distributed sixty
questionnaires to sixty computer users on KNUST campus, only 53 were accurately completed.
The period $t$, infected computers $I(t)$ and susceptible computers $S(t)$ of the data collected are
shown in Appendix A.
4.2.0 Estimation of Parameters ($\kappa$, $\beta$), And Reproductive Ratio ($R_0$)
The simple linear regression model line is given by
$$y = \beta_0 + \beta_1 x \tag{1}$$
where $y = \frac{\ln I(t) - \ln I(t-1)}{\Delta t}$ is the response variable and $x = S(t)$ represents the predictor
variable, as derived in the previous chapter.
Table 2 shows the values of the predictor $x = S(t)$ and response $y$ variables calculated using R
software (version 2.9.2). The calculations were performed on a system running Windows 7
Ultimate edition with the hardware specifications:
(i) Processor: Intel(R) Pentium(R) Dual CPU T3400 @ 2.16GHz 2.17GHz
(ii) Installed Memory (RAM): 2.00 GB (1.87 GB usable)
(iii) System type: 32-bit Operating System
Table 2: Regression Data.
Predictor $x = S(t)$    Response $y$
42 -2.3979
52 0
52 1.098612
50 -1.09861
52 1.098612
50 -0.40547
51 0
53 0
52 1.609438
48 -0.22314
49 -0.69315
51 0.693147
49 0
53 0
49 0.223144
48 -0.91629
51 0.693147
49 0.405465
47 -0.40547
49 0.559616
46 -0.15415
47 -0.69315
50 0.980829
45 -0.98083
50 0.847298
46 0
53 0
52 0.693147
51 0.405465
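The rows of Table 2 can be rebuilt from the infected counts in Appendix A. The sketch below rests on assumptions inferred from the table rather than stated in the text: each log difference $\ln I(t+1) - \ln I(t)$ is paired with $S(t) = N - I(t)$, $\Delta t = 1$, and the response is recorded as 0 whenever either count is zero (where the logarithm is undefined). The function name `regression_rows` is hypothetical.

```python
import math

# Infected computers I(t) for t = 0..29, copied from Appendix A.
INFECTED = [11, 1, 1, 3, 1, 3, 2, 0, 1, 5, 4, 2, 4, 0, 4, 5, 2, 4, 6, 4,
            7, 6, 3, 8, 3, 7, 0, 1, 2, 3]
N_TOTAL = 53


def regression_rows(infected, n_total):
    """Build (predictor, response) pairs from consecutive infected counts."""
    rows = []
    for now, nxt in zip(infected, infected[1:]):
        # Assumed convention: response is 0 when a zero count makes the log undefined.
        response = math.log(nxt / now) if now > 0 and nxt > 0 else 0.0
        rows.append((n_total - now, response))
    return rows


rows = regression_rows(INFECTED, N_TOTAL)
for predictor, response in rows[:5]:
    print(predictor, round(response, 6))
```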
Figure 1: Scatter plot of the regression data.
The scatter plot of Table 2 in Figure 1 above suggests that simple linear regression modeling is
appropriate to estimate the parameters of the SIS Model. Data from Table 2 is fitted to the
linear regression equation (1) and the values of the parameters $\beta_0$ and $\beta_1$ are estimated with the
aid of the R software.
The result is displayed in Table 3 below.
Table 3: Regression Results
ANOVA TABLE
SOURCE OF VARIATION    PARAMETERS                      DEGREE OF FREEDOM    SUM OF SQUARES    MEAN SQUARES    F0
REGRESSION             $\hat{\beta}_1$ = 0.26691       1                    14.04721          14.04721        29.3014
                       $\hat{\beta}_0$ = -13.27087
ERROR                  $\hat{\sigma}^2$ = 0.47941      27                   12.94420          0.47941
TOTAL                                                  28                   26.99141
Hence the estimated regression model is
$$\hat{y} = -13.27087 + 0.26691x \tag{2}$$
Hypotheses:
$H_0$: $\beta_1 = 0$
$H_1$: $\beta_1 \neq 0$
The critical value of the F-statistic at the $\alpha = 0.05$ significance level is given by
$F_{0.05}(1, 27) = 4.21$
Conclusion:
We reject $H_0$ since $F_0 = 29.3014 > F_{0.05}(1, 27) = 4.21$.
Thus $\beta_1 \neq 0$, which indicates that there is a linear relationship between the response and the predictor
variables and that the predictor variable contributes significantly to the model.
Figure 2: F-Distribution.
The rejection and acceptance regions of the F-distribution are shown in Figure 2, indicating
whether to accept or reject $H_0$.
The Sample Correlation Coefficient, $r = 0.72141$, indicates a fairly strong positive linear
relationship between the response and the predictor variables.
The Coefficient of Determination, $r^2 = 0.52043$, implies that the proportion of variability in the
response variable explained by the linear regression model is 52.043%.
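As a consistency check, the reported statistics can be recovered from the sums of squares in Table 3. The identity $r^2 = SSR/SST$ used here is the standard one for simple linear regression and is stated as an assumption about how the reported values were obtained.

```latex
F_0 = \frac{MSR}{MSE} = \frac{14.04721}{0.47941} \approx 29.30,
\qquad
r^2 = \frac{SSR}{SST} = \frac{14.04721}{26.99141} \approx 0.5204,
\qquad
r = \sqrt{0.5204} \approx 0.7214
```

These agree with the tabulated $F_0 = 29.3014$, $r^2 = 0.52043$ and $r = 0.72141$ to the precision shown.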
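From chapter 3, the regression equation is derived from (3) below,
$$\frac{\ln I(t) - \ln I(t-1)}{\Delta t} = -\kappa + \beta S(t) \tag{3}$$

We compare (2) and (3) to get $\kappa = 13.27087$ and $\beta = 0.26691$.
Now the basic reproduction ratio, $R_0$, is calculated as
$R_0 = \frac{\beta}{\kappa} = \frac{0.26691}{13.27087}$. Hence $R_0 = 0.020112$.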
4.3.0 Results
The exact solution to the SIS Model derived in chapter three is given as
$$I(t) = \frac{\gamma I_0 e^{\gamma t}}{\gamma + \beta I_0 (e^{\gamma t} - 1)} \tag{4}$$
$$S(t) = N - \frac{\gamma I_0 e^{\gamma t}}{\gamma + \beta I_0 (e^{\gamma t} - 1)} \tag{5}$$
where $\gamma = \beta N - \kappa$ and the variables have their usual meaning as defined in chapter three of this
work.
The estimated values of $\beta$ and $\kappa$ are substituted into (4) and (5) to obtain the average number of
Infected Computers, $I(t)$, and average number of Susceptible Computers, $S(t)$, as in Appendix B.
In Appendix B, rows under the columns with heading SIS Modeling Result represent the
modeling result for $I(t)$ and $S(t)$ in (4) and (5) respectively, whereas rows under the columns with
heading Data represent the data on the number of infected computers and susceptible computers
collected during the survey.
Figure 3 is the graph of the number of infected computers and susceptible computers. SIS
Infected and SIS Susceptible are line plots of the exact solutions in (4) and (5) respectively.
Infected Data and Susceptible Data stand for line plots of the number of infected computers and
susceptible computers collected during the case study.
Figure 3: SIS model results and survey data for infected and susceptible computers.
4.4.0 Qualitative Analysis of the SIS Model
The general SIS Model is
$$\frac{dI}{dt} = \beta S I - \kappa I \tag{6}$$
$$\frac{dS}{dt} = -\beta S I + \kappa I \tag{7}$$
4.4.1 Equilibrium Points
In order to investigate the properties of the dynamics of the model, we determine the equilibrium
points by requiring that all the derivatives of the population compartments vanish.
At equilibrium $\frac{dI(t)}{dt} = 0$ and $\frac{dS(t)}{dt} = 0$, and hence the equilibrium point is $(I(t), S(t)) = (0, N)$.
4.4.2 Eigenvalues And Eigenvectors
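The Jacobian of (6) and (7) is
$$J = \begin{pmatrix} \beta S - \kappa & \beta I \\ -\beta S + \kappa & -\beta I \end{pmatrix}$$
From $S = N - I$, evaluating $J$ at the equilibrium point gives $S = N$ since $I = 0$, so that
$$J = \begin{pmatrix} \beta N - \kappa & 0 \\ -\beta N + \kappa & 0 \end{pmatrix}$$

with eigenvalues given by $\lambda_1 = \beta N - \kappa$ and $\lambda_2 = 0$. The condition for asymptotic stability of
this point is $\lambda_1 < 0$. At $N = 53$, $\lambda_1 = 0.26691 \times 53 - 13.27087 = 0.87536$.
The corresponding eigenvectors $v_1$ and $v_2$ are given as
$v_1 = (1, -1)^T$ and $v_2 = (0, 1)^T$.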
4.4.3 Deductions
In spite of one of the eigenvalues being zero, the analysis of the stability can be conclusive
because $\lambda_1 \neq 0$. However, a negative eigenvalue suggests that the solution of the system is
asymptotically stable.
The phase portrait in Figure 4 is drawn for the estimated parameter values and is obtained using MATLAB.
Figure 4: Phase Portrait.
4.5.0 Discussions
From the calculations, the parameters are estimated to give $\kappa = 13.27087$ and $\beta = 0.26691$. This
means that, on average, 13.27087 infected computers recover each day and each infected
computer potentially contacts and infects an average of 0.26691 computers per day.
Given the initial number of infected computers, $I_0 = 11$, the estimated values of $\beta$ and $\kappa$ are
substituted into (4) and (5) to obtain the average number of Infected Computers, $I(t)$, and
average number of Susceptible Computers, $S(t)$, at every $t$ in the table in Appendix B. It is noticed
that the number of infected computers reduces to 6.19% of the computer population
at $t = 29$.
Now leaving the infection and recovery parameters the same but changing the initial number of
infected computers to 20 and later to 8, it can
be observed that the number of infected computers steadily decreases to the steady level of
about 3.2796. This is the same as the steady level obtained when the initial number of infected
computers is 11. This implies that it does not matter what number of infected computers we
start with; the steady state is determined by the infection and recovery parameters of the model.
We conclude that the steady state infection level is independent of the initial state.
At the steady state level, if the value of the recovery parameter $\kappa$ is increased to about 14.14623
computers per day, then the number of susceptible computers goes to 53 and the number of infected
computers goes to 0. Thus the computer virus will die out.
Now looking at the effect of $\beta$ at the steady state, if we increase $\beta$ to 1 and later to 1.5 from the original
value 0.26691, we notice that the steady number of
infected computers increases to about 40 and 44 respectively. Thus the computer virus spreads more quickly. As
we keep increasing the number of contacts, the number of infected computers in the
computer population continues to increase.
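The steady levels quoted in this discussion can be reproduced from the limit of the exact solution. The sketch below assumes that, as $t \to \infty$, the infected count tends to $N - \kappa/\beta$ when $\beta N > \kappa$ and to 0 otherwise; the function name `steady_state` is hypothetical.

```python
def steady_state(beta, kappa, n_total):
    """Long-run (infected, susceptible) levels implied by the exact SIS solution."""
    infected = max(0.0, n_total - kappa / beta)
    return infected, n_total - infected


# Estimated parameters, then the two larger contact rates discussed in the text.
for beta in (0.26691, 1.0, 1.5):
    infected, susceptible = steady_state(beta, 13.27087, 53)
    print(beta, round(infected, 4), round(susceptible, 4))

# Raising kappa to beta * N removes the infected steady level.
extinct_infected, _ = steady_state(0.26691, 0.26691 * 53, 53)
print(extinct_infected)
```

The value $\kappa = \beta N = 0.26691 \times 53 = 14.14623$ is the recovery level at which the infected steady state reaches zero, matching the figure quoted above.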
We deduce that as the number of infected computers decreases, there is a corresponding increase
in the number of susceptible computers as $t$ increases, which is illustrated in Figure 3. This can be
attributed to the fact that the basic reproduction ratio, $R_0 = 0.020112$, is less than 1.
This implies that in the long run there might not be an outbreak of computer virus among computer
users on KNUST campus. However, at any given time, the sum of infected computers and
susceptible computers is equal to the number of computers in the population.
Although, from the table in Appendix A, the average number of susceptible computers
outnumbers the average number of infected computers, this does not mean that the spread of
computer virus on KNUST campus will decline to extinction.
This can be attributed to the fact that student computer users on KNUST campus frequently
share electronic data among themselves through the use of USB pen drives and external hard
drives. Furthermore, these users access the internet daily, probably to download media
files from unsecured websites.
CHAPTER FIVE
CONCLUSION AND RECOMMENDATIONS
5.0.0 Conclusion
We draw the following conclusions from the project in relation to the set targets. The spread of
computer virus depends on the contacts that result in infections and on the rate of recovery from computer virus.
There will not be an epidemic of computer virus since the computed $R_0$ from the data collected is
less than 1. There can only be extinction of computer virus on KNUST campus if, on average, 14
computers out of the 53 recover from infections daily. In reality, extinction of computer viruses
is practically impossible since students frequently share files among themselves through pen drives
and access unsecured websites on the internet.
5.1.0 Recommendations
Based on our findings, we recommend that, in order to control the spread of computer virus, the
following guidelines be observed by all computer users.
(i) All users on KNUST campus should install antivirus software programs on their computers.
(ii) Users should update their antivirus software regularly.
(iii) Users should scan their computers immediately after each update.
REFERENCES
1. Cohen, F.: Computer Viruses, Theory and Experiments, PhD thesis, University of Southern
California (1985)
2. System Dynamic Model for Computer virus Prevalence By Abdelazim and Wahba, 2010
3. Computer Virus Strategies and Detection Methods, by Daoud, Jebril and Zaqaibeh, 2008
4. Theory of Self-Reproducing Automata, by John Von Neumann, 1966
5. MS Encarta 09, Virus [computer], History
6. Mathematical Models on Interaction between Computer Virus and Antivirus Software
inside a Computer System, Bimal Kumar Mishra and Gholam Mursalin Ansari, Birla Institute of
Technology, 2008.
7. N. T. J. BAILEY, The Mathematical Theory of Infectious Diseases, 2nd ed., Hafner,
New York, 1975.
8. W. O. KERMACK AND A. G.MCKENDRICK, Contributions to the mathematical
theory of epidemics, part 1,Proc. Roy. Soc. London Ser. A, 115 (1927), pp. 700-721.
9. A. G. MCKENDRICK, Applications of mathematics to medical problems, Proc.
Edinburgh Math. Soc., 44 (1926), pp. 98-130
10. W. Murray (1988), The application of epidemiology to computer viruses. Computers &
Security Volume 7, issue 2, pages 139-150
12. Winfried Gleissner (1989). A mathematical theory for the spread of computer viruses.
Computers & Security Volume 8, issue 1,pages 35-41.
13. J. O. Kephart and S. R. White (1991) Directed-graph epidemiological models of
computer viruses. 1991 IEEE Computer Society Symposium on Research in Security and
Privacy, 343-359.
14. J. O. Kephart, S. R. White, and D. M. Chess (1993) Computers and epidemiology. IEEE
Spectrum 30, 20-26.
15. P.S. Tippett, "The Kinetics of Computer Virus Replication: A Theory and Preliminary
Survey," Safe Computing: Proceedings of the Fourth Annual Computer Virus and Security
Conference, New York, New York, March 14-15, 1991, pp. 66-87.
16. Computer Virus: A Global Perspective, Steve R. White, Jeffrey O. Kephart and David M.
Chess
17. Mathematical Modeling of Epidemics, Emma Harris, 2008.
18. Introduction to the Modeling of Epidemics - SIS Models, Troy Tassier, 2005.
20. A note on exact solution of SIR and SIS epidemic models by G. Shabbir, H. Khan and
M. A. Sadiq (2010)
21. The History of Mathematics, John Tabak, PhD. 2004.
APPENDIX A:
DATA FROM KNUST, KUMASI
Period (t) Infected Computers, I(t) Susceptible Computers, S(t)

0 11 42
1 1 52
2 1 52
3 3 50
4 1 52
5 3 50
6 2 51
7 0 53
8 1 51
9 5 48
10 4 49
11 2 51
12 4 49
13 0 53
14 4 49
15 5 48
16 2 51
17 4 49
18 6 47
19 4 49
20 7 46
21 6 47
22 3 50
23 8 45
24 3 50
25 7 46
26 0 53
27 1 52
28 2 51
29 3 50

APPENDIX B:
SIS MODELING RESULT COMPARED TO THE RAW DATA
I(t) = Infected Computers, S(t) = Susceptible Computers
Period (t)   Data I(t)   Data S(t)   SIS Modeling Result I(t)   SIS Modeling Result S(t)   Total (N)
0 11 42 11 42 53
1 1 52 4.635296884 48.36470312 53
2 1 52 3.734789001 49.265211 53
3 3 50 3.455081423 49.54491858 53
4 1 52 3.350516527 49.64948347 53
5 3 50 3.308788049 49.69121195 53
6 2 51 3.291704519 49.70829548 53
7 0 53 3.284637567 49.71536243 53
8 1 51 3.281701633 49.71829837 53
9 5 48 3.280479743 49.71952026 53
10 4 49 3.279970835 49.72002917 53
11 2 51 3.279758814 49.72024119 53
12 4 49 3.27967047 49.72032953 53
13 0 53 3.279633657 49.72036634 53
14 4 49 3.279618317 49.72038168 53
15 5 48 3.279611925 49.72038807 53
16 2 51 3.279609261 49.72039074 53
17 4 49 3.279608151 49.72039185 53
18 6 47 3.279607689 49.72039231 53
19 4 49 3.279607496 49.7203925 53
20 7 46 3.279607416 49.72039258 53
21 6 47 3.279607382 49.72039262 53
22 3 50 3.279607368 49.72039263 53
23 8 45 3.279607362 49.72039264 53
24 3 50 3.27960736 49.72039264 53
25 7 46 3.279607359 49.72039264 53
26 0 53 3.279607359 49.72039264 53
27 1 52 3.279607358 49.72039264 53
28 2 51 3.279607358 49.72039264 53
29 3 50 3.279607358 49.72039264 53