AIS Model Tutorial

Artificial Immune System and Its Applications
Prof. Ying TAN

National Laboratory on Machine Perception, Department of Intelligence Science, Peking University, Beijing 100871, P.R. China
2005-12-13
Y. Tan---Artificial Immune Sys.
Contents
Biological Immune System
Artificial Immune System
Basic Algorithms of AIS
AIS Design Procedure
Case Studies
    Malicious Executable Detection
    Film Recommender
New
    Immunocomputing (IC)
    Danger Theory
Future

The Immune System is

Immune system: a system that protects the body from foreign substances and pathogenic organisms by producing the immune response
Immunity: state or quality of being resistant (immune), either by virtue of previous exposure (adaptive immunity) or as an inherited trait (innate immunity)
Why the Immune System?

The immune system has the following appealing features:
Recognition
    Anomaly detection
    Noise tolerance
Robustness
Feature extraction
Diversity
Reinforcement learning
Memory
Dynamically changing coverage
Distributed
Multi-layered
Adaptive
Role of Biological Immune System

Protect our bodies from pathogens and viruses
Primary immune response
    Launch a response to invading pathogens
Secondary immune response
    Remember past encounters
    Faster response the second time around
Immune cells
There are two primary types of lymphocytes:
    B-lymphocytes (B cells)
    T-lymphocytes (T cells)
Other components include macrophages, phagocytic cells, cytokines, etc.
Where is it?
[Figure: location of the lymphoid organs in the body. Labels: primary lymphoid organs, secondary lymphoid organs, tonsils and adenoids, thymus, spleen, Peyer's patches, appendix, bone marrow, lymph nodes, lymphatic vessels]
Multiple layers of the immune system

[Figure: layers of defence met by pathogens, in order: skin; biochemical barriers; phagocytes (innate immune response); lymphocytes (adaptive immune response)]

Antigen
Substances capable of starting a specific immune response are commonly referred to as antigens.
This includes some pathogens such as viruses, bacteria, fungi, etc.
Biological Immune System

[Figure: taxonomy of the biological immune system. Innate vs. acquired; cell mediated vs. humoral; cells shown: T cell (helper), B cell, T cell (killer); the B cell secretes antibody]
How does IS work: A simplistic view

[Figure: simplified view of the immune response. Labels: APC; MHC protein; antigen (I); peptide (II); T-cell (III); activated T-cell (IV); B-cell (V); lymphokines (VI); activated B-cell (plasma cell) (VII)]
Self/Non-Self Recognition
The immune system needs to be able to differentiate between self and non-self cells
Antigenic encounters may result in cell death, therefore
    Some kind of positive selection
    Some element of negative selection
Immune Pattern Recognition

[Figure: a B-cell with its B-cell receptors (BCR, or antibody, Ab) binding the epitopes of an antigen]
Immune recognition is based on the complementarity between the binding region of the receptor and a portion of the antigen called an epitope. Antibodies present a single type of receptor, whereas antigens might present several epitopes. This means that each antibody can recognize a single antigen.
Clonal Selection
[Figure: clonal selection. Foreign antigens select matching antibodies, which proliferate (cloning) and differentiate into memory cells (M) and plasma cells; cells that meet self-antigens undergo clonal deletion (negative selection)]
Main Properties of Clonal Selection (Burnet, 1978)

Elimination of self antigens
Proliferation and differentiation on contact of mature lymphocytes with antigen
Restriction of one pattern to one differentiated cell and retention of that pattern by clonal descendants
Generation of new random genetic changes, subsequently expressed as diverse antibody patterns by a form of accelerated somatic mutation
Immune Network Theory

Idiotypic network (Jerne, 1974)
B cells co-stimulate each other
    Treat each other a bit like antigens
Creates an immunological memory
[Figure: idiotypic network. An antigen (Ag) and antibodies 1, 2, 3 with paratopes and idiotopes; activation gives a positive response, suppression a negative response]
Reinforcement Learning and Immune Memory

Repeated exposure to an antigen throughout a lifetime
Primary, secondary immune responses
Remembers encounters
    No need to start from scratch
    Memory cells
Continuous learning
Learning (2)
[Figure: antibody concentration over time. After a lag, a primary response to antigen Ag1; when antigens Ag1 and Ag2 are presented, a secondary response to Ag1 and a primary response to Ag2; when antigen Ag1 + Ag3 is presented, a cross-reactive response]
Immune System: Summary
Distinguish the host (body cells) from external entities.
When an entity is recognized as foreign (or dangerous), activate several defense mechanisms leading to its destruction (or neutralization).
Subsequent exposure to a similar entity results in a rapid immune response.
The overall behavior of the immune system is an emergent property of many local interactions.
Immune metaphors
[Figure: ideas flow from the immune system, and from other areas, into artificial immune systems]

What is an Artificial Immune System? Definition

Dasgupta (1999): Artificial immune systems (AIS) are intelligent and adaptive systems inspired by the immune system toward real-world problem solving.
de Castro and Timmis: Artificial Immune Systems (AIS) are adaptive systems, inspired by theoretical immunology and observed immune functions, principles and models, which are applied to problem solving. http://www.cs.kent.ac.uk/people/staff/jt6/aisbook/
AIS use the natural immune system as a metaphor for solving complex computational problems; the aim is not to model the immune system.
AI models and their corresponding natural prototypes

Natural prototype        Biological level            AI model
Natural language         Left hemisphere of brain    Formal logic; Formal linguistic
Brain nervous net        Cells                       Neural computing (NC); Neural networks (NN)
Biological cells         Cells                       Cellular automata (CA)
Molecules of proteins    Molecular                   Artificial immune systems (AIS)
Genetic code             Molecular                   Genetic Algorithms (GA)
Some History
Developed from the field of theoretical immunology in the mid 1980s.
    Suggested we might look at the IS
1990: Bersini, first use of immune algorithms to solve problems
Mid 1990s: Forrest et al., computer security
Mid 1990s: Hunt et al., machine learning
More
AIS Scope
Pattern recognition;
Fault and anomaly detection;
Data analysis;
Data mining (classification/clustering);
Agent-based systems;
Scheduling;
Machine learning;
Autonomous navigation and control;
Search and optimization methods;
Artificial life;
Security of information systems;
Optimization;
Just to name a few.
Typical Applications of AIS
Computer Security (Forrest 94, 96, 98; Kephart 94; Lamont 98, 01, 02; Dasgupta 99, 01; Bentley 00, 01, 02)
Anomaly Detection (Dasgupta 96, 01, 02)
Fault Diagnosis (Ishida 92, 93; Ishiguro 94)
Data Mining & Retrieval (Hunt 95, 96; Timmis 99, 01, 02)
Pattern Recognition (Forrest 93; Gibert 94; de Castro 02)
Adaptive Control (Bersini 91)
Job Shop Scheduling (Hart 98, 01, 02)
Chemical Pattern Recognition (Dasgupta 99)
Robotics (Ishiguro 96, 97; Singh 01)
Optimization (de Castro 99; Endo 98; de Castro 02)
Web Mining (Nasraoui 02; Secker 05)
Fault Tolerance (Tyrrell 01, 02; Timmis 02)
Autonomous Systems (Varela 92; Ishiguro 96)
Engineering Design Optimization (Hajela 96, 98; Nunes 00)
Basic Immune Models and Algorithms

Bone Marrow Models
Negative Selection Algorithms
Clonal Selection Algorithm
Immune Network Models
Somatic Hypermutation
Bone Marrow Models

Gene libraries are used to create antibodies from the bone marrow
Antibody production through a random concatenation from gene libraries
Simple or complex libraries
An individual genome corresponds to four libraries:
    Library 1: A1 A2 A3 A4 A5 A6 A7 A8 (selected: A3)
    Library 2: B1 B2 B3 B4 B5 B6 B7 B8 (selected: B2)
    Library 3: C1 C2 C3 C4 C5 C6 C7 C8 (selected: C8)
    Library 4: D1 D2 D3 D4 D5 D6 D7 D8 (selected: D5)
Expressed Ab molecule: A3 B2 C8 D5 = four 16-bit segments = a 64-bit chain
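The library mechanism above can be sketched in a few lines of Python. This is a minimal illustration, not the tutorial's own code: the function names, the random contents of the libraries, and the seeds are assumptions; only the shape (four libraries of eight 16-bit segments, one segment picked from each) comes from the slide.

```python
import random

def make_libraries(n_libraries=4, n_segments=8, segment_bits=16, seed=0):
    """Build hypothetical gene libraries: each holds n_segments random segments."""
    rng = random.Random(seed)
    return [[rng.getrandbits(segment_bits) for _ in range(n_segments)]
            for _ in range(n_libraries)]

def express_antibody(libraries, segment_bits=16, rng=None):
    """Pick one segment per library and concatenate them into one bit string."""
    rng = rng or random.Random()
    chosen = [rng.choice(library) for library in libraries]
    return "".join(format(segment, f"0{segment_bits}b") for segment in chosen)

libraries = make_libraries()
antibody = express_antibody(libraries, rng=random.Random(1))
print(len(antibody), antibody)  # a 64-bit chain built from four 16-bit segments
```

With eight segments per library, four libraries can express 8^4 = 4096 distinct antibodies from only 32 stored segments, which is the point of the random concatenation.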
Negative Selection (NS) Algorithms

Forrest 1994: idea taken from the negative selection of T-cells in the thymus
Applied initially to computer security
Split into two parts:
    Censoring
    Monitoring
[Figure, censoring: generate random strings (R0); match each against the self strings (S); on a match, reject the string; otherwise add it to the detector set (R)]
[Figure, monitoring: match the protected strings (S) against the detector set (R); on a match, non-self is detected]
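The censoring and monitoring phases can be sketched as follows. The slide does not say how two strings "match"; the sketch assumes the r-contiguous-bits rule (two strings match if they agree in at least r consecutive positions), and all function names and parameter values are illustrative.

```python
import random

def matches(a, b, r):
    """Assumed matching rule: a and b agree in at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False

def censor(self_strings, n_detectors, length, r, rng, max_tries=100000):
    """Censoring: keep random strings that match no self string."""
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = "".join(rng.choice("01") for _ in range(length))
        if not any(matches(candidate, s, r) for s in self_strings):
            detectors.append(candidate)
    return detectors

def monitor(detectors, protected_strings, r):
    """Monitoring: report protected strings matched by any detector (non-self)."""
    return [s for s in protected_strings
            if any(matches(d, s, r) for d in detectors)]

self_strings = ["00000000", "00001111"]
detectors = censor(self_strings, n_detectors=5, length=8, r=4, rng=random.Random(0))
print(monitor(detectors, self_strings, r=4))  # unchanged self strings raise no alarm
```

By construction no detector matches a self string, so any match reported during monitoring points to a string that is not in the original self set.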
Clonal Selection Algorithm (de Castro & von Zuben, 2001)

1. Initialisation: randomly initialise a population (P)
2. Antigenic presentation: for each pattern in Ag, do:
    2.1 Antigenic binding: determine the affinity to each member of P
    2.2 Affinity maturation: select the n highest-affinity members of P, clone and mutate them in proportion to their affinity with Ag, then add the new mutants to P
3. Metadynamics:
    3.1 Select the highest-affinity member of P to form part of the memory set M
    3.2 Replace n members of P with new random ones
4. Cycle: repeat 2 and 3 until a stopping criterion is met (e.g. maximum number of generations)
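A minimal Python sketch of these four steps, for binary patterns. Several details are assumptions made only for illustration: affinity is the number of matching bits, every selected antibody gets the same number of clones, the mutation rate falls as affinity rises, and the replaced members are the lowest-affinity ones. Parameter names and defaults are hypothetical.

```python
import random

def affinity(ab, ag):
    """Assumed affinity: number of matching bits (higher is better)."""
    return sum(a == g for a, g in zip(ab, ag))

def mutate(ab, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return "".join(("1" if c == "0" else "0") if rng.random() < rate else c
                   for c in ab)

def clonalg(antigens, pop_size=20, n_select=5, n_clones=4, n_replace=3,
            generations=30, seed=0):
    rng = random.Random(seed)
    length = len(antigens[0])

    def random_antibody():
        return "".join(rng.choice("01") for _ in range(length))

    population = [random_antibody() for _ in range(pop_size)]   # step 1
    memory = {}                                                  # memory set M
    for _ in range(generations):                                 # step 4
        for ag in antigens:                                      # step 2
            ranked = sorted(population, key=lambda ab: affinity(ab, ag),
                            reverse=True)                        # step 2.1
            mutants = []
            for ab in ranked[:n_select]:                         # step 2.2
                rate = 1.0 - affinity(ab, ag) / length           # assumed schedule
                mutants.extend(mutate(ab, rate, rng) for _ in range(n_clones))
            population = sorted(ranked + mutants,
                                key=lambda ab: affinity(ab, ag), reverse=True)
            best = population[0]                                 # step 3.1
            if ag not in memory or affinity(best, ag) > affinity(memory[ag], ag):
                memory[ag] = best
            population = (population[:pop_size - n_replace]      # step 3.2
                          + [random_antibody() for _ in range(n_replace)])
    return memory

target = "1111000011110000"
memory = clonalg([target])
print(memory[target], affinity(memory[target], target))
```

The memory entry for an antigen is only overwritten by a better match, so its affinity never decreases from one generation to the next.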
CLONALG for PR, Learning, Optimization
[Figure: CLONALG block diagram. Labels: Agj, Ab{m}, Ab{r}, Ab{n}, Ab{d}, Abj*, fj, fj*, Cj, Cj*; operations: select, clone, select again]
L.N. de Castro and F.J. Von Zuben, Learning and optimization using the clonal selection principle, IEEE Trans. Evolutionary Computation, vol. 6, no. 3, June 2002, pp. 239-251
Discrete Immune Network Models (Timmis & Neal, 2001)

1. Initialisation: create an initial network from a sub-section of the antigens
2. Antigenic presentation: for each antigenic pattern, do:
    2.1 Clonal selection and network interactions: for each network cell, determine its stimulation level (based on antigenic and network interaction)
    2.2 Metadynamics: eliminate network cells with a low stimulation
    2.3 Clonal expansion: select the most stimulated network cells and reproduce them proportionally to their stimulation
    2.4 Somatic hypermutation: mutate each clone
    2.5 Network construction: select mutated clones and integrate
3. Cycle: repeat step 2 until termination condition is met
Immune Network Models

Timmis & Neal, 2000
Used immune network theory as a basis, proposed the AINE algorithm

Initialize AIN
For each antigen
    Present antigen to each ARB in the AIN
    Calculate ARB stimulation level
    Allocate B cells to ARBs, based on stimulation level
    Remove weakest ARBs (ones that do not hold any B cells)
If termination condition met
    exit
else
    Clone and mutate remaining ARBs
    Integrate new ARBs into AIN
Immune Network Models

De Castro & Von Zuben (2000c): aiNET, based on similar principles

At each iteration step do
    For each antigen do
        Determine affinity to all network cells
        Select n highest affinity network cells
        Clone these n selected cells
        Increase the affinity of the cells to antigen by reducing the distance between them (greedy search)
        Calculate improved affinity of these n cells
        Re-select a number of improved cells and place into matrix M
        Remove cells from M whose affinity is below a set threshold
        Calculate cell-cell affinity within the network
        Remove cells from network whose affinity is below a certain threshold
        Concatenate original network and M to form new network
    Determine whole network inter-cell affinities and remove all those below the set threshold
    Replace r% of worst individuals by novel randomly generated ones
    Test stopping criterion
Somatic Hypermutation
Mutation rate depends on affinity
Very controlled mutation in the natural immune system
Trade-off between the normalized antibody affinity D* and its mutation rate
[Figure: mutation rate (vertical axis, 0 to 1) plotted against the normalized antibody affinity D* (horizontal axis, 0 to 1), one curve per parameter setting; the legible curve labels include the values 10 and 20]
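One common way to realise this trade-off is an exponential decay of the mutation rate with normalized affinity. The sketch below assumes the form alpha = exp(-rho * D*), which is the form used in de Castro and Von Zuben's clonal selection work; the symbol names `alpha` and `rho` and the sample values are assumptions, since the slide's formula is not legible.

```python
import math

def mutation_rate(normalized_affinity, rho):
    """Assumed schedule: alpha = exp(-rho * D*), with D* in [0, 1].

    A larger rho makes the rate fall faster as affinity grows, so
    high-affinity antibodies are mutated only slightly.
    """
    return math.exp(-rho * normalized_affinity)

for rho in (10.0, 20.0):
    rates = [round(mutation_rate(d / 10, rho), 4) for d in range(0, 11, 2)]
    print(rho, rates)
```

Under this schedule an antibody with zero affinity is mutated at the maximum rate of 1, and the rate decreases monotonically toward 0 as D* approaches 1.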
General Framework of AIS

[Figure: layered framework for engineering an AIS, from problem to solution: Application Domain (Problem); Representation; Affinity Measures; Immune Algorithms; Solution]
Representation Shape Space

Describe the general shape of a molecule
Describe interactions between molecules
Degree of binding between molecules
[Figure: an antigen and an antibody with complementary shapes]
Representation
Vectors
    Ab = (Ab1, Ab2, ..., AbL)
    Ag = (Ag1, Ag2, ..., AgL)
Real-valued shape-space
Integer shape-space
Binary shape-space
Symbolic shape-space
Define their Interaction

Define the term affinity
Affinity is related to distance
    Euclidean: $D = \sqrt{\sum_{i=1}^{L} (Ab_i - Ag_i)^2}$
    Other distance measures such as Hamming, Manhattan, etc.
Affinity threshold
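The distance measures named above are short to write down. In this sketch the function names and the `recognizes` helper are illustrative, and it assumes that a smaller distance means a higher affinity, with recognition when the distance falls within the affinity threshold.

```python
import math

def euclidean(ab, ag):
    """Euclidean distance for real-valued shape-space."""
    return math.sqrt(sum((a - g) ** 2 for a, g in zip(ab, ag)))

def manhattan(ab, ag):
    """Manhattan distance for real- or integer-valued shape-space."""
    return sum(abs(a - g) for a, g in zip(ab, ag))

def hamming(ab, ag):
    """Hamming distance for binary or symbolic shape-space."""
    return sum(a != g for a, g in zip(ab, ag))

def recognizes(ab, ag, threshold, distance=euclidean):
    """Assumed recognition rule: distance within the affinity threshold."""
    return distance(ab, ag) <= threshold

print(euclidean([0, 0], [3, 4]))        # 5.0
print(manhattan([0, 0], [3, 4]))        # 7
print(hamming("1010", "1001"))          # 2
print(recognizes([0, 0], [3, 4], 5.0))  # True
```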
Shape Space Formalism

Repertoire of the immune system is complete (Perelson, 1989)
Extensive regions of complementarity
Some threshold of recognition
[Figure: shape space, labelled V]
AIS Design
Problem description
Deciding the immune principles used for problem solving
Engineering the AIS
    Defining the types of immune components used
    Defining the representation for the elements of the AIS
    Applying immune principles to problem solving
    The meta-dynamics of an AIS
Reverse mapping from AIS to the real problem

Case Studies of AIS

Malicious Executables Detection --- From Z.H. Guo, Z.K. Liu, and Y. Tan, An NN-based Malicious Executables Detection Algorithm based on Immune Principles, in F. Yin, J. Wang, C. Guo (Eds.): ISNN 2004, Springer, Lecture Notes in Computer Science 3174, pp. 675-680, 2004. (http://dblp.uni-trier.de)
Film Recommender --- From Dr. Uwe Aickelin (http://www.aickelin.com), University of Nottingham, U.K., 2004
New!
Immunocomputing (IC)
By Tarakanov, A., 2001.
Aims:
    A proper mathematical framework
    A new kind of computing
    A new kind of hardware
New concepts:
    Formal protein (FP), vs. the neuron
    Formal immune network (FIN), vs. the neural network (NN)
Refer to A.O. Tarakanov, V.A. Skormin, and S.P. Sokolova, Immunocomputing: Principles and Applications, Springer, 2003.
Problems of Traditional Self/Non-self View

No reaction to foreign bacteria in gut (friendly bacteria).
No reaction to food / air / etc.
The human body changes over its life.
Auto-immune diseases.
How do we produce antibodies that react against antigens and yet avoid self?
Is it necessary to attack all non-self or a specific self?
New!
The Danger Theory
In the danger model, the idea is to recognise danger rather than non-self. The screening is accomplished post-production through an external danger signal. Thus the production of autoreactive antibodies (which react to self) is allowed. If an antibody (e.g. an autoreactive one) matches a stimulus in the absence of danger, it is removed. Thus harmless antigens are tolerated, and a changing self is accommodated.
Matzinger (2002). The Danger Model: A renewed sense of self, Science 296: 301-304.
Danger Theory (cont)

Not self/non-self but danger/non-danger
Immune response is initiated in the tissues: the danger zone
This makes it context dependent
Matzinger (2002). The Danger Model: A renewed sense of self, Science 296: 301-304.
Aickelin & Cayzer (2002). The Danger Theory and Its Application to Artificial Immune Systems, Proc. International Conference on AIS (ICARIS 2002).
[Figure: danger zone. A damaged cell emits a danger signal that defines a danger zone; labels: antigens, antibodies, cells, stimulation; cases shown: match inside the zone, "match, but too far away", and "no match"]
Towards a dangerous IDS

The danger theory suggests that the immune system reacts to threats based on the correlation of various (danger) signals, providing a method of grounding the immune response, i.e. linking it directly to the attacker.
Aickelin U, Bentley P, Cayzer S, Kim J and McLeod J (2003): 'Danger Theory: The Link between AIS and IDS?', Proceedings ICARIS-2003, 2nd International Conference on Artificial Immune Systems, LNCS 2787, pp 147-155
Other ways of using danger

Danger = Crime, Antigen = Suspect
or...
Danger = Context?
It could also be useful for data mining, where the danger signal is a proxy measure of interest
The danger zone can be spatial or temporal
Andrew Secker, Alex Freitas, and Jon Timmis (2005) Towards a danger theory inspired artificial immune system for web mining in A Scime, editor, Web Mining: applications and techniques, pages 145-168 (Idea Group)
Some Recent Applications of Danger Theory
Anjum Iqbal, Mohd Aizaini Maarof, Danger Theory and Intelligent Data Processing, International Journal of Information Technology, Vol. 1, No. 1, 2004.
Andrew Secker, Alex A. Freitas, and Jon Timmis, A Danger Theory Inspired Approach to Web Mining, Computing Lab., University of Kent, Canterbury, Kent, UK, 2005.
And so on.
The Future
More formal approach required?
Wide possible application domains.
What makes the immune system unique?
More work with immunologists:
    Danger theory.
    Idiotypic networks.
    Self-assertion.
Reference for further reading

Books
    D. Dasgupta (Ed.), Artificial Immune Systems and Their Applications, Springer Verlag, January 1999.
    L.N. de Castro and J. Timmis, Artificial Immune Systems: A New Computational Intelligence Approach, Springer, 2002.
    A.O. Tarakanov, V.A. Skormin, and S.P. Sokolova, Immunocomputing: Principles and Applications, Springer, 2003.
Related academic papers
    J. Timmis, P. Bentley, and E. Hart (Eds.): Artificial Immune Systems, Proceedings of Second International Conference, ICARIS 2003, Edinburgh, UK, September 2003. LNCS 2787, Springer.
New Events:
Special Session on Artificial Immune Systems at the Congress on Evolutionary Computation (CEC), December 8-12, 2003, Canberra, Australia.
Special Session on Immunity-Based Systems at Seventh International Conference on Knowledge-Based Intelligent Information & Engineering Systems (KES), September 3-5, 2003, University of Oxford, UK.
Second International Conference on Artificial Immune Systems (ICARIS), September 1-3, 2003, Napier University, Edinburgh, UK.
Tutorial on Artificial Immune Systems at 1st Multidisciplinary International Conference on Scheduling: Theory and Applications (MISTA), 12 August 2003, The University of Nottingham, UK.
Tutorial on Immunological Computation at International Joint Conference on Artificial Intelligence (IJCAI), August 10, 2003, Acapulco, Mexico.
Special Track on Artificial Immune Systems at Genetic and Evolutionary Computation Conference (GECCO), Chicago, USA, July 12-16, 2003.
AIS Resources
Artificial Immune Systems and Their Applications by D. Dasgupta (Editor), Springer Verlag, 1999.
Artificial Immune Systems: A New Computational Intelligence Approach by L. de Castro, J. Timmis, Springer Verlag, 2002.
Immunocomputing: Principles and Applications by A. Tarakanov et al., Springer Verlag, 2003.
Third International Conference on Artificial Immune Systems (ICARIS), September 13-16, 2004, University of Catania, Italy.
4th International Conference on Artificial Immune Systems (ICARIS), August 14-17, 2005, Banff, Alberta, Canada.
That's all
Case Study 1:
Malicious Executables Detection based on Artificial Immune Principles*

From Z.H. Guo, Z.K. Liu, and Y. Tan, An NN-based Malicious Executables Detection Algorithm based on Immune Principles, in F. Yin, J. Wang, C. Guo (Eds.): ISNN 2004, Springer, Lecture Notes in Computer Science 3174, pp. 675-680, 2004. (http://dblp.uni-trier.de)
* This work was supported by the Natural Science Foundation of China under Grant No. 60273100.
Outline
Definition of Terms
Goal and Motivation
Previous Research Works
Immune Principle for Malicious Executable Detection
Malicious Executable Detection Algorithm
Experiments and Discussion
Concluding Remarks
Definition of Terms
A Malicious Executable is generally defined as a program that has some malicious function, such as compromising a system's security, damaging a system, or obtaining sensitive information without the permission of users. It includes viruses, Trojan horses, worms, etc.
A Benign Executable is a normal program without any malicious function.
Tens of thousands of new viruses appear each year!
But: current antivirus systems attempt to detect these new malicious programs with heuristics built by hand (costly and ineffective)
[Figure: malicious executables (DOS/Win32 viruses, Trojan horses, worms, email-attached viruses) targeting computers / information systems]
Current task: devise new methods for detecting new ME
Definition of Symbols and Structures

B: binary code alphabet, B = {0,1}.
Seq(s,k,l): short-sequence cutting operation. Suppose s is a binary sequence, s = b(0)b(1)...b(n-1), b(i) ∈ B; then Seq(s,k,l) = b(k)b(k+1)...b(k+l-1).
E(k): executable set, k ∈ {m,b}, where m denotes malicious executables and b benign executables.
E: whole set of executables, i.e., E = E(m) ∪ E(b).
e(fj,n): executable as a binary sequence of length n, where fj is the executable identifier.
ld: detector code length.
lstep: step size of detector generation.
dl: detector, dl = Seq(s,k,l).
Dl: set of detectors with code length l, i.e., Dl = {dl(0), dl(1), ..., dl(nd-1)}, |Dl| = nd.
Goal and Motivation
The goal is an automatic approach to detecting new malicious executables.
The approach uses an artificial immune system (AIS) together with artificial neural networks (ANN) to detect malicious executables with a high Detection Rate (DR) and a low False Positive Rate (FPR) compared with other methods.
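The two figures of merit can be computed from labelled predictions as sketched below. The slides do not spell out the definitions; the sketch assumes the standard ones (DR is the fraction of malicious files flagged, FPR the fraction of benign files flagged), and the function name and sample data are illustrative.

```python
def detection_metrics(labels, predictions):
    """Return (DR, FPR) for binary labels: 1 = malicious, 0 = benign."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    dr = tp / (tp + fn) if (tp + fn) else 0.0    # detection rate
    fpr = fp / (fp + tn) if (fp + tn) else 0.0   # false positive rate
    return dr, fpr

# Hypothetical example: three malicious and two benign files.
dr, fpr = detection_metrics([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
print(dr, fpr)
```

Sweeping the decision threshold of a detector and plotting DR against FPR at each setting gives an ROC curve.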
Previous Related Works

Signature-based Methods
Expert Knowledge-based Methods
Machine Learning Methods
Signature-based Methods
A signature-based method creates a unique tag for each malicious program so that future examples of it can be correctly classified with a small error rate. It relies on signatures of known malicious executables to generate detection models.
Drawbacks:
    Cannot detect unknown and mutated viruses.
    As the number and types of viruses increase, detection slows down dramatically.
    At the same time, analysing virus signatures becomes very difficult, in particular for encrypted signatures.
(refer to IBM Anti-virus Group's report: R.W. Lo, K.N. Levitt, and R.A. Olsson. MCF: a Malicious Code Filter. Computers & Security, 14(6):541-566, 1995.)
Expert Knowledge-based Methods

These methods use the knowledge of a group of virus experts to construct heuristic classifiers for the detection of unknown viruses.
Drawbacks:
    Time-consuming analysis.
    Discovers only some unknown viruses, and the false detection rate is very high.
For detecting unknown viruses with ANN, the IBM Anti-virus Group also proposed a method, which detects boot sector viruses only.
(refer to W. Arnold and G. Tesauro. Automatically Generated Win32 Heuristic Virus Detection. Proceedings of the 2000 International Virus Bulletin Conference, 2000.)
Machine Learning Methods
M.G. Schultz et al. developed a framework that used data mining algorithms, such as the Multi-Naive Bayes method, to train multiple classifiers on a set of malicious and benign executables in order to detect new examples (unknown ME).
(refer to M.G. Schultz, E. Eskin, and E. Zadok. Data Mining Methods for Detection of New Malicious Executables. IEEE Symposium on Security and Privacy, May 2001.)
Biologically-motivated Information Processing Systems

Brain-nervous systems: Neural Networks (NN)
Genetic systems: Genetic Algorithms (GA)
Immune systems: Artificial Immune Systems (AIS), or immunological computation
NN and GA have been extensively studied and widely applied, but AIS has relatively few applications.
Natural prototypes vs. their models

Natural prototype        Biological level            Computing model
Natural language         Left hemisphere of brain    Formal logic; Formal linguistic
Brain nervous net        Cells                       Artificial Neural networks (ANN)
Biological cells         Cells                       Cellular automata (CA)
Molecules of proteins    Molecular                   Artificial immune systems (AIS)
Genetic code             Molecular                   Genetic Algorithms (GA)
Comparison of Three Algorithms

Attribute | GA (Optimisation) | NN (Classification) | AIS
Components | Chromosome strings | Artificial neurons | Attribute strings
Location of components | Dynamic | Pre-defined | Dynamic
Structure | Discrete components | Networked components | Discrete components / networked components
Knowledge storage | Chromosome strings | Connection strengths | Component concentration / network connections
Dynamics | Evolution | Learning | Evolution / learning
Meta-dynamics | Recruitment / elimination of components | Construction / pruning of connections | Recruitment / elimination of components
Interaction between components | Crossover | Network connections | Recognition / network connections
Interaction with environment | Fitness function | External stimuli | Recognition / objective function
Immune Principles for Malicious Executable Detection

Non-self Detection Principle
Anomaly Detection Based on Thickness
The Diversity of Detector Representation vs. Anomaly Detection Hole
Non-self Detection Principle

In the natural immune system, all cells of the body are categorized as either self or non-self, and the immune process is to detect non-self among them. To achieve non-self detection, T lymphocytes undergo two selection stages during maturation, Positive Selection and Negative Selection, since antigenic encounters may result in cell death. Inspired by these two stages, computer scientists have proposed algorithms for detecting anomalous information. Here, we use the Positive Selection Algorithm (PSA) to perform non-self detection for recognizing malicious executables.
Non-self Detection by PSA

[Figure: process of anomaly detection with PSA. A short sequence of length l is compared with the detector set Dl; a match (Y) means self, no match (N) means non-self]
Anomaly Detection Based on Thickness

Anomaly recognition is a process in which immune cells detect antigens and become activated. The activation threshold of the immune cells is determined by the thickness of the immune cells matching the antigens.
The Diversity of Detector Representation vs. Anomaly Detection Hole

The main difficulty of anomaly detection is to shrink the anomaly detection holes as far as possible. The natural immune system resolves this problem well through the diversity of MHC (Major Histocompatibility Complex) representations, which determines the diversity of the antibodies attached to the surface of T cells. This property is very useful for increasing the power to detect mutated antigens and for decreasing the anomaly detection holes. By the same principle, we can use the diversity of detector representations to decrease the anomaly detection holes, as the following schematic drawings illustrate.
Schematic diagram of abnormal detection holes (cont)

[Figure: self space and non-self space; detectors cover part of the space, and the regions they leave uncovered are the abnormal detection holes]
Reduction of abnormal detection holes by use of the diversity of detector representations
[Figure: coverage by detector representation 1 alone, and by a combination of detectors]
Malicious Executable Detection Algorithm (MEDA)

MEDA, based on AIS, has three parts: detector generation, anomaly information extraction, and classification.
Flow Chart of Malicious Executable Detection Algorithm (MEDA)

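[Figure: flow chart of MEDA. A gene (e.g. 01101001) feeds the generation of the detector set; the executable to be detected (e.g. 00111101) goes through anomaly property extraction and then the classifier, which gives the output; the gene can be updated (e.g. 10101101)]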
Generation of Detector Set

Detector generation algorithm:
Begin
    initialize lstep, ld, k = 0
    Do
        cut e(fk,n) from Eg(b)
        i = 0
        While i <= n-ld do
        Begin
            d = Seq(e(fk,n), i, ld)
            if d ∉ Dld then Dld = Dld ∪ {d}
            i = i + lstep
        End
        k = k + 1
    Until Eg(b) is empty
    Return Dld
End
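A possible Python rendering of the detector generation algorithm is sketched below. It is an illustration under assumptions: files are given as `bytes`, bit strings are represented as Python strings of "0" and "1", and the function names are hypothetical. The loop includes the final window that starts at bit n - ld, so that the last ld bits of a file also become a detector.

```python
def to_bits(data):
    """Turn a bytes object into a string of '0'/'1' characters."""
    return "".join(format(byte, "08b") for byte in data)

def seq(s, k, l):
    """Seq(s, k, l): the l symbols of s starting at position k."""
    return s[k:k + l]

def generate_detectors(benign_files, ld=24, lstep=8):
    """Slide a window of ld bits, in steps of lstep bits, over every gene file."""
    detectors = set()            # a set drops duplicate detectors automatically
    for data in benign_files:    # the gene set Eg(b)
        bits = to_bits(data)
        n = len(bits)
        i = 0
        while i <= n - ld:
            detectors.add(seq(bits, i, ld))
            i += lstep
    return detectors

# Hypothetical 5-byte file: 56 32 12 0A 34
d24 = generate_detectors([bytes.fromhex("5632120A34")], ld=24, lstep=8)
print(sorted(format(int(d, 2), "06X") for d in d24))
```

For this five-byte input the windows are the byte triples 56 32 12, 32 12 0A and 12 0A 34, so the detector set has three members.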
Illustration of Detector Generating Process
File hex sequence: 56 32 12 0A 34 ED FF 00 2D ... 00 0A 34 ED FF FA 11 00
Extracted detectors: 56 32 12; 32 12 0A; 12 0A 34; ...; FF FA 11; FA 11 00
Generating process of 24-bit detectors with 8-bit step size (ld=24, lstep=8)

Extraction of Anomaly Characteristics: Non-self Thickness (NST)
Non-self detection
NST, as the anomaly property, is defined as the ratio of the number of non-self units to the total number of units in the file's binary sequence, pl = nn/(nn+ns), where nn and ns are the numbers of non-self and self units. If there are m kinds of detectors, the file has an NST vector P = (pl1, pl2, ..., plm)^T.
NST Extraction Diagram

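[Figure: NST extraction flow chart. Initialization: choose lstep, ld and Dl; run non-self detection over the file to be detected (e.g. 00111101); for each unit, if it is non-self add 1 to nn, otherwise add 1 to ns; when detection is complete, compute pl = nn/(nn+ns); end]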
NST Extraction Algorithm

Begin
    open e(fk,n)
    select lstep, ld
    set ns = 0, nn = 0, i = 0
    While i <= n-ld do
    Begin
        s = Seq(e(fk,n), i, ld)
        if s ∉ Dld then nn = nn + 1 else ns = ns + 1
        i = i + lstep
    End
    pld = nn / (ns + nn)
    Return pld
End
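The NST computation can be sketched in Python as follows. As before this is an illustration under assumptions: files are `bytes`, detectors are bit strings held in a set, the names are hypothetical, and the window loop runs up to and including the start position n - ld. A file too short to hold one window is given an NST of 0.0 by convention here.

```python
def to_bits(data):
    """Turn a bytes object into a string of '0'/'1' characters."""
    return "".join(format(byte, "08b") for byte in data)

def nst(data, detectors, ld, lstep=8):
    """Non-self thickness: share of ld-bit windows not found in the detector set."""
    bits = to_bits(data)
    n = len(bits)
    ns = nn = 0
    i = 0
    while i <= n - ld:
        if bits[i:i + ld] in detectors:
            ns += 1          # window matches a detector: self
        else:
            nn += 1          # no detector matches: non-self
        i += lstep
    total = nn + ns
    return nn / total if total else 0.0

def nst_vector(data, detector_sets, lstep=8):
    """NST vector P, one component per detector length (ascending ld)."""
    return [nst(data, dets, ld, lstep)
            for ld, dets in sorted(detector_sets.items())]

# Hypothetical detector set holding two 24-bit self patterns.
d24 = {to_bits(bytes.fromhex("563212")), to_bits(bytes.fromhex("32120A"))}
print(nst(bytes.fromhex("5632120AFF"), d24, ld=24))  # one of three windows is non-self
```

Because the detectors come from benign files (positive selection), a benign file should yield a low NST and a malicious file a higher one.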
BP Network Classifier
We use the Anomaly Property Vector (APV), i.e., the NST vector P, as the input of a two-layer BP network classifier. The number of input-layer nodes equals the APV's dimension. The sigmoid transfer function is chosen for the hidden layer and a linear function for the output layer.
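The forward pass of such a network is sketched below: a sigmoid hidden layer followed by a linear output unit. The weights shown are placeholders rather than trained values, the 0.5 decision threshold is an assumption, and training by back-propagation is omitted.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(p, w_hidden, b_hidden, w_out, b_out):
    """Two-layer network: sigmoid hidden layer, linear output unit."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, p)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

def classify(p, params, threshold=0.5):
    """Return 1 (ME) if the output reaches the assumed threshold, else 0 (BE)."""
    return 1 if forward(p, *params) >= threshold else 0

# Placeholder parameters for a 2-input, 2-hidden-node network (not trained).
params = ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [1.0, 1.0], 0.0)
print(forward([0.2, 0.7], *params), classify([0.2, 0.7], params))
```

With all hidden weights at zero each hidden node outputs 0.5 regardless of the input, which makes the placeholder network easy to check by hand.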
BP Network Classifier Structure

[Figure: BP network classifier. Inputs: the components pl1, pl2, ..., plm of the Non-Self Thickness (NST) vector P; output: 1 for ME, 0 for BE]
Experiments and Discussion

Experimental Data Set
Generation of Detector Set
Experimental Result Using Single Detector Set
Experimental Result Using Multi-Detector Set
Experimental Data Set

Type     Files    Remarks
BE       915      Win 2K OS and some application programs.
ME       3566     DOS virus, Win32 virus, Trojan, Worm, etc. from Internet.
Total    4481     All verified by antivirus cleaner tools
BE: Benign Executable; ME: Malicious Executable

Generation of Detector Set
Eg(b) is the gene used to generate detectors, ld ∈ {16, 24, 32, 64, 96}, and lstep = 8 bits. Using the detector generation algorithm, we obtain D16, D24, D32, D64, and D96, respectively.
Table 1: Detector generation results
Code length ld (bits): 16, 24, 32, 64, 96
|Dld|: 65536 for ld = 16; the remaining entries read 10,931,627; 8,938,352; 12,768,361; 21,294,857 (column order as extracted)
Storage structure: bitmap index (two sets), tree (three sets)
Detection Result of Malicious Executables by D24

[Figure, D24: (a) NST p24 of each file against file no., benign programs in red and malicious programs in blue; (b) ROC curve, Detection Rate (%) against False Positive Rate (%)]
[Figure, D32: (a) NST p32 of each file, benign and malicious programs; (b) ROC curve]
[Figure, D64: (a) NST p64 of each file, benign programs in red and malicious programs in blue; (b) ROC curve]
Experimental Result Using Single Detector Set

[Figure: ROC curves, Detection Rate (%) against False Positive Rate (%), both from 0 to 100, for the 16-bit, 24-bit, 32-bit, 64-bit and 96-bit detector sets]

When FPR is fixed, relationship curves of DR versus Code Length ld

[Figure: Detection Rate (%) against code length ld (bits), one curve per fixed FPR]
Note: from bottom to top, the FPR is 0%, 0.5%, 1%, 2%, 4%, 8%, and 16%, in sequence.
Experimental Result Using Multi-Detector Set

This experiment uses a multi-detector set to detect benign and malicious executables. We do not use D16 because its DR is zero, and we set D96 as the upper limit because the DR values are almost the same for code lengths of 96 bits and beyond. Here we select the four detector sets D24, D32, D64 and D96 as the anomaly detection data set, use them to extract the Non-self Thickness (NST) vector, and finally use a BP network as the classifier. For classification, we randomly select 30% of the files of E(b) as Eg(b) to train a BP network, and use the remaining data to illustrate the anomaly detection performance.
NST Distribution and ROC Curve of Multi-Detector Set Method

[Figure: (a) NST of files for the mixture of D24, D32 and D64 (axes: 24-bit, 32-bit and 64-bit NST), benign programs in red and malicious programs in blue; (b) ROC curve of the mixed detector set of D24, D32, D64 and D96, Detection Rate (%)]
Comparisons With Bayes Methods and Signature-based Method

[Figure: Detection Rate (%) curves comparing MEDA with BP Network, Naive Bayes with Strings, Multi-Naive Bayes with Bytes, and the Signature Method]

Algorithm Complexities
Algorithm | Operation type 1 (name; amount) | Operation type 2 (name; amount) | Operation type 3 (name; amount) | Store space
MEDA | detectors; ltrain | detector matching; 80 ltest | computing NST; 4 lf additions | 0.4 Gb
Bayes | prob. info.; >> ltrain | searching P(Fi/C); depends on P(Fi/C) | computing joint probs. $P(C)\prod_{i=1}^{n} P(F_i/C)$; lf float multiplications | 1 Gb
Remarks
With short binary sequences and a single detector set, D24 performs best for the detection of malicious executables, giving a DR of 80.6% at an FPR of 3%.
With long detector code lengths and a multi-detector set, our method obtains its best performance, a DR of 97.46% at an FPR of 2%, which exceeds current methods.
This result verifies
    that the diversity of detector representation can decrease anomaly detection holes.
    non-self thickness detection.
Back
96
Case Study 2:
Film Recommender
From Dr. Uwe Aickelin (http://www.aickelin.com), University of Nottingham, U.K.
Prediction:
What rating would I give a specific film?
Recommendation:
Give me a top 10 list of films I might like.
Film Recommender (cont 1)

EachMovie database (70k users).
User Profile: set of tuples {movie, rating}.
Me: My user profile.
Neighbour: User profile of others.
Similarity metric: Correlation score.
Neighbourhood: Group of similar users.
Recommendations: From neighbourhood.

User Profile: set of tuples {movie, rating}.
Me: My user profile (Antigen).
Neighbour: User profile of others (Antibody).
Affinity metric: Correlation score.
Antibody-Antigen Binding: Stimulation.
Antibody-Antibody Binding: Suppression.
Neighbourhood: Group of similar users, i.e. a group of antibodies similar to the antigen and dissimilar to other antibodies.
Recommendations: From neighbourhood, as a weighted score based on similarities.

Start with empty AIS.
Encode target user as an antigen Ag.
WHILE (AIS not full) && (More Users):
    Add next user as antibody Ab.
    IF (AIS at full size) Iterate AIS.
Generate recommendations from AIS.
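The loop above can be sketched in Python as follows. This is only a skeleton under stated assumptions: a profile is a dictionary from movie to rating, `capacity` is the maximum AIS size, and `iterate` is a hypothetical callback standing in for the "Iterate AIS" step, which may drop weak antibodies and so free room for further users.

```python
from typing import Callable, Dict, Iterable, List

Profile = Dict[str, float]  # movie -> rating


def build_ais(antigen: Profile, users: Iterable[Profile], capacity: int,
              iterate: Callable[[Profile, List[Profile]], List[Profile]]
              ) -> List[Profile]:
    """Fill the AIS with antibodies, iterating whenever it becomes full."""
    ais: List[Profile] = []            # start with empty AIS
    pool = iter(users)
    while len(ais) < capacity:         # AIS not full
        candidate = next(pool, None)
        if candidate is None:          # no more users
            break
        ais.append(candidate)          # add next user as antibody
        if len(ais) == capacity:       # AIS at full size
            ais = iterate(antigen, ais)
    return ais


def drop_unrelated(antigen: Profile, ais: List[Profile]) -> List[Profile]:
    """Toy iteration step: keep antibodies that share a movie with the antigen."""
    return [ab for ab in ais if set(ab) & set(antigen)]
```

Recommendations would then be generated from the returned list of antibodies.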

Suppose we have 5 users and 4 movies:
u1 = {(m1,v11), (m2,v12), (m3,v13)}.
u2 = {(m1,v21), (m2,v22), (m3,v23), (m4,v24)}.
u3 = {(m1,v31), (m2,v32), (m4,v34)}.
u4 = {(m1,v41), (m4,v44)}.
u5 = {(m1,v51), (m2,v52), (m3,v53), (m4,v54)}.
We do not have every user's vote for every film. We want to predict the vote of user u4 on movie m3.
Algorithm walkthrough (1)

Start with empty AIS:
DATABASE: u1, u2, u3, u4, u5. AIS: (empty).
User for whom to predict becomes antigen:
DATABASE: u1, u2, u3, u5. AIS: Ag (u4).

Add antibodies until AIS is full:
DATABASE: u2, u3, u5. AIS: Ag (u4), Ab1 (u1).
Then u2 and u3 are added. AIS: Ag, Ab1, Ab2, Ab3.

Table of Correlation between Ab and Ag:
MS14, MS24, MS34.
Table of Correlation between Antibodies:
MS12 = CorrelCoef(Ab1, Ab2)
MS13 = CorrelCoef(Ab1, Ab3)
MS23 = CorrelCoef(Ab2, Ab3)
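A correlation score such as CorrelCoef can be sketched as a Pearson coefficient over the movies two users have both rated. This is an assumption about the metric's details: the slides name a correlation score but do not spell out how missing votes are handled. In particular, returning 0 when fewer than two movies overlap, or when one user's votes are all equal, is a choice made here for illustration.

```python
import math
from typing import Dict


def correl_coef(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Pearson correlation of two profiles over their co-rated movies."""
    common = sorted(set(a) & set(b))
    n = len(common)
    if n < 2:                      # assumption: too little overlap scores 0
        return 0.0
    xs = [a[m] for m in common]
    ys = [b[m] for m in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:   # constant votes: correlation undefined
        return 0.0
    return cov / math.sqrt(var_x * var_y)
```

In the five-user example, u4 shares only m1 with u1, so this sketch would score that pair as 0; u4 shares m1 and m4 with u2, u3 and u5.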

Calculate Concentration of each Ab:
Interaction with Ag (Stimulation).
Interaction with other Ab (Suppression).
[Figure: AIS before and after the concentration update; afterwards Ab2 has the highest concentration.]
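One way to turn stimulation and suppression into a concentration update is sketched below. The slides do not give the update rule, so the form of the equation, the rate constants and the decay term are all assumptions made for illustration: each antibody gains concentration in proportion to its correlation with the antigen and loses it in proportion to its correlation with the other antibodies.

```python
from typing import List, Sequence


def update_concentrations(conc: Sequence[float],
                          ag_corr: Sequence[float],
                          ab_corr: Sequence[Sequence[float]],
                          stim_rate: float = 0.3,
                          supp_rate: float = 0.2,
                          death_rate: float = 0.1) -> List[float]:
    """One update step; all three rates are assumed, illustrative values."""
    n = len(conc)
    updated = []
    for i in range(n):
        # Stimulation: binding between antibody i and the antigen.
        stimulation = stim_rate * ag_corr[i] * conc[i]
        # Suppression: binding between antibody i and every other antibody.
        # Negative correlations are clamped to 0 so they never stimulate.
        suppression = (supp_rate / n) * sum(
            max(0.0, ab_corr[i][j]) * conc[i] * conc[j]
            for j in range(n) if j != i)
        value = conc[i] + stimulation - suppression - death_rate * conc[i]
        updated.append(max(0.0, value))
    return updated
```

With this rule, an antibody that matches the antigen well but differs from the other antibodies grows, which is the diversity effect the suppression step is meant to provide.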

Generate Recommendation based on Antibody Concentration.
The recommendation for user u4 on movie m3 will be based heavily on the vote of user u2 on m3, because Ab2 has the highest concentration.
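A prediction based on antibody concentration can be sketched as a weighted average of the antibodies' votes. The weighting is an assumption: here each antibody's weight is simply a number supplied by the caller, for example its concentration multiplied by its correlation with the antigen. The name `predict_vote` is hypothetical.

```python
from typing import Dict, Optional, Sequence


def predict_vote(movie: str,
                 antibodies: Sequence[Dict[str, float]],
                 weights: Sequence[float]) -> Optional[float]:
    """Weighted average of the votes on `movie`; None if nobody voted."""
    numerator = 0.0
    denominator = 0.0
    for profile, weight in zip(antibodies, weights):
        if movie in profile and weight > 0:
            numerator += weight * profile[movie]
            denominator += weight
    return numerator / denominator if denominator else None
```

An antibody with a large weight, such as Ab2 in the walkthrough, dominates the average, so the predicted vote ends up close to that user's vote.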
Film Recommender Results

Tested against standard method (Pearson k-nearest neighbours).
Prediction:
Results of same quality.
Recommendation:
4 out of 5 films correct (AIS).
3 out of 5 films correct (Pearson).
