Computer Communications: Li Ming Chen, Meng Chang Chen, Wanjiun Liao, Yeali S. Sun

Computer Communications 36 (2013) 1471–1484
A scalable network forensics mechanism for stealthy self-propagating attacks
Li Ming Chen a,⇑, Meng Chang Chen b, Wanjiun Liao a, Yeali S. Sun c
a Department of Electrical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 106, Taiwan
b Institute of Information Science, Academia Sinica, No. 128, Sec. 2, Academia Rd., Nankang, Taipei 115, Taiwan
c Department of Information Management, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 106, Taiwan
Article info

Article history:
Received 23 August 2012
Received in revised form 10 May 2013
Accepted 11 May 2013
Available online 30 May 2013

Keywords:
Network forensics
Data reduction
Stealthy self-propagating attack
Contact activity

Abstract

Network forensics supports capabilities such as attacker identification and attack reconstruction, which complement the traditional intrusion detection and perimeter defense techniques in building a robust security mechanism. Attacker identification pinpoints attack origin to deter future attackers, while attack reconstruction reveals attack causality and network vulnerabilities. In this paper, we discuss the problem and feasibility of back tracking the origin of a self-propagating stealth attack when given a network traffic trace for a sufficiently long period of time. We propose a network forensics mechanism that is scalable in computation time and space while maintaining high accuracy in the identification of the attack origin. We further develop a data reduction method to filter out attack-irrelevant data and only retain evidence relevant to potential attacks for a post-mortem investigation. Using real-world trace driven experiments, we evaluate the performance of the proposed mechanism and show that we can trim down up to 97% of attack-irrelevant network traffic and successfully identify attack origin.
© 2013 Elsevier B.V. All rights reserved.
⇑ Corresponding author. Tel.: +886 2 27883799x1376; fax: +886 2 26518661.
E-mail addresses: d95921021@ntu.edu.tw (L.M. Chen), mcc@iis.sinica.edu.tw (M.C. Chen), wjliao@cc.ee.ntu.edu.tw (W. Liao), sunny@im.ntu.edu.tw (Y.S. Sun).
0140-3664/$ - see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.comcom.2013.05.005

1. Introduction

Advanced Internet attacks tend to adopt indirect attack techniques to conceal attack origins and provide attackers with anonymity from security analysts. A self-propagating malcode [1], for example, usually probes and compromises remote targets on vulnerable services and replicates itself to the remote targets to push the attack forward; thus, after several rounds of propagation, the attack origin is indistinguishable from the victims that are infected by other victims. Although current intrusion detection techniques [2–4] can support certain types of perimeter protection, they cannot provide effective deterrence by identifying the attack origin. However, network forensics supporting capabilities such as attacker identification and attack reconstruction can complement traditional intrusion detection techniques in building a robust security mechanism.

Stealth attacks specially crafted to evade intrusion detection techniques may aggravate the security risks. In contrast with traditional Internet attacks, a stealth attack may use a variable rate to scan victims and/or use polymorphism techniques to elude signature-based intrusion detection systems. This type of attack can remain undetectable indefinitely or can only be detected with a high rate of false positives. In addition, the lifespan of a stealth attack might last for so long that applying forensic investigation on such an extended time scale becomes even more challenging. In this paper, we discuss the problem and feasibility of back tracking the origin of a self-propagating stealth attack when given a network traffic trace for a sufficiently long period of time. The challenges of such long-term network forensics cover three aspects:

• Storage: Enormous amounts of audit data generated over a long period need to be retained for forensic investigation.
• Computation: Forensic investigation is computationally intensive. Correlating and identifying suspicious events from a huge dataset is time consuming.
• Accuracy: Attack traffic usually blends in with other daily Internet traffic. Large amounts of irrelevant data during an investigation imply more noise, and thus lower accuracy.

To address these challenges, we propose a network forensics mechanism that is scalable in computation time and space while maintaining high accuracy in the identification of the attack origin. We develop a data reduction method based on host contact activities to filter out attack-irrelevant data and only retain evidence relevant to potential attacks for post-mortem investigation. This strategy is premised on the ability to profile normal user behavior, and the condition that an infected host must contact previously uncontacted hosts to push the attack forward. After performing
the data reduction, we analyze and correlate the retained traffic to pinpoint the attack origin.

In this study, we collect real-world traffic traces from a campus-wide network at two periods of time in 2006 and 2011. We inject synthetic worm traffic into real-world traffic traces to simulate a slow self-propagating attack and we evaluate the performance of data reduction by classifying traffic in the combined traces. For the forensic investigation, we adopt the random moonwalk (RMW) [5] as the forensic algorithm. This algorithm has been demonstrated to be effective against worm attacks on short-term traces; the basic algorithm, however, cannot deal with long-term network forensics. We also evaluate the accuracy of the RMW and compare the forensic results with and without the use of data reduction. The results show that different traces have different characteristics that result in different outcomes for data reduction. However, on average the proposed data reduction method not only alleviates the overwhelming storage and processing demands, but also improves the forensic accuracy in long-term network forensics.

The contributions of our work can be summarized in the following points:

• We develop a scalable network forensics mechanism that reduces up to 97% of attack-irrelevant network traffic, which leads to higher accuracy and lower overhead in the forensic investigation for self-propagating stealth attacks.
• The proposed data reduction method allows users to specify an expected false positive rate to guarantee the quality of the forensic investigation. The RMW algorithm is then applied to the reduced traffic trace to identify the attack origin.
• The proposed data reduction method is solely based on identifying deviations of host contact behavior, so that attacks using intrusion evasion techniques (e.g., encryption, mutation, and special target acquisition) can still be handled.
• We apply the proposed mechanism to two sets of real-world traffic traces with different network connection behaviors, with the successful performance of data reduction and attack origin identification showing the robustness of the proposed mechanism.

The remainder of this paper is organized as follows. We review the RMW algorithm and related work in Section 2. We present motivation and our approach, as well as the system architecture in Section 3. The concept of data reduction and the development of traffic filters are described in Section 4. Section 5 presents the details of the experiment methodology and Section 6 discusses the evaluation results. We discuss limitations and some practical issues in Section 7 and conclude the paper in Section 8.

2. Background

In this section, we first introduce the RMW algorithm and discuss its advantages and drawbacks in dealing with the problem of long-term network forensics. We then discuss related works for this study.

2.1. Overview of the RMW algorithm

The input of the RMW algorithm is a directed host contact graph that records network communications between end-hosts through time. Each node in the graph represents the state of an end-host at a specific time. Each directed edge represents a connection between two communication peers, pointing from the initiator to the receiver. In the host contact graph, edges are categorized into three classes. An attack edge represents a connection that carries an infectious payload to the receiver. If the receiver has the corresponding vulnerability, it will be compromised and its state will switch to “infected”; otherwise, its state remains “uninfected.” A causal edge is an edge through which the corresponding connection in fact infects its receiver and advances the attack forward. Causal edges form a subset of attack edges. The rest of the edges in the graph that represent legitimate connections are defined as normal edges.

The design concept of the RMW is based on the one invariant across all epidemic style attacks: in the host contact graph, causal edges form a causal tree that depicts the proliferation structure of an epidemic style attack and is rooted at the source of the attack. In order to pinpoint the origin of the epidemic, the goal of the RMW is to identify a set of initial causal edges at the top level of the causal tree. Fig. 1 lists the steps of the RMW algorithm. By repeatedly sampling moonwalk paths backward in time, it is expected that the algorithm will converge over the causal tree. The underlying intuition is that in the tree-like structure of an epidemic style attack, a small number of causal edges at the higher level of the causal tree generate an exponential order of lower level causal edges further down the tree. Therefore, after a sufficient number of moonwalk paths have been performed on the host contact graph, the initial causal edges will emerge as the edges with large counts.

Fig. 1. Random moonwalk algorithm. An edge is denoted as a tuple $e_i = \{u_i, v_i, t_i^s, t_i^e\}$, where $u_i$ is the initiator, $v_i$ is the receiver, and $t_i^s$ and $t_i^e$ are the start and end times of the communication.

2.2. The challenge of RMW in long-term forensics

According to the algorithm, parameters W, d, and Δt determine the performance of the algorithm. This section summarizes their effects for attack origin identification and discusses the potential problems that the RMW may encounter when dealing with a slow propagation attack.

The parameter W represents the number of trials of sampled moonwalk paths on a host contact graph. For analyzing an attack, it is necessary that the value of W is large enough to make the path sampling converge at the higher level edges of the causal tree. Further increasing W will not improve the apparent accuracy, but will instead increase the overall execution time of the RMW. For a slow propagation attack, the number of attack edges may become relatively smaller than the number of normal edges; hence a moonwalk path has a smaller probability of reaching the root of the causal tree. It is difficult to predict a suitable value for W to launch the RMW back tracking.

The parameter d, which restricts the length of a moonwalk path, and Δt, which defines the longest interval of two consecutive edges on a moonwalk path, have a certain correlation with respect to the accuracy of the RMW. In [5], Xie et al. suggest using an adaptive approach to fine-tune these two parameters. By choosing a Δt that gives the longest actual path lengths (a hint for configuring d), the algorithm will obtain the best sampling performance. It is clear that the value of Δt must be sufficiently large to associate the attack edges generated by an infected host to the specific causal edge of that host. A further increase in Δt negatively impacts the RMW, because each moonwalk path tends to be shorter by jumping across a larger portion of the trace for every moonwalk step. For long-term network forensics, however, we must set Δt sufficiently large to cover the behavior of an infected host before and after its infection. This introduces another problem with such configuration: the set of candidate edges of each moonwalk step might be dominated by normal edges, and thus the RMW algorithm will have difficulty converging on the attack origin.

2.3. Related work

To date, while there have been a few approaches that support the detection of slow propagation worms [6–9], none of them
provide the capability of network forensics. Further, Stafford et al. [10] compared different behavior-based worm detectors and found that a stealth worm could evade all the evaluated detectors in all environments.

In [5], Xie et al. first discovered the problem of worm origin identification by inferring correlations among network hosts from their communication patterns. They exploited the difference between worm propagation and the common client–server communication model and showed the effectiveness of the proposed RMW algorithm. However, their work did not discuss the scalability issue, which is addressed in this paper. We propose a pre-filtering step for data reduction and demonstrate the performance improvements. Besides the RMW, there has been some work focusing on tracking worm origins. For example, Kumar et al. [11] reverse engineered the pseudorandom number sequence used by the Witty Worm and reconstructed the infection tree. This approach requires disassembly of the worm’s binary code and can be thwarted by more determined attackers. Based on network telescope log data, Rajab et al. [12] and Hamadeh et al. [13] proposed inferring the initial infection sequence (or the initial infection tree) of a worm attack. These approaches are more suitable for analyzing random scanning worms, and their performances are usually affected by the size of the network telescope. Xiang et al. [14] improved the RMW algorithm to support online attack reconstruction. However, it cannot deal with stealth attacks. Wang et al. [15], applying probabilistic modeling methods and a sequential growth model to analyze the infection tree of a wide class of worms, demonstrated that a general worm infection tree is highly unbalanced. It also gives us a hint that directly applying the RMW algorithm on a raw traffic trace might be insufficient.

Network forensics is an important extension of network security. In [16], Wang et al. summarized two major technical challenges in network forensics. The first is that forensic analysts are overwhelmed by huge volumes of low-quality evidence, and the second is that cyber attacks are becoming increasingly sophisticated. For these reasons, how to identify useful network events and record minimum representative attributes for each event for long-term network forensics is critical. ForNet [17] supports distributed and efficient network logging that can be deployed in a wide area network to aid network forensics. It also focuses on data reduction and utilizes synopsis techniques to capture network events in a succinct way. Wang et al. [16] use an evidence graph model to preserve network events and automatically reason about attack causality based on a hierarchical reasoning framework. Their approach identifies entities of an attack and discerns the complete attack scenario among those entities, while our work especially focuses on addressing the problem of long-term network forensics. Liao et al. [18] propose an effective and automated network forensic system based on fuzzy logic and an expert system. Anaya et al. [19] further use fuzzy logic and an artificial neural network to detect suspicious network flows and to address the challenges of enormous data being logged for network forensic computing. Our approach differs from this work because our data reduction method and forensic investigation are both based on host contact activities, which we think is the feature that is most difficult to conceal in a worm infection. Other network forensic models or frameworks can be found in [20].

Generally, data compression is not suitable for forensic investigation due to (1) the decompression overhead before analysis, and (2) the actual amount of data for analysis not being reduced. Some detection techniques use a Bloom filter [21] to construct a small size bit vector to index previously seen events and support membership queries. This kind of abstracted information cannot be naively adopted for forensic investigation, except in cases where we know what to query in advance. Although sampling techniques are widely used in conjunction with logging systems to reduce the amount of data to be examined for network measurement, [22] points out that these techniques are insufficient for anomaly detection and our experiments also show that random sampling is unacceptable for preserving anomalous events for forensic investigation.

Nonetheless, there is also some related work focusing on reducing the data volume before attack detection or forensic analysis. Staniford et al. [23] use Bayesian networks to infer event likelihood and only keep the anomalous packets for stealthy portscan detection. Bailey et al. [24] focus on scalable monitoring of darknets and reduce the amount of data for forensic honeypots by using source-distribution based methods. Maier et al. [25] present a Time Machine to efficiently retain several days of packet-level network traffic by only storing up to a cut-off limit of bytes per connection. Our approach, in contrast, focuses on retaining and investigating connection-level network traffic. Finally, Giura et al. [26] propose a new column-oriented storage infrastructure for the sheer volume of network flow records. Their approach can speed up the query and reduce the storage size compared to traditional row-oriented relational databases.
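
As a rough illustration of the random moonwalk procedure reviewed in Section 2.1, the following Python sketch samples W backward walks of at most d hops and counts how often each edge is traversed. The function and variable names are hypothetical, and the step rule used here (choose uniformly among edges that arrive at the current initiator and end no more than Δt before the current edge starts) is an assumption based on the description above, not a reproduction of Fig. 1 or of the implementation in [5].

```python
import random
from collections import Counter, defaultdict

def random_moonwalk(edges, num_walks, max_hops, delta_t, seed=0):
    """Sample backward walks over a host contact graph and count edge visits.

    edges     : list of (initiator, receiver, t_start, t_end) tuples
    num_walks : number of sampled paths (W in the text)
    max_hops  : maximum path length (d in the text)
    delta_t   : longest allowed gap between consecutive edges (Delta t)
    """
    rng = random.Random(seed)

    # Index edges by receiver so a backward step is a simple lookup.
    incoming = defaultdict(list)
    for edge in edges:
        incoming[edge[1]].append(edge)

    counts = Counter()
    for _walk in range(num_walks):
        current = rng.choice(edges)          # start from a random edge
        counts[current] += 1
        for _hop in range(max_hops - 1):
            initiator, t_start = current[0], current[2]
            # Assumed step rule: the previous edge must arrive at the current
            # initiator and end within delta_t before the current edge starts.
            candidates = [e for e in incoming[initiator]
                          if e[3] < t_start <= e[3] + delta_t]
            if not candidates:
                break
            current = rng.choice(candidates)
            counts[current] += 1
    return counts
```

On a toy trace in which every host has exactly one incoming edge, every walk ends at the first infection, so that edge collects the largest count. On real traces the candidate sets also contain normal edges, which is the difficulty discussed in Section 2.2.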
3. Our approach

Considering the features of the RMW algorithm, we propose using data reduction as a preprocessing step to assist the following RMW process. In this section, we present the motivation, approach, and architecture of our proposed mechanism.

3.1. Motivation

Given a network traffic trace that contains network-wide connection information for a sufficiently long period of time, we discuss the problem and the feasibility of back tracking the origin of a stealth self-propagating attack. When an attack has occurred, the audit traffic trace contains both the attack traffic, which preserves evidence for detecting and inspecting the root cause of the attack, and the normal traffic, which is attack irrelevant. Instead of developing a smart guidance method for origin tracking, we alternatively propose data reduction measures to assist forensic investigation. The purpose of the concept is to filter out attack-irrelevant data and only apply the RMW algorithm to the rest of the data for subsequent analysis. The motivation for this is twofold, given that (1) the normal traffic which is generated by legitimate users/applications is not called for in network forensics, and (2) the RMW can easily be affected by attack-irrelevant connections, either in terms of speed or accuracy, when dealing with long-term network forensics. We believe the proposed data reduction method can facilitate the RMW in building a more robust network forensics mechanism.

3.2. Approach

We design a data reduction method to divide the input connections into two subsets, one containing normal connections and the other containing suspicious connections. Note that only the suspicious connections are regarded as the input of the RMW algorithm. In this way, the problem of data reduction is defined as follows: before the forensic investigation, how do we effectively and efficiently handle the sheer volume of network traffic to preserve the key attack-related evidential connections for further analysis?

Specifically, in our mechanism, we assign each input connection, denoted by C, an anomaly score A(C) to represent the degree of unexpectedness of the connection. The higher the score, the more likely it is that we have not seen such a communication pattern before, and hence the more likely it is that a suspicious attack is being conducted. Further, we define a threshold θ and only record connections whose anomaly scores are higher than θ (i.e., A(C) > θ) and ignore connections with smaller anomaly scores. After data reduction, we examine the recorded connections by using the RMW algorithm. We anticipate that the RMW can efficiently and effectively handle the recorded connections, and that the results can be provided to security analysts to identify the attack origin and to analyze the causality and propagation of the attack. The overall system architecture will be described in the next subsection.

The calculation of the anomaly score is based on the knowledge of the past host communication patterns. We adopt Bayesian-based learning techniques to establish this knowledge base from the historical traffic trace (described later in Section 4.2). Accordingly, we assess the anomalousness of a connection C based on the degree of confidence that we expect C to have occurred previously. The goal of data reduction is to keep the amount of the recorded traffic as low as possible, while preserving as much of the attack traffic as possible.

3.3. System architecture

Fig. 2 illustrates our scalable network forensics mechanism that consists of the three phases of training, logging (data reduction), and investigation.

Fig. 2. Schematic representation of a scalable network forensics mechanism.

In the training phase, we use a set of historical traffic traces as the training data and from this build a normal behavior profile. Once a normal behavior profile is established, it becomes the main reference for making decisions in the logging phase. When logging, real-time traffic is fed into our system. A normal traffic filter (denoted as Φ) will query the learned normal behavior profile, retrieve the required information from the profile, and compute the anomaly score for each connection to check whether the observed communications are “normal” or not. Connections whose anomaly scores are higher than θ will be regarded as unexpected and stored to the reduced traffic trace. Finally, in the investigation phase, we apply the RMW to the reduced traffic trace to identify possible epidemic attacks and their origins.

In order to support our learning-based data reduction method, we separate the collected real-world traffic trace into the two parts of training and testing. Due to the lack of labeled traffic traces, we treat the real-world traffic as traffic generated by normal, non-infected hosts. Note that Fig. 2 also depicts the option of using a background noise filter (denoted by Ψ) that works as a binary classifier for sanitizing the training data and the reduced traffic trace. This is because we found that some portion of the real-world traffic contained scan-like activities that may affect the decision made by Φ, as well as the operation of the RMW algorithm. We can apply certain basic rules to Ψ to filter out unwanted network activities. We will later study the impact of this background noise on our proposed mechanism with and without the use of Ψ (see Section 5.3).

4. Data reduction

In this section, we first clarify the concept of data reduction and introduce the learning procedure for constructing a normal behavior profile from the historical traffic trace. We then describe how Φ uses this profile to examine the input connections during the logging phase. We also introduce and explain the usage of Ψ to reduce the impact of background noise in our dataset.
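
The training and logging phases of Section 3.3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the scorer is a deliberately simple stand-in that treats the 5-tuple features as independent and uses additive smoothing, rather than the profile actually used in this paper. Only the decision rule, keep a connection when A(C) > θ, is taken directly from the text.

```python
import math
from collections import Counter

FEATURES = ("sa", "sp", "da", "dp", "proto")

def train_profile(history):
    """Training phase: count the feature values seen in historical connections."""
    profile = {f: Counter() for f in FEATURES}
    for conn in history:
        for f in FEATURES:
            profile[f][conn[f]] += 1
    return profile, len(history)

def anomaly_score(conn, profile, n, smoothing=1.0):
    """Stand-in for A(C): negative log of a smoothed per-feature estimate.

    Assumption: features are scored independently, which is a simplification
    made for this sketch only.
    """
    score = 0.0
    for f in FEATURES:
        count = profile[f][conn[f]]          # 0 for values never seen before
        distinct = len(profile[f]) + 1       # reserve mass for unseen values
        prob = (count + smoothing) / (n + smoothing * distinct)
        score -= math.log(prob)
    return score

def reduce_trace(connections, profile, n, theta):
    """Logging phase: keep only connections whose score exceeds theta."""
    return [c for c in connections if anomaly_score(c, profile, n) > theta]
```

With a history of repeated client-to-server contacts, a connection to a previously uncontacted address and port receives a much larger score than a repeated contact, so a threshold placed between the two retains only the former for the investigation phase.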
4.1. The concept of data reduction

The design of the data reduction method is based on two observations. First, the properties of “Community of Interest” within a set of communicating entities have been discussed in the literature and used for anomaly detection [27–29]. In a campus or an enterprise network, hosts tend to contact specific and small numbers of destinations. The regularity of normal communication patterns implies the predictability of host behaviors. Second, we argue that although there are many sophisticated techniques to hide the attack propagation from current intrusion detection techniques, an infected host must contact previously uncontacted hosts in some recurring and systematic way; otherwise, the attack could not progress forward. These two observations highlight the difference between normal traffic and attack traffic regarding the structure of host communications.

We define contact activities as interactions between individual hosts (who talks to whom) via specific communications channels. According to the observations, for each host a deviation from its previous contact activity will be regarded as an indication of a potential attack; these contacts are worth recording for the post-mortem investigation. On the other hand, communications matching regular patterns are regarded as normal and can be ignored in forensic investigation. Similar to anomaly-based intrusion detection, we learn and build a normal behavior profile from historical traffic traces to record the probability distribution of previous contact activities in the network. Based on this profile, we infer the rarity of a newly established connection. In summary, the data reduction method treats connections as the input data and then infers the probability of each connection to decide whether or not a connection should be logged for further investigation.

4.2. Building a normal behavior profile

The normal behavior profile consists of a set of features, their values and the corresponding probabilities of each possible value. The features are extracted from the network traffic trace to represent the contact activities in the trace. In what follows, we demonstrate the procedure of describing and counting a contact activity in a 5-tuple. Note that our method is general enough to accommodate other features in the network traffic trace.

A 5-tuple includes the source and destination addresses (denoted as SA and DA), the source and destination port numbers (denoted as SP and DP), and the protocol (denoted as Proto) of a connection. It shows that a contact activity is initiated by the source end-host and connected to the destination end-host via a specific communications channel. The 5-tuple is the most common feature for describing a connection within the existing literature. The reason for selecting SP as a feature of contact activity is not only that some servers might talk to each other through specific source ports, but also that some applications (e.g., P2P, web, or games) usually exhibit “collaborative behavior,” other than the typical “client–server behavior” [30].

After selecting features, we build a normal behavior profile based on these features. A naive approach is to construct a joint probability table for all combinations of values of different features (i.e., all possible contacts) and their probabilities in a network. Although a joint probability table is simple to construct and easy to use, its drawback is that maintaining such a table with a large number of IP addresses and port numbers is resource intensive. For example, in our case, it requires a table with $2^{65}$ entries to record any possible contact activity inside a class-B network by using features {SA, SP, DA, DP, Proto}, if we only focus on TCP and UDP connections (16 bits for each of the two host addresses and the two ports, plus one bit for the protocol). In contrast, a naïve Bayes (NB) model assumes that features are probabilistically independent, such that a joint probability of a feature set is the product of each feature’s probability. The NB model has the advantage of supporting the probability inference of the class variable (i.e., connection C in our case) with minimal resource consumption. However, the drawback is that features sometimes have certain relations that are not exactly mutually independent, especially for features used for describing a contact activity.

In our data reduction method, we adopt a Tree-augmented naïve Bayes network (TAN) [31] to record host contact activities. The TAN model is an extension of the NB model by breaking the assumptions of probabilistic independence. The TAN model considers that features have some relations, and assigns correlated features an “augmented edge,” while keeping the structure acyclic and simple. Therefore, the TAN model not only improves predictability, but also maintains simplicity during construction with acceptable storage requirements. Below, we describe how to correlate features and construct a TAN structure. Because the usage of the NB model is relatively simple, we will not explain it any further in this paper.

We first employ mutual information [32] to measure the strength of the dependencies between features. In order to measure the pairwise mutual information, we need to compute the probabilities and conditional probabilities of features. Here, we use the m-estimate approach [33] to compensate for missing data when computing probabilities. Once we have the pairwise mutual information, we can build an undirected graph in which the vertices are the features, and the weight of each edge is the mutual information of the vertex-pair on that edge. As suggested in [31], we construct the TAN structure in three stages. First, we generate a maximum weighted spanning tree on the graph, with this tree representing an acyclic subgraph that associates the features with higher dependencies. We next select a root and assign the direction of all edges on the tree to be outward from the root; in this step, selecting different nodes as the root will generate different TAN structures. Lastly, we insert a vertex representing the class variable and add edges from the class variable to each feature in the graph.

Fig. 3 is an illustration of how to construct a TAN structure from the training data collected in 2011. In Fig. 3(a), the relationship among features is depicted, with the number on each edge representing the value of mutual information of two nodes (features) and the solid lines forming a maximum weighted spanning tree in the graph. In Fig. 3(b), the constructed TAN structure of the training data is shown. Based on the TAN structure, we only need to record the required probabilities of the correlated features in the normal behavior profile for inferring the probability of the class variable. It is both efficient and space-saving to build a normal behavior profile by using the TAN model.

Fig. 3. (a) Mutual information of each feature pair learned from the 2011 training data. The solid lines form a maximum weighted spanning tree. (b) The TAN structure learned in this case.

4.3. Normal traffic filter (Φ)

Based on the learned normal behavior profile, we now explain the operation of Φ. According to Bayes’ theorem, the probability of an input connection C can be derived from its 5-tuple as

$$P(C \mid SA, SP, DA, DP, Proto) = \frac{P(C)\, P(SA, SP, DA, DP, Proto \mid C)}{P(SA, SP, DA, DP, Proto)}.$$

In the above equation, we can ignore the probability P(SA, SP, DA, DP, Proto), since it is regarded as a constant and has an equal impact on each connection. Moreover, based on our assumption, P(C) is also a constant because C is assumed to be normal and we only want to infer its degree of unexpectedness against the history. Therefore, we use the value of the probability P(SA, SP, DA, DP, Proto | C) for comparison and for describing how frequent a connection was in the past. Following the example in Fig. 3(b), this probability can be inferred as follows (and all the
decomposed probabilities are already stored in the normal behavior profile).

$$P(SA, SP, DA, DP, Proto \mid C) = P(SP \mid SA, C)\, P(SA \mid DA, C)\, P(DP \mid DA, C)\, P(DA \mid Proto, C)\, P(Proto \mid C).$$

The operation of U is as follows. First, it extracts the values of the required features from each incoming connection C. Second, according to the learned TAN structure, U queries the normal behavior profile for the corresponding probabilities (or conditional probabilities). Third, an anomaly score A(C) is measured as the negative log likelihood of the probability.

$$A(C) = -\log\bigl(P(SA, SP, DA, DP, Proto \mid C)\bigr).$$

The anomaly score represents the degree of unexpectedness of a contact. Contacts with an anomaly score higher than h will be recorded for further investigation. We will describe how to select h in Section 5.4.

4.4. Background noise filter (W)

As mentioned above, the operation of the proposed data reduction method is conceptually similar to the model of anomaly-based intrusion detection. In the collected traffic trace, there is a considerable number of incomplete connection attempts that generate noise and reduce the prediction accuracy of the normal behavior profile. In this subsection, we introduce W to remove the background noise in the traffic trace.

In the 2006 traffic trace, we observe that about 36.2% of the daily connections in the trace were half-open TCP connections without responses, and the standard deviation for each day is about 13.7%. However, this phenomenon is much more moderate in the 2011 traffic trace, where only about 3.2% are seen. This background noise may be caused by administrative scans, machine misconfigurations or malfunctions, or malicious scans generated by out-of-date attacks. Together, these sources generate a certain number of connections that are called “background radiation” [34]. The requests might be blocked by the firewall on the destination side, or the destination may not even exist. We find that the patterns caused by these half-open TCP connections are similar to a port scan or port sweep. For example, in the week-long training data of 2006, more than 60% of the half-open TCP connections target destination port number 139, which is conventionally used by the NetBIOS session service, and these connections mostly come from six distinct source IP addresses.

Based on this observation, we design a rule-based W that can filter out half-open TCP connections in the dataset. We consider using W to sanitize our training data, and further study the reduced traffic trace so as to understand the impact of dataset background noise on data reduction and forensic investigation. In the next section, we thus describe our evaluation strategy for not only evaluating the size of the expected improvement in the forensic investigation from using data reduction, but also for validating the requirement of using W for our mechanism and dataset.

5. Experiment methodology

In this paper, we design a series of experiments to evaluate our scalable network forensics mechanism. In this section, we first introduce the datasets used for the experiments, and then define metrics and describe our evaluation strategy by considering the background noise in the dataset. Finally, we describe the threshold selection strategy for data reduction.

5.1. Dataset

Here we describe the characteristics of the real-world traffic traces and explain the manual worm injection to emulate epidemic style attacks. These two kinds of traffic traces will be combined as the dataset used for the experiments.

5.1.1. Real-world traffic traces

We individually collected two sets of network traffic traces from a core router in a class-B campus network in 2006 and 2011. The format of the collected traces is Cisco NetFlow Version 5 [35], which summarizes a sequence of network packets with the same direction (i.e., the packets share some pre-specified key values such as source and destination IP addresses, port numbers, and protocol) as a flow. To fulfill the need of the experiments, we preprocess the real-world traffic traces in two steps. In the first step, we filter out incoming and outgoing flows from the campus network and only keep intra-campus flows. This is because we focus on building a profile to record the communication behavior within the network. Table 1 shows the characteristics of the collected traffic traces during the two years. The average amount of intra-campus flows per day is about 6.9 million in 2006 and 21.2 million in 2011.

In the second step, in order to identify the initiator and receiver of a connection, we combine flows with opposite directions into one connection, where the source of the connection is the initiator. However, due to the basic flow expiration policies, a long-lived flow may be expired and exported as multiple small flow segments. We carefully merge this kind of flow by referring to the flow merging approach mentioned in [36]. Table 1 shows that the average amount of connections per day decreased to about 3.2 million in 2006 and 12.9 million in 2011. Note that some UDP applications (e.g., DNS) do follow interactive communications, and those UDP flows are processed in the same way as TCP flows to understand the communication patterns.
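The second preprocessing step of Section 5.1.1 can be sketched as follows. This is a minimal illustration, not the authors' tool: the `Flow` and `Connection` records and the `merge_flows` helper are hypothetical names, the rule that the earlier-starting direction belongs to the initiator is an assumption of this sketch, and the segment merging of [36] is only approximated by grouping every flow that shares the same pair of endpoints and protocol.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Flow:
    """One unidirectional NetFlow-style record (fields are illustrative)."""
    start: float
    end: float
    src: str
    sport: int
    dst: str
    dport: int
    proto: str
    packets: int


@dataclass
class Connection:
    start: float
    end: float
    initiator: str
    iport: int
    responder: str
    rport: int
    proto: str
    fwd_packets: int
    rev_packets: int

    @property
    def half_open(self) -> bool:
        # Assumption: a TCP connection with no reverse traffic is half-open.
        return self.proto == "TCP" and self.rev_packets == 0


def merge_flows(flows: List[Flow]) -> List[Connection]:
    # Group both directions (and all expired segments) under one key.
    groups: Dict[Tuple, List[Flow]] = {}
    for f in flows:
        a, b = (f.src, f.sport), (f.dst, f.dport)
        key = (f.proto,) + (a + b if a <= b else b + a)
        groups.setdefault(key, []).append(f)

    conns = []
    for segs in groups.values():
        segs.sort(key=lambda f: f.start)
        first = segs[0]  # assumed to be sent by the initiator
        fwd = [f for f in segs if (f.src, f.sport) == (first.src, first.sport)]
        rev = [f for f in segs if (f.src, f.sport) != (first.src, first.sport)]
        conns.append(Connection(
            start=first.start, end=max(f.end for f in segs),
            initiator=first.src, iport=first.sport,
            responder=first.dst, rport=first.dport, proto=first.proto,
            fwd_packets=sum(f.packets for f in fwd),
            rev_packets=sum(f.packets for f in rev)))
    return conns
```

A production version would also bound the time gap between merged segments, so that two unrelated connections reusing the same ports are not fused into one.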
Table 1
Real-world traces collected in 2006 and 2011.

| | 2006 | 2011 |
|---|---|---|
| Start time | 2006/10/1 | 2011/6/4 |
| End time | 2006/10/15 | 2011/6/18 |
| Average number of intra-campus flows per day (K) | 6,980 | 21,655 |
| Average number of intra-campus connections per day (K) | 3,204 | 12,857 |
| Intra-campus connections TCP:UDP | 2:1 | 1:11 |
| Average fraction of the half-open TCP connections (%) | 36.2 | 3.2 |
| Number of active hosts | 19,148 | 11,845 |

After preprocessing, the datasets contain information about the connection start time and end time, protocol (e.g., TCP, UDP), source/destination IP addresses and port numbers, transmitted packet and byte counts in each direction, and the status of the connection. In Table 1, we can see that the datasets of the two years differ considerably. The number of connections in 2006 is about one quarter of that in 2011. In 2006, TCP connections were more numerous, while more than 30% of them were half-open. In contrast, in 2011, UDP connections generated most of the traffic and the phenomenon of the half-open TCP connections became much more moderate. One of the reasons for the increase of UDP connections in 2011 is that nowadays web browsers (e.g., Google Chrome) support DNS pre-fetching to accelerate web browsing [37]. DNS pre-fetching can resolve domain names proactively before a user clicks on links on a webpage or when entering URLs in a browser. In 2011, 14% of UDP connections were DNS connections. Another reason may be the popularity of P2P applications. The impact of the changing traffic characteristics on the traces can be seen in the data reduction results.

5.1.2. Synthetic worm traffic

In this study, we inject synthetic worm traffic into the real-world traffic trace to emulate the propagation of worms in the monitored network. We design a typical slow propagation scenario that works as follows: an infected host first rests for a period of time (called the incubation period), and then starts attacking other hosts at a specific rate (i.e., one over the predefined scan period). By increasing these two factors, a worm attack becomes stealthier and more difficult to detect. Moreover, the worm will sustain a long-term propagation lifespan. Without loss of generality, a certain number of active hosts that have generated connections in the collected traces are assumed to be susceptible to the attacks (e.g., 10% of active hosts in our case). Selecting susceptible hosts from active hosts makes the worm propagation more reasonable.

When the attack begins, we randomly choose a susceptible host as the worm origin and select a specific port number as the vulnerable service port. The infected hosts (including the origin) will follow the slow propagation scenario to infect other hosts inside the network using random scanning. The attack stops when its lifespan exceeds a predefined attack duration. In this paper, we also consider the case in which worm propagation generates half-open TCP connections. When an attempt tries to compromise an inactive host (i.e., an unused IP address in our network), it results in a half-open TCP connection. On the other hand, we assume an attempt will be responded to when it points to an active host, whether infected or uninfected, in the network.

5.1.3. Test trace generation

The test traces used for the experiments are generated by combining synthetic worm traffic with real-world traffic. Since the formats of these two kinds of traffic are the same, the combination method is trivial. A test trace is created by duplicating the real-world traffic of a specific time period and then injecting synthetic worm traffic according to the start time of each worm connection. In our experiments, we generate six types of worms for each year by varying the scan period and the destination port number. Table 2 shows the characteristics of these generated test traces.

We use X and Y to denote the test traces built upon the real-world traffic traces of 2006 and 2011, respectively. For clarity, we denote the scan period and the destination port number of the embedded worm traffic as the subscript and superscript of X and Y. In this paper, we control the scan periods of worms at 100 s and 1000 s to demonstrate the effect of long-term attacks. For each scan period, we configure the attack duration long enough for these worms to infect more than 99% of the susceptible hosts in the network. Additionally, we allow each trace to have a 12-h additional observation period before and after the attack. In Table 2, we can see that the fraction of the synthetic worm traffic in test traces becomes smaller when the worm scan period increases.

We use the destination ports to emulate worm attacks targeting different vulnerabilities, so as to see whether or not the proposed data reduction method can discern these attacks. The port selection is purposely designed to introduce bias to our data reduction method. First, we produce port-139 worm attacks to demonstrate the impact of background noise, because most of the half-open TCP connections in the 2006 traces target port 139 (Windows NetBIOS session service). Further, the Bulletin Board System
Table 2
Test traces with different injected worm traffic.

| 2006 test trace | X^139_100 | X^23_100 | X^RND_100 | X^139_1000 | X^23_1000 | X^RND_1000 |
|---|---|---|---|---|---|---|
| Attack duration (day(s)) | 1 | 1 | 1 | 7 | 7 | 7 |
| Number of normal conn. (K) | 5,100 | 5,100 | 5,100 | 29,104 | 29,104 | 29,104 |
| Half-open TCP in normal trace (%) | 36.3 | 36.3 | 36.3 | 47.9 | 47.9 | 47.9 |
| Number of worm conn. (K) | 753 | 661 | 895 | 456 | 526 | 481 |
| Half-open TCP in attack trace (%) | 77.0 | 77.0 | 77.0 | 76.8 | 76.9 | 76.9 |
| Worm conn. in the test trace (%) | 12.9 | 11.5 | 14.9 | 1.6 | 1.8 | 1.7 |

| 2011 test trace | Y^139_100 | Y^23_100 | Y^RND_100 | Y^139_1000 | Y^23_1000 | Y^RND_1000 |
|---|---|---|---|---|---|---|
| Attack duration (day(s)) | 1 | 1 | 1 | 7 | 7 | 7 |
| Number of normal conn. (K) | 25,961 | 25,961 | 25,961 | 108,236 | 108,236 | 108,236 |
| Half-open TCP in normal trace (%) | 3.1 | 3.1 | 3.1 | 3.2 | 3.2 | 3.2 |
| Number of worm conn. (K) | 771 | 879 | 787 | 499 | 467 | 476 |
| Half-open TCP in attack trace (%) | 83.5 | 83.5 | 83.5 | 83.6 | 83.4 | 83.6 |
| Worm conn. in the test trace (%) | 2.9 | 3.3 | 2.9 | 0.5 | 0.4 | 0.4 |

Note: The subscript of a test trace is the scan period (sec.) of the injected worm attack and the superscript is the destination port number. For all 2006 test traces, the trace start time is 10/8 0 am and the attack start time is 10/8 12 pm. For all 2011 test traces, the trace start time is 6/11 0 am and the attack start time is 6/11 12 pm.
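The slow propagation scenario of Section 5.1.2, which produces the worm connections counted in Table 2, can be sketched as a small event-driven simulation. This is an illustration under stated assumptions rather than the authors' generator: the function name `simulate_slow_worm`, the `WormConn` record, the integer address space, and all parameter values are hypothetical, and every infected host is assumed to scan exactly once per scan period after its incubation period.

```python
import heapq
import random
from typing import List, NamedTuple, Set


class WormConn(NamedTuple):
    time: float
    src: int
    dst: int
    dport: int
    half_open: bool  # True when the target address is not an active host


def simulate_slow_worm(addresses: List[int], active: Set[int],
                       susceptible: Set[int], dport: int,
                       incubation: float, scan_period: float,
                       duration: float, seed: int = 0) -> List[WormConn]:
    rng = random.Random(seed)
    origin = rng.choice(sorted(susceptible))  # random worm origin
    infected = {origin}
    # Min-heap of (time of next scan, scanning host).
    queue = [(incubation, origin)]
    trace: List[WormConn] = []
    while queue:
        now, src = heapq.heappop(queue)
        if now > duration:  # the attack stops after the attack duration
            break
        dst = rng.choice(addresses)  # random scanning inside the network
        trace.append(WormConn(now, src, dst, dport, dst not in active))
        if dst in susceptible and dst not in infected:
            infected.add(dst)
            # A newly infected host rests for the incubation period first.
            heapq.heappush(queue, (now + incubation, dst))
        heapq.heappush(queue, (now + scan_period, src))
    return trace
```

Increasing `incubation` or `scan_period` in this sketch lowers the number of worm connections per unit time, which mirrors the smaller worm fraction of the 1000-s traces in Table 2.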
Fig. 4. Metrics used for data reduction, including false negative rate (FNR), false positive rate (FPR), precision, and data reduction rate (DRR).

(BBS) is one of the most popular network services in our monitored network and it usually operates on a default Telnet port. We produce port-23 worm attacks to simulate the propagation of a worm embedded in the traffic of a popular service. Lastly, we randomly choose a seldom-used port number (denoted as RND) to create worms as a baseline for comparison.

5.2. Metrics

The metrics used for data reduction are defined in Fig. 4. After data reduction, false negatives (FNs) refer to worm connections that are incorrectly labeled as normal and exempted from forensic investigation. Similarly, true positives (TPs) refer to correctly retained worm connections, false positives (FPs) refer to falsely retained normal connections, and true negatives (TNs) represent correctly filtered normal connections.

For the goal of network forensics, minimizing FNR is critical, because a low FNR means that little attack-relevant evidence is lost. Furthermore, a low FPR generally implies less noise in forensics and more storage saving, and thus better computational scalability. However, due to the base-rate fallacy [38], FNR and FPR may be insufficient as performance measures, especially for the analysis of a dataset covering a long period. Therefore, we further use precision to represent the percentage of the retained worm connections against all the retained connections. Conceptually, the higher the precision, the lower the chances that the RMW algorithm will randomly select a non-worm connection in the reduced traffic trace. Finally, DRR gives us a quick idea of the degree of storage saved. The value of DRR is affected by the capability of the data reduction method, as well as the proportion of the worm traffic in a test trace. For this paper, we use FPR, FNR, and precision together to evaluate the data reduction performance and to predict the performance of the forensic investigation.

In this study, we adopt the same metrics used in [5] to evaluate the performance of network forensics. We measure the number of causal edges within the top 100 frequency edges after the RMW investigation. The more causal edges returned, the higher the chance that we can reconstruct the causal tree based on these causal edges. We also check whether these edges belong to the initial higher level of the causal tree. We will evaluate the performance of RMW by measuring the degree of database access for an RMW investigation.

5.3. Methodology

In this subsection, we illustrate and explain the evaluation methodology of the proposed mechanism by answering the questions below. Here, we not only validate the usability of the normal behavior profile for data reduction and evaluate the improvement of the RMW investigation by using data reduction, but also discuss the impact of background noise in the real-world traffic traces on our mechanism.

Q1: Does the background noise in the training data affect the sensitivity of data reduction?

Sensitivity represents the capability of the data reduction method to correctly identify the synthetic worm connections. The more attack traffic we retain, the more complete the evidence we can preserve for forensic investigation. In the experiments, we treat the FNR as an index of the sensitivity, such that a smaller FNR implies a better sensitivity. However, if the attack behavior violates our assumption and it shares similar contact characteristics with the usual traffic pattern, our proposed data reduction method may result in poor sensitivity.

The normal behavior profile learned from the training data significantly affects the sensitivity of data reduction. In our experiments, we find that half-open TCP connections contribute a significant portion of the daily traffic while having a negative impact on differentiating worm traffic. As shown in Fig. 5(a), we construct normal behavior profiles P from the training data with and without using W. We further apply U to the test data T with reference to each of the two profiles, as shown in Fig. 5(b), and obtain the corresponding reduced traffic traces U(T). After data reduction, we measure the amount of TPs (and FNs) in these reduced traffic traces and analyze the impact of the background noise in the training data on our data reduction method.

Q2: Does the background noise in the test data affect the specificity of data reduction?

Specificity represents the capability of the data reduction method to correctly identify the normal contacts and avoid recording this kind of connection in the reduced traffic trace. To measure specificity, we focus on the FPR of data reduction and analyze the composition of the FPs in the reduced traffic trace. For a given test trace, reducing the FPR not only increases the DRR, but also improves the precision. This indicates that a more efficient and accurate forensic investigation is achievable.

Practically, connections preserved by U should be examined carefully because their contact behavior deviates from the historical pattern. Due to the use of different normal behavior profiles, the
Fig. 5. Illustrations of our evaluation strategies. (a) Constructing normal behavior profiles w/ or w/o using W, (b) data reduction (using U) by using different normal behavior
profiles, (c) applying forensic investigation on the reduced traffic traces w/ or w/o using W.
preserved connections may have significant differences in their contacts. We investigate the FPs in different reduced traffic traces in terms of how much background noise is preserved. We also measure the changes of the values of DRR and precision if we remove this noise from reduced traffic traces and discuss the performance of data reduction. We argue that if preserving background noise in the reduced traffic trace is unnecessary for the forensic investigation, it would be a better choice to use W after data reduction as well. This augmentation will be verified in the next question.

Q3: What is the impact of different data reduction scenarios on the forensic investigation?

After data reduction, we apply the RMW investigation on different reduced traffic traces to analyze the performance of network forensics. Based on the use of U and W, we derive four kinds of reduced traffic traces for the investigation as shown in Fig. 5(c). We adaptively tune the parameters of the RMW and analyze the properties of the output connections. These results will be discussed in Section 6.5.

The above three questions are mainly used to help us understand the requirement of using W for data reduction when dealing with background noise in the real-world traffic traces. However, we can also understand the effect of U and validate our assumption that the communication pattern generated by an epidemic style attack is very different from that of normal communications. In this paper, we also compare the NB-based learning model to the TAN-based learning model when performing data reduction. In the experiments, we further let L denote the learning model (L ∈ {NB, TAN}) and use a subscript to emphasize the learning model used for some symbols, if required. For example, we denote a normal behavior profile that was learned based on the NB model (and by applying W to the training data) by P_NB, or we denote a reduced traffic trace by U_TAN(T) to represent a test trace T that has been filtered by a TAN-based normal traffic filter.

5.4. Threshold selection

The goal of threshold selection is to find a threshold h that can be used for comparing the anomaly score of each connection and practically differentiating the contact activities of normal traffic from those of attack traffic. In this study, we apply self-validation on the collected dataset to decide the value of h for our data reduction method.

The self-validation operates as follows. First, we learn a normal behavior profile from the training data. Then, we use the learned profile to examine the same dataset used for training to compute the anomaly score of each connection. Further, we sort these connections based on their anomaly scores and finally decide the value of h as the score that satisfies a predefined cut-off condition. The cut-off condition, which can be decided by the system operator, indicates how many FPs the operator wants to preserve. Here we define the cut-off condition as 1% of the traffic amount, with the selected h being expected to capture the infrequent contact activities against this cut-off condition.

6. Evaluation results

We discuss the results of our experiments in this section, including the performance of data reduction and the accuracy and efficiency of the RMW investigation.

6.1. Sensitivity tests for data reduction

We now discuss the data reduction results of the 2006 and 2011 test traces where the injected worms have a scan period of 100 s. For sensitivity tests, we first focus on the FNRs of the reduced traffic traces U_L(T) obtained without and with W in the training phase, for different T examined by different L. As shown in Fig. 5, in the former case the profile is learned from the complete training data, while in the latter case it is learned from the training data reduced by applying W.

In Table 3, the FNR of U_NB(X^139_100) is greatly reduced when W is applied in the training phase (0.119 versus 0.852), while the FNRs of U_TAN(X^139_100) are equally low in both cases (below 0.001). This is because the contacts targeting port 139 in the 2006 training data mostly belong to half-open TCP connections, and without using W, the statistics maintained by P_NB can easily misclassify the port-139 worm attack traffic as normal
Table 3
Data reduction results of the test traces in which the injected worms have a scan period of 100 s.

| T | Metric | U_NB(T) | U_NB(T), W in training | U_NB(T), W in training and on reduced trace | U_TAN(T) | U_TAN(T), W in training | U_TAN(T), W in training and on reduced trace |
|---|---|---|---|---|---|---|---|
| X^139_100 | FPR | 0.017 | 0.021 | 0.011 | 0.030 | 0.320 | 0.016 |
| | FNR | 0.852 | 0.119 | 0.846 | < 0.001 | < 0.001 | 0.770 |
| | Precision | 0.561 | 0.862 | 0.671 | 0.830 | 0.316 | 0.678 |
| | DRR | 0.966 | 0.868 | 0.970 | 0.845 | 0.593 | 0.956 |
| X^23_100 | FNR | 0.831 | 0.646 | 0.959 | < 0.001 | 0.001 | 0.770 |
| | Precision | 0.562 | 0.687 | 0.321 | 0.811 | 0.288 | 0.649 |
| | DRR | 0.965 | 0.941 | 0.985 | 0.858 | 0.602 | 0.959 |
| X^RND_100 | FNR | < 0.001 | 0.001 | 0.770 | 0 | 0.013 | 0.773 |
| | Precision | 0.911 | 0.894 | 0.783 | 0.853 | 0.351 | 0.712 |
| | DRR | 0.836 | 0.833 | 0.956 | 0.825 | 0.581 | 0.952 |
| Y^139_100 | FPR | 0.009 | 0.031 | 0.008 | 0.016 | 0.043 | 0.013 |
| | FNR | 0.046 | 0.009 | 0.837 | < 0.001 | 0 | 0.835 |
| | Precision | 0.751 | 0.491 | 0.364 | 0.653 | 0.409 | 0.273 |
| | DRR | 0.963 | 0.942 | 0.987 | 0.956 | 0.929 | 0.983 |
| Y^23_100 | FNR | 0.287 | 0.224 | 0.896 | < 0.001 | < 0.001 | 0.835 |
| | Precision | 0.720 | 0.463 | 0.292 | 0.682 | 0.441 | 0.300 |
| | DRR | 0.968 | 0.945 | 0.988 | 0.952 | 0.926 | 0.982 |
| Y^RND_100 | FNR | 0.001 | < 0.001 | 0.836 | < 0.001 | 0 | 0.835 |
| | Precision | 0.764 | 0.499 | 0.370 | 0.657 | 0.414 | 0.276 |
| | DRR | 0.961 | 0.941 | 0.987 | 0.955 | 0.929 | 0.982 |

Note: The FPRs are the same for the test traces T = X^139_100, T = X^23_100, and T = X^RND_100. For T = Y^139_100, T = Y^23_100, and T = Y^RND_100, the FPRs are the same as well. In the second column of each model, W is applied in the training phase; in the third column, W is further applied to remove the background noise in the resulting reduced trace.
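The contrast between the NB and TAN columns for port-139 worms can be illustrated on toy data. The sketch below is not the authors' implementation: the `Profile` class, the add-one smoothing, the fixed source port, and the hosts `h0`, `n0`, `srv`, and `d0` to `d19` are all assumptions made for illustration. It scores a contact with A(C) as the negative log of the factorized probability, using either no dependencies (NB) or the TAN structure in which SP depends on SA, SA and DP depend on DA, and DA depends on Proto, and it picks h by the self-validation of Section 5.4 with a 1% cut-off.

```python
import math
from collections import Counter

FEATURES = ["SA", "SP", "DA", "DP", "Proto"]
NB_PARENT = {f: None for f in FEATURES}  # naive Bayes: no dependencies
TAN_PARENT = {"Proto": None, "DA": "Proto", "SA": "DA", "DP": "DA", "SP": "SA"}


class Profile:
    """Toy normal behavior profile: smoothed (conditional) frequency tables."""

    def __init__(self, train, parent, alpha=1.0):
        self.parent, self.alpha = parent, alpha
        self.joint, self.context = Counter(), Counter()
        self.values = {f: set() for f in FEATURES}
        for c in train:
            for f in FEATURES:
                pv = c[parent[f]] if parent[f] else None
                self.joint[(f, c[f], pv)] += 1
                self.context[(f, pv)] += 1
                self.values[f].add(c[f])

    def score(self, c):
        """Anomaly score A(C): negative log of the factorized probability."""
        a = 0.0
        for f in FEATURES:
            pv = c[self.parent[f]] if self.parent[f] else None
            k = len(self.values[f]) + 1  # one extra slot for unseen values
            p = (self.joint[(f, c[f], pv)] + self.alpha) / (
                self.context[(f, pv)] + self.alpha * k)
            a -= math.log(p)
        return a


def select_threshold(profile, train, cutoff=0.01):
    """Self-validation: at most `cutoff` of the training data scores above h."""
    scores = sorted(profile.score(c) for c in train)
    keep = len(scores) - int(cutoff * len(scores))
    return scores[keep - 1]


def conn(sa, da, dp):
    return {"SA": sa, "SP": 40000, "DA": da, "DP": dp, "Proto": "TCP"}


# Toy training data: one busy client plus one scanner producing port-139 noise.
train = [conn("h0", "srv", 80) for _ in range(100)]
train += [conn("n0", f"d{i % 20}", 139) for i in range(100)]

worm = conn("h0", "d3", 139)  # the client h0 starts scanning like a worm
for name, parent in (("NB", NB_PARENT), ("TAN", TAN_PARENT)):
    profile = Profile(train, parent)
    h = select_threshold(profile, train)
    print(name, "retains the worm contact:", profile.score(worm) > h)
```

In this toy setting every single feature value of the worm contact is as common as in the learned noise, so the NB score does not exceed h and the contact is filtered out, whereas the TAN profile has never seen source `h0` for destination `d3` and retains it. This mirrors the explanation that TAN is not confused because the background noise comes from only a small number of hosts.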
traffic. However, the TAN model would not be confused in separating background noise and the traffic of a port-139 worm attack, because the background noise is generated only by a small number of hosts in the network. In comparison, since the situation of background noise in the 2011 training data has become more moderate, we can see that in Table 3 the FNRs of U_NB(Y^139_100), with and without W in training, both decrease greatly. Using W in the training data further reduces the FNR of U_NB(Y^139_100) to lower than 1%. For the TAN model, the FNRs of U_TAN(Y^139_100) in both cases are as low as in 2006.

As for the destination port 23, which is used for Telnet services, Table 3 shows that the NB model cannot distinguish between worm attacks and normal traffic on destination port 23 for both the 2006 and 2011 test traces, even if W is applied to the training data. This is because Telnet is one of the most popular network services in use at the monitored network and only a small portion of the connections targeting port 23 in our training data are half-open. However, the TAN model can still overcome the effects of the normal contacts on popular network services and again outperforms the NB model.

Lastly, in Table 3, we can also see that the two different learning models perform equally well for identifying worm connections targeting port RND. In summary, the NB model tends to generate a high FNR in comparison with the TAN model when dealing with a worm attack that shares similar feature properties with connections in the training data.

6.2. Specificity tests for data reduction

This subsection discusses the FPRs of data reduction for different scenarios in Table 3. Since the calculation of FPRs is only related to normal traffic (see Fig. 4) and in our experiments the durations of the test traces are fixed (2 and 8 days long for worm attacks with 100- and 1000-s scan periods), applying the same data reduction procedure to the equal-length test traces, such as X^139_100, X^23_100, and X^RND_100, will derive the same FPRs. In Table 3, we find that in general the TAN model generates a larger FPR than the NB model, especially for cases of U_TAN(T). This is because the contact activities in our traffic traces are not that stable, and the TAN model can easily identify these subtle changes.

In addition to contact changes, the background noise in the test trace also affects the FPR of data reduction. In Table 3, we find that the FPR of U_TAN(T) with W applied in training is peculiarly higher than in the other cases (0.320 for the 2006 test traces). This is due to the abnormally large amount of background noise in 2006 and the stricter data reduction of the TAN model compared to the NB model. This observation is further demonstrated by applying W to this reduced trace. As shown in Table 3, the FPR then decreases to 0.016, which shows that most of the retained connections belong to the background noise. Furthermore, in the other cases, applying W to the reduced traffic traces can further decrease the FPR to about 0.01, which is near our setting for the threshold selection.

However, the drawback of applying W to the reduced traffic traces is that in the meantime we will lose many failed worm connections that are half-open. According to Table 2, we can verify that the greatly increased FNRs in Table 3 after applying W to the reduced traces are mainly caused by those half-open worm connections. We believe the RMW process can still back track the worm origin, even when these half-open worm connections are removed from the reduced traffic trace.

6.3. Long-term attacks

In this subsection we discuss the performance of data reduction against worm attacks with a 1000-s scan period. Table 4 shows the FNRs, FPRs, precisions, and DRRs for different test traces and different data reduction scenarios. Here, we first focus on the FNRs and FPRs.

Compared to Table 3, the FNRs in Table 4 indicate very similar characteristics in each case. This demonstrates that the proposed contact-based data reduction methods (either using the TAN or NB models) are not affected by the propagation rate of a worm attack. Besides, the FPRs in Table 4 also show similar results to those in Table 3, except for the FPRs of U_TAN(T), with or without W in training, when examining the 2006 test trace (0.391 and 0.464). As the investigation period becomes longer, we notice that there is an increased deviation of contact activities of the background noise in the 2006 test traces against that in the 2006 training data, hence resulting in a higher FPR. Moreover, comparing the two learning models, the NB model can tolerate contact deviation to a certain degree, while the TAN
Table 4
Data reduction results of the test traces in which the injected worms have a scan period of 1000 s.

| T | Metric | U_NB(T) | U_NB(T), W in training | U_NB(T), W in training and on reduced trace | U_TAN(T) | U_TAN(T), W in training | U_TAN(T), W in training and on reduced trace |
|---|---|---|---|---|---|---|---|
| X^139_1000 | FPR | 0.019 | 0.046 | 0.016 | 0.391 | 0.464 | 0.022 |
| | FNR | 0.836 | 0.117 | 0.844 | < 0.001 | < 0.001 | 0.769 |
| | Precision | 0.116 | 0.233 | 0.134 | 0.039 | 0.033 | 0.140 |
| | DRR | 0.978 | 0.941 | 0.982 | 0.599 | 0.528 | 0.974 |
| X^23_1000 | FNR | 0.821 | 0.630 | 0.957 | < 0.001 | < 0.001 | 0.770 |
| | Precision | 0.142 | 0.128 | 0.047 | 0.044 | 0.037 | 0.157 |
| | DRR | 0.978 | 0.949 | 0.984 | 0.598 | 0.527 | 0.974 |
| X^RND_1000 | FNR | < 0.001 | 0.001 | 0.770 | < 0.001 | 0.003 | 0.773 |
| | Precision | 0.459 | 0.266 | 0.194 | 0.041 | 0.034 | 0.144 |
| | DRR | 0.965 | 0.939 | 0.981 | 0.599 | 0.528 | 0.974 |
| Y^139_1000 | FPR | 0.011 | 0.033 | 0.010 | 0.021 | 0.048 | 0.015 |
| | FNR | 0.047 | 0.010 | 0.839 | < 0.001 | < 0.001 | 0.837 |
| | Precision | 0.293 | 0.120 | 0.069 | 0.183 | 0.087 | 0.049 |
| | DRR | 0.985 | 0.962 | 0.989 | 0.975 | 0.947 | 0.985 |
| Y^23_1000 | FNR | 0.279 | 0.217 | 0.895 | < 0.001 | < 0.001 | 0.835 |
| | Precision | 0.227 | 0.092 | 0.043 | 0.173 | 0.082 | 0.047 |
| | DRR | 0.986 | 0.963 | 0.990 | 0.975 | 0.948 | 0.985 |
| Y^RND_1000 | FNR | < 0.001 | < 0.001 | 0.837 | < 0.001 | < 0.001 | 0.837 |
| | Precision | 0.293 | 0.116 | 0.067 | 0.176 | 0.083 | 0.047 |
| | DRR | 0.985 | 0.962 | 0.989 | 0.975 | 0.948 | 0.985 |

Note: The FPRs are the same for the test traces T = X^139_1000, T = X^23_1000, and T = X^RND_1000. For T = Y^139_1000, T = Y^23_1000, and T = Y^RND_1000, the FPRs are the same as well. In the second column of each model, W is applied in the training phase; in the third column, W is further applied to remove the background noise in the resulting reduced trace.
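The precision and DRR entries of Tables 3 and 4 can be approximately recomputed from the connection counts of Table 2 and the reported FPR and FNR. The sketch below assumes the usual confusion-matrix definitions, with DRR taken as the fraction of all connections that are filtered out; Fig. 4 is not reproduced here, so that reading of DRR is an assumption, and `reduction_metrics` is a hypothetical helper name.

```python
from typing import Dict


def reduction_metrics(normal: float, worm: float,
                      fpr: float, fnr: float) -> Dict[str, float]:
    """Derive precision and DRR from trace sizes (in K) and error rates."""
    fp = fpr * normal  # normal connections that are wrongly retained
    tn = normal - fp   # normal connections that are correctly filtered
    fn = fnr * worm    # worm connections that are wrongly filtered
    tp = worm - fn     # worm connections that are correctly retained
    return {
        "precision": tp / (tp + fp),
        "drr": (tn + fn) / (normal + worm),
    }


# U_NB(T) without W, port-139 worm, 2006 traces (counts from Table 2).
short_term = reduction_metrics(normal=5100, worm=753, fpr=0.017, fnr=0.852)
long_term = reduction_metrics(normal=29104, worm=456, fpr=0.019, fnr=0.836)
print(short_term)
print(long_term)
```

With nearly the same FPR and FNR, the precision drops from roughly 0.56 to roughly 0.12 once the worm connections make up under 2% of the trace, which is the base-rate effect noted in Section 5.2. Small differences from the tabulated values are expected because the inputs are rounded.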
model only allows contact activities well defined in its normal behavior profile to pass through the filter.

6.4. Precision and DRR

In Tables 3 and 4, we also record the precisions and DRRs for each data reduction scenario. Recall that a higher precision improves the chances of the RMW selecting worm connections. In our cases, however, the following observations can be made from the results. First, the number of FPs in a reduced traffic trace is much greater than the number of TPs, such that the overall precision of data reduction seems poor. Second, although the TAN model usually generates a lower FNR than the NB model, the precision value of TAN-based data reduction is not always better than that of NB-based data reduction due to the effect of the FPs. Finally, a long investigation period (cases in Table 4 versus Table 3) and a large amount of daily traffic (cases for the 2011 test traces) will also reduce the precision in our experiments.

As for DRRs, we can see that for the U_L(T) cases, the NB model slightly outperforms the TAN model. This is because the TAN model tends to have a larger FPR, and the value of DRR is mainly dominated by the amount of normal traffic being filtered. Further, we observe that the background noise indeed affects the performance of data reduction. Comparing the results of the 2006 and 2011 test traces, we find that the DRRs are above 95% for most of the 2011 cases, since the proposed data reduction method can accurately separate the normal traffic from the worm connections.

6.5. Forensic investigation

In this subsection, we first discuss the forensic results for the U_L(T) cases in Table 3. For each reduced traffic trace, we apply the RMW five times and compute the average results, including the number of causal edges in the top 100 frequency edges and the degree of database access. For each RMW investigation, we randomly select 10% of the connections in a reduced traffic trace as the starting edges of the moonwalk paths. We fix d at 30 and vary Dt from 10 to 90 times the scan period of the injected worm (i.e., from 1000 s to 9000 s). In Fig. 6, the diagrams in the upper half depict the results of the RMW for the 2006 test traces, and the diagrams in the lower half are for the 2011 test traces.

In Fig. 6, we can see that the accuracy of the RMW increases when the value of Dt increases. In our experiments, data reduction helps filter out 82% to 98% of the traffic before the investigation, such that we can afford to configure a wide sample window for the RMW to correlate slow-paced worm traffic without worrying about the noise. In general, the RMW tends to have a higher accuracy for the reduced traffic traces generated by the TAN model than by the NB model. Moreover, using W greatly improves the accuracy of the RMW for the 2006 test traces, while only minimally enhancing it for the 2011 test traces. This means that in a clean network environment, using U alone is enough for network forensics to narrow down the scope of the investigation target.

Fig. 6(a) shows that the RMW performs well for both U_TAN(X^139_100) and U_NB(X^139_100) (the solid lines). Removing background noise in the training data and the reduced traffic trace improves the performance of data reduction and the forensic investigation. Fig. 6(b) shows that if a worm attack is embedded in the traffic of a popular network service, we must use the TAN model to carefully discern the worm traffic from the normal traffic. In Fig. 6(b), the accuracy of the RMW differs between the U_NB(X^23_100) cases and is worse for the one with the higher FNR. Fig. 6(c) shows that for worm attacks targeting port RND, applying the RMW to the reduced traffic traces generated by different data reduction scenarios will have comparable accuracy.

Fig. 6(d) depicts the database access counts of the RMW process when Dt is configured as 8000 s, which leads to a reasonably high accuracy. We see that for the U_TAN(T) cases, which usually have higher FPRs, the RMW tends to have heavier database access but does not improve the accuracy. It is also observed that using W can decrease the chance of database access. Relatively speaking, a higher database access count implies better RMW results, because the moonwalk paths tend to be long.

For the 2011 test traces, the results in Fig. 6(e) and Fig. 6(g) are similar given the moderate background noise. However, worm attacks that target popular destination ports (e.g., port-23 worm attacks) are still difficult to trace back using the NB-based data reduction method. We suggest employing the TAN model to perform data reduction for the traffic trace, before using forensic
Fig. 6. (a), (b), (c) and (e), (f), (g) depict the accuracy of the RMW for worm attacks with a scan period of 100 s. The x-axis is the size of $\Delta t$ and the y-axis is the number of causal edges in the top 100 frequency edges output by the RMW. Dashed lines represent cases for $U_L(T)$ and solid lines represent cases for $U_L(T)$. Squares represent $L = \mathrm{TAN}$ and dots represent $L = \mathrm{NB}$. (a) is the result for $T = X^{139}_{100}$, (b) is the result for $T = X^{23}_{100}$, (c) is the result for $T = X^{\mathrm{RND}}_{100}$; and (e) is the result for $T = Y^{139}_{100}$, (f) is the result for $T = Y^{23}_{100}$, (g) is the result for $T = Y^{\mathrm{RND}}_{100}$. (d) and (h) depict the database access counts of the RMW when $\Delta t$ is 8000 s. The x-axis represents the 3 test traces individually and the y-axis is the degree of database access. We use different colors to represent different data reduction scenarios.

Fig. 7. RMW accuracy for worm attacks with a scan period of 100 s. The x-axis is the size of the sample window $\Delta t$ and the y-axis is the number of causal edges in the top 100 frequency edges output by the RMW. Dashed lines represent that $W$ is not used and solid lines represent that $W$ is applied. Rhombuses represent no data reduction before the RMW process and triangles represent the RMW applied to a randomly sampled traffic trace with a sampling rate of 3%. (a) is the result for $T = X^{\mathrm{RND}}_{100}$, (b) is the result for $T = Y^{\mathrm{RND}}_{100}$. These results are compared with the TAN-based data reduction with $W$ applied (filled square and solid line, respectively).

Fig. 8. RMW accuracy for worm attacks with different scan periods for the $U_L(T)$ cases. The x-axis is the ratio of the sample window size to the scan period of a worm attack and the y-axis is the number of causal edges in the top 100 frequency edges output by the RMW. Squares and dots represent the same results depicted in Fig. 6. Pluses and crosses represent the results for worm attacks with a 1000-s scan period when $L = \mathrm{TAN}$ and $L = \mathrm{NB}$, respectively. (a) is the result for $T = X^{139}_{100}$ and $T = X^{139}_{1000}$, (b) is the result for $T = X^{23}_{100}$ and $T = X^{23}_{1000}$, (c) is the result for $T = X^{\mathrm{RND}}_{100}$ and $T = X^{\mathrm{RND}}_{1000}$; and (d) is the result for $T = Y^{139}_{100}$ and $T = Y^{139}_{1000}$, (e) is the result for $T = Y^{23}_{100}$ and $T = Y^{23}_{1000}$, (f) is the result for $T = Y^{\mathrm{RND}}_{100}$ and $T = Y^{\mathrm{RND}}_{1000}$.
investigation to obtain better results. In Fig. 6(h), we notice that although removing the background noise does not improve the accuracy for the 2011 cases, it does reduce the database access counts and the experiment times for the RMW investigation.

In Fig. 7, we compare the proposed scalable network forensic mechanism to forensic investigation on the raw dataset and on a sampled traffic trace. We again use $X^{\mathrm{RND}}_{100}$ and $Y^{\mathrm{RND}}_{100}$ as the test traces and apply the RMW to the unrefined raw dataset and to the randomly sampled dataset with a sampling rate of 3%. Fig. 7 shows these results, as well as the results of $U_{\mathrm{TAN}}(X^{\mathrm{RND}}_{100})$ and $U_{\mathrm{TAN}}(Y^{\mathrm{RND}}_{100})$ from Fig. 6 for the sake of comparison. We notice that random sampling does not help the forensic investigation: the accuracy of the RMW on the sampled trace remains close to zero no matter how long $\Delta t$ is. The reason is that the random sampling method has little probability of sampling all the connections needed for back tracking the attack origin. On the other hand, although the raw dataset preserves all the information for forensic investigation, it suffers from the challenges mentioned in the Introduction. In Fig. 7, it can be seen that using $W$ alone greatly improves the accuracy of the RMW for $X^{\mathrm{RND}}_{100}$; however, the same configuration makes the results even worse for $Y^{\mathrm{RND}}_{100}$. We find that although the 2011 traffic traces contain far fewer half-open TCP connections, the contact activities in the traces become more complicated due to the growing number of P2P applications, which drags down the accuracy. Overall, the proposed scalable network forensic mechanism results in higher accuracy and better performance in identifying the attack origin than both traditional methods.

Finally, we compare the forensic results for worm attacks with different scan periods. We apply the RMW on the reduced traffic traces that contain much slower worms (see Table 4) and plot the forensic results for the $U_L(T)$ cases together with the corresponding cases depicted in Fig. 6 for comparison in Fig. 8. We find that RMW accuracy decreases by about 50% for long-term attacks. This may be caused by the slightly increased FPRs for the $U_L(T)$ cases in Table 4. However, in spite of the very low precision values shown in Table 4, the RMW can still identify a sufficient number of causal edges at the higher levels of the tree structure of the injected worm attacks.
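To make the roles of the sample window $\Delta t$ and the maximum path length $d$ concrete, the following is a minimal Python sketch of backward random walks over a flow trace, in the spirit of the random moonwalk idea. It is an illustration under stated assumptions, not the implementation evaluated above: the function name `random_moonwalks`, the `(time, src, dst)` flow format, and the parameters `num_walks`, `max_hops` (playing the role of $d$), and `delta_t` (playing the role of $\Delta t$) are hypothetical, and starting edges are drawn uniformly at random instead of as a fixed 10% of the connections.

```python
import random
from collections import Counter


def random_moonwalks(flows, num_walks, max_hops, delta_t, seed=0):
    """Count how often each flow is visited by backward random walks.

    flows: list of (time, src, dst) tuples (hypothetical trace format).
    Each walk starts at a random flow and repeatedly steps to a random
    flow that reached the current flow's source host within delta_t
    seconds before the current flow.
    """
    rng = random.Random(seed)

    # Index flows by destination host to find flows that arrived at a host.
    by_dst = {}
    for flow in flows:
        by_dst.setdefault(flow[2], []).append(flow)

    counts = Counter()
    for _ in range(num_walks):
        current = rng.choice(flows)
        counts[current] += 1
        for _ in range(max_hops - 1):
            t, src, _dst = current
            # Candidate predecessors: flows into `src` in [t - delta_t, t).
            candidates = [f for f in by_dst.get(src, [])
                          if t - delta_t <= f[0] < t]
            if not candidates:
                break  # no earlier flow into this host; the walk ends here
            current = rng.choice(candidates)
            counts[current] += 1
    return counts


# Toy trace (assumed data): a single causal chain A -> B -> C -> D.
toy_flows = [(10, "A", "B"), (20, "B", "C"), (30, "C", "D")]
toy_counts = random_moonwalks(toy_flows, num_walks=50, max_hops=30, delta_t=15)
for edge, count in toy_counts.most_common():
    print(edge, count)
```

In this toy trace every walk steps back until it reaches the earliest flow, so that edge is visited by all 50 walks and is never outranked. If `delta_t` is set below the gap between consecutive flows (for example 5), the walks stop immediately and no edge stands out, which is one way to see why the choice of $\Delta t$ relative to the scan period matters.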
6.6. Summary of experiment evaluations

We now summarize the results of data reduction and long-term network forensics.

• The TAN model is usually a better fit for the training data, while accommodating various types of test data by selecting a more conservative threshold.
• Although the NB model is more flexible due to the assumption of feature independence, it may suffer from poor performance when features are mutually correlated.
• In our cases, a well specified background noise filter helps to eliminate the impact of unwanted traffic in the raw dataset.
• Data reduction not only greatly improves the accuracy and the performance of the RMW algorithm, but also improves the scalability of network forensics.

7. Discussion and future work

In the current study, we have focused on finding the origin of a stealth self-propagating attack in the research area of network forensics. Unfortunately, this origin is usually the patient zero who started the epidemic, but not the real attacker. Finding the real attacker requires following the traditional way by carefully looking at the system log of the originator or the log of the network in which the originator resides to see what machine previously contacted the originator. However, network forensics can provide us with the opportunity to do this. Moreover, the other advantage of cooperating with the RMW is that the reconstructed attack propagation structure can reveal attack causality and network vulnerabilities. It can thus help security analysts design better perimeter protection and intrusion detection mechanisms. Computer forensics can also help us distinguish whether and when a computer had been compromised. It would be one of the key elements for back tracking the worm propagation as well. However, the collection of digital evidence from computers requires the cooperation of individual users or administrators. Further, computer forensics may also encounter data reduction issues for long-term investigation.

7.1. Limitations

The premise of our data reduction method is the change of contact behavior of the infected hosts. We built a normal behavior profile from historical contact activities and used this profile to distinguish, in the future, unexpected events from frequently seen events. Therefore, our method can effectively reduce the amount of traffic that will be further examined during post-mortem. However, if the premise is invalid, i.e., a sophisticated attack exhibits exactly the same contact characteristics as usual traffic, the proposed method may fail. Further, we have only demonstrated the use of a 5-tuple as the features to build a normal behavior profile. How to decide more precisely which features are most suitable for investigating contact activity is an area for future work.

7.2. Adaptive forensic investigation

Although we adopted the RMW as the forensic tool in this paper, the RMW still has some problems that warrant closer study and refinement. Considering a worm attack with an unknown scan period, we need to fine tune the RMW's parameters, such as $W$, $d$, and $\Delta t$, to achieve acceptable results. This problem is exacerbated in the case of long-term network forensics. A method to automatically decide values for these parameters is one of the goals of our future research.

Data reduction can ease another problem that may be encountered by the RMW. When performing traceback of each moonwalk path, if there is a normal connection to the attack origin within the length of the sample window, the RMW would simply pick up that connection again and falsely identify that connection as one of the most critical causal edges. By using data reduction, we can eliminate the chance of facing this problem as much as can be reasonably expected.

8. Conclusions

In this paper, we considered the challenges that network forensics will face under stealth attacks. To address this problem, we focused on host contact behavior and proposed a data reduction method to facilitate scalable network forensics. The novelty of the proposed mechanism lies in two key features: First, we adopted the RMW to support an attack-agnostic forensic investigation. Second, the proposed contact-based data reduction method can deal with various intrusion evasion techniques, such as encryption, mutation, and special target acquisition schemes. The real-world trace driven evaluation results demonstrated that the proposed forensic mechanism yields good scalable performance in terms of storage and computation time and maintains a high accuracy rate of causal edge detection with the use of the RMW. Further, we also showed that different network traces affect the sensitivity of the proposed data reduction method. This suggests that when dealing with real traces, the network analyst must first understand the characteristics of the traces.

The proposed mechanism still has a few limitations, including how to deal with background noise in general, how to update the normal behavior profile by using adaptive learning techniques, and how to automatically decide the threshold value for data reduction. In the future, we will evaluate our approach by using more traffic traces collected from different environments, and refine the data reduction method accordingly.
