Abstract—Currently we have seen a very sharp increase in network traffic. Due to this increase, the size of attack log files has also grown greatly, and using conventional techniques to mine the logs and obtain meaningful analyses about a DDoS attacker's location and possible victims has become increasingly difficult. We propose a technique using Hadoop's MapReduce to deduce results efficiently and quickly, where conventional means would otherwise take a long time. The aim of this paper is to describe how we designed a framework to detect the packets in a dataset that belong to a DDoS attack, using the MapReduce model provided by Hadoop. Experimental results using a real dataset show that parallelising DDoS detection can greatly improve efficiency.

Keywords - DDoS, MapReduce

I. INTRODUCTION

A Denial of Service (DoS) attack is launched to make an internet resource unavailable, often by overwhelming the victim with a large number of requests. DoS attacks can be categorized on the basis of single source and multi source: multi-source attacks are called distributed DoS, or DDoS, attacks.

Today computers and networks are attacked very frequently, and these attacks can be very expensive and hard to recover from. This leads to the need for computer forensics, which can help us answer the following: Are we under attack? Who is attacking us? And which incoming traffic is malicious or part of the attack? Evidence of such intrusions is required in case the affected party wants to go to court and legal action is to be taken against the adversary.

These forensic investigations are very hard and in many cases done manually [1]. Information about the network, the web, and different protocols and applications is saved in log files. These log files usually save everything and anything indiscriminately. An intelligent attacker can mask his attacks by mixing them with legitimate requests.

The type of attack that we deal with is the DDoS attack. Distributed DoS attacks are a major security concern these days. Like DoS, DDoS attacks are launched to compromise the availability of a system or a network, but unlike DoS, the attack is launched by an adversary who creates zombies that send a large number of requests to the victim and overwhelm it. This creates a bottleneck, and the victim can no longer entertain requests from legitimate users, denying service to them.

During DDoS attacks, the log files swell up to huge sizes. If analysed properly and effectively, these log files can help detect and recover from a DDoS attack. Processing them through conventional means can take a long time, delaying the results and therefore the recovery phase. As attackers keep devising new ways to commit crimes and intrude, our emphasis is that the forensic community also needs to get smarter: instead of using traditional ways for detecting advanced attacks, it should focus on advanced techniques that are efficient and help detect intrusions in real time.

MapReduce is a model provided by Hadoop that is used for parallel processing of distributed data [1]. A MapReduce deployment often consists of several distributed computing machines working together in the form of a cluster. There is a cluster head, or master, machine and several mapper and reducer machines; tasks are assigned and managed by the master. MapReduce is a two-step process. The map phase is the first processing step, in which the input file in the Hadoop Distributed File System (HDFS) is broken into several splits and each split is assigned to a mapper. These splits are then processed in parallel by the mappers. In the reduce phase, the intermediate results produced by the map phase are summarized, and associated records are processed by a single reducer.

Since in DDoS attacks large data sets have to be analyzed quickly to make decisions about suspicious activities, MapReduce can prove to be a very good solution. We tested our forensics system with the data set provided by Lincoln Labs [1] and showed that DDoS forensics on Hadoop MapReduce can produce results much more efficiently than a serial analyser. We propose distributed forensics for distributed attacks.

Section 2 presents work of others related to ours. Section 3 describes the design and architecture of our work. Section 4 gives the details of the implementation of our work. Section 5 is related to the evaluation of our strategy. Section 6 concludes the report and defines areas of future work on the project.
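The two-phase flow described above can be sketched as a minimal, single-process Java simulation (hypothetical class and key names; the real job runs distributed on a Hadoop cluster, with HDFS splitting the input among many mappers):

```java
import java.util.*;

// Minimal single-process simulation of the map and reduce phases.
// Hypothetical names; not the distributed implementation itself.
public class MapReduceSketch {

    // Map phase: emit an (attack phase, 1) pair for each suspicious packet.
    static List<Map.Entry<String, Integer>> map(String packet) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        if (packet.contains("ICMP") && packet.contains("(ping)")) {
            out.add(new AbstractMap.SimpleEntry<>("phase1", 1));
        }
        return out;
    }

    // Reduce phase: records sharing a key are summarized by one reducer,
    // here by summing the per-packet counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            sums.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return sums;
    }

    public static Map<String, Integer> run(List<String> packets) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String pkt : packets) {
            intermediate.addAll(map(pkt));  // each split is mapped independently
        }
        return reduce(intermediate);
    }
}
```

On a real cluster, Hadoop performs the grouping between the two phases (the shuffle) and runs many map and reduce tasks in parallel.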
Figure 3: Data Flow and Architecture diagram

The dataset provided is in binary format, and it was converted into text format as shown in figure 2. After converting into text format, the file was given to HDFS for the Hadoop cluster as shown in figure 3.

We populate the connections table by analysing the dump packets one by one. If a particular IP address crosses the 'horizontal threshold' or 'vertical threshold' for acceptable ping requests, we label it as suspicious and pass it on as input to the next phase to further observe its behaviour. The horizontal threshold is a limit on the number of IP addresses allowed to ping a single host on our network within a fixed time window. The vertical threshold is a limit on the number of ping requests an external host sends to all of our internal hosts within a time window.

If an external host crosses the 'horizontal port threshold' or the 'vertical port threshold', it can qualify as input to phase 3. The horizontal port threshold is a limit on the number of ports an external host tries to connect to on any of our internal hosts within a time window. The vertical port threshold is a limit on the number of internal hosts to whose ports a single external host may connect within a time window.

In phase 3, if the IP addresses that came down from phase 1 and phase 2 are found to be involved in a lot of incoming traffic to a single internal host, and such packets are malformed, it might be a sign of a buffer overflow attack. Such IP addresses are marked.

In phase 4, if the outbound connections have spoofed IP addresses (i.e. none of the source IP addresses belong to our network) and the destination is the same for all such packets, it might be the case that our network hosts have been used as zombies to launch a DoS attack against the victim. We may deduce that the IP addresses shortlisted in phase 4 are the main culprits behind this attack.

IV. IMPLEMENTATION

A. Implementation strategy:

We used Java as the programming language to implement the traffic analyzing technique. Karmasphere is a graphical environment for creating Hadoop jobs. Rather than dealing with low-level Hadoop management, Karmasphere provides cluster monitoring, job monitoring, and file system management through a high-level user environment. This not only saves time learning Hadoop, but also increases productivity when using Hadoop.

As is the case with most of the programs in Hadoop, we also used string matching for malicious packets. Mappers identified the packets which satisfied the conditions defined in the five phases and wrote them to the file. The reduce phase was used to accumulate the map phase outputs.

We analyzed the datasets provided by Lincoln laboratories, exported them to a text file, and programmed Hadoop to identify the packets in the dataset that belong to each attack phase. Each mapper takes a packet as input and emits a (key, value) pair of (attack phase, packet information). Part of our code for the mapper is:

1) for (int i = 0; i < 3; i++) {
2)     word.set(tokenizer.nextToken()); }
3) for (int j = 0; j < IPs.length; j++)
4) if (pkt.indexOf("ICMP") > 0
5)     && pkt.indexOf("(ping)") > 0)
6) {
7)     pkt = pkt.substring(0, count);
8)     word.set("\n\n**************** Phase 1: IP Sweep from a remote site ****************\n\n No. Time Source Dest Protocol Info\n" + pkt);
9)     keyval.set(1);
10)    context.write(word, keyval);
11) }

This is the code for identifying packets that belong to phase 1. We start our work by suspecting a particular IP address as the attacker and then establish it as the culprit by showing its participation in the attack. In line 4, the mapper searches for all packets that take part in ICMP echo requests. We see that a particular IP address produces a huge number of such requests. This suspicion is then strengthened by proving its participation in each phase of the attack. As we can see, the mapper then creates a pair of the attack phase and a count of one for each occurrence of a suspicious packet.

Each reducer gets the records sorted according to the key, i.e. the attack phase, and sums the counts for each phase. If the sum is greater than the threshold value, the reducer emits the (attack phase, no. of packets) pair. As an optimization, the reducer is also used as a combiner on the map outputs.

Part of our code for the reducer, which checks whether the number of ICMP packets is greater than a particular threshold, is shown below:

1) int sum = 0;
2) while (values.hasNext()) {
3)     sum += values.next().get(); }
4) if (sum >= threshold) {
5)     context.write(key, new IntWritable(sum));
6) }

B. Challenges in Implementation:

The datasets provided by Lincoln laboratories were in the form of Pcap files. We needed the input to be in text form, so we converted the Pcap files to .txt. MapReduce splits the file among mappers, but the problem we faced initially was that the input reader splits the file by size. We didn't want
any packet to be spread across several mappers. To
overcome this we created our own custom reader which
could treat each packet as a single whole object, taking file
splitting into consideration. Our code for the custom reader
is:
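The listing itself does not survive in this excerpt. As an illustrative sketch of the idea only (plain Java with hypothetical names, rather than Hadoop's actual RecordReader API), a split-aware reader that never breaks a packet across mappers could look like:

```java
import java.util.*;

// Illustrative sketch (not the authors' listing): a split-aware reader that
// treats each packet, delimited here by a blank line, as one whole record.
public class PacketReader {
    static final String DELIM = "\n\n";   // assumed packet separator

    // Returns the packets whose first character lies in [start, end).
    // A reader may read PAST 'end' to finish the packet it started, and a
    // reader with start > 0 skips the partial packet at the head of its
    // split, mirroring how Hadoop record readers handle straddling records.
    public static List<String> readSplit(String data, int start, int end) {
        int pos = 0;
        if (start > 0) {
            // Find the first packet boundary at or after 'start'.
            int d = data.indexOf(DELIM, Math.max(0, start - DELIM.length()));
            if (d < 0) return Collections.emptyList();
            pos = d + DELIM.length();
        }
        List<String> packets = new ArrayList<>();
        while (pos < end && pos < data.length()) {
            int d = data.indexOf(DELIM, pos);
            packets.add(d < 0 ? data.substring(pos) : data.substring(pos, d));
            pos = (d < 0) ? data.length() : d + DELIM.length();
        }
        return packets;
    }
}
```

Each reader skips the partial packet at the head of its split and reads past the end of its split to finish the packet it started, so every packet is processed by exactly one mapper.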