
NETWORK INTRUSION DETECTION USING ANOMALY DETECTION

Contents
1. Introduction
2. Identification of Problem
3. Application of Engineering Principles
4. Design Problem and Meeting Design Requirements
5. Design Process
6. Complexity of Design
7. Correlation with Self Study subjects
8. Selection of appropriate tools, skills and techniques in solving the problem
9. Future work for phase 2
10. References

Introduction
Network and system security is of paramount importance in today's data
communication environment. Hackers and intruders can make many successful
attempts to crash networks and web services through unauthorized intrusion.
New threats, and solutions to prevent them, are emerging alongside the
evolution of secure systems. Intrusion Detection Systems (IDSs) are one of these
solutions. The main function of an IDS is to protect resources from threats: it
analyzes and predicts the behaviour of users, and each behaviour is then
classified as either an attack or normal behaviour. We use Rough Set Theory (RST)
and a Support Vector Machine (SVM) to detect network intrusions. First, packets
are captured from the network, and RST is used to pre-process the data and
reduce its dimensionality. The features selected by RST are then used to train
and test the SVM model. The method effectively decreases the space density of
the data. Experiments comparing the results with Principal Component Analysis
(PCA) show that the RST and SVM scheme can reduce the false positive rate and
increase accuracy.
Intrusion Detection Systems (IDSs) are software or hardware systems that automate
the process of monitoring and analyzing the events that occur in a computer
network in order to detect malicious activity. Since the severity of network
attacks has increased drastically, intrusion detection systems have become a
necessary addition to the security infrastructure of most organizations.
Intrusion detection allows organizations to protect their systems from the
threats that come with increasing network connectivity and reliance on
information systems. Given the level and nature of modern network security
threats, the question for security professionals should not be whether to use
intrusion detection, but which intrusion detection features and capabilities
to use.
Intrusions are caused by attackers accessing the systems, by authorized users of
the systems who attempt to gain additional privileges for which they are not
authorized, and by authorized users who misuse the privileges given to them.
Intrusion detection systems take either a network-based or a host-based approach
to recognizing and deflecting attacks. In either case, these products look for
attack signatures (specific patterns) that usually indicate malicious or
suspicious intent. An IDS that looks for these patterns in network traffic is
network based (figure 1). An IDS that looks for attack signatures in log files
is host based.

Various algorithms have been developed to identify different types of network
intrusions; however, there is no heuristic to confirm the accuracy of their
results. How effectively a network intrusion detection system identifies
malicious sources cannot be reported precisely unless a concise performance
measure is available. The three main approaches we are considering are Paxson's
Bro, Leckie et al.'s probabilistic approach, and Jung et al.'s sequential
hypothesis testing for scan detection.
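To illustrate the last of these approaches, the sketch below applies a sequential probability ratio test to the first-contact connection attempts made by one remote host, in the spirit of Jung et al.'s method. It is a minimal sketch, not their implementation: the function name `sprt_scan_detector` and the default parameters (success probability `theta0 = 0.8` for a benign host, `theta1 = 0.2` for a scanner, target detection rate 0.99 and false positive rate 0.01) are assumptions chosen for illustration.

```python
import math


def sprt_scan_detector(outcomes, theta0=0.8, theta1=0.2,
                       p_detect=0.99, p_false=0.01):
    """Sequential probability ratio test over one remote host's connection attempts.

    outcomes: iterable of booleans, True if a first-contact connection attempt
              succeeded and False if it failed, in the order observed.
    theta0:   assumed probability that an attempt succeeds for a benign host (H0).
    theta1:   assumed probability that an attempt succeeds for a scanner (H1).
    Returns (decision, attempts_used), where decision is "scanner", "benign",
    or "pending" if neither threshold was reached.
    """
    # Wald's decision thresholds, expressed as log-likelihood ratios.
    upper = math.log(p_detect / p_false)
    lower = math.log((1 - p_detect) / (1 - p_false))

    log_ratio = 0.0
    attempts = 0
    for attempts, success in enumerate(outcomes, start=1):
        # Add log(Pr[outcome | scanner] / Pr[outcome | benign]).
        if success:
            log_ratio += math.log(theta1 / theta0)
        else:
            log_ratio += math.log((1 - theta1) / (1 - theta0))
        if log_ratio >= upper:
            return "scanner", attempts
        if log_ratio <= lower:
            return "benign", attempts
    return "pending", attempts


# With these parameters, four consecutive failures cross the upper threshold.
print(sprt_scan_detector([False] * 10))  # ('scanner', 4)
```

The appeal of the sequential test is that it decides after as few observations as the evidence allows, instead of waiting for a fixed number of attempts per host.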

Identification of problem
Network intrusion: a set of actions that threaten the security requirements of
the network.
Security requirements: integrity, confidentiality, availability.
Tools and tricks for attacking and intruding into networks are widely available.
A network intrusion is any unauthorized activity on a computer network.
Detecting an intrusion depends on the defenders having a clear
understanding of how attacks work. In most cases, such unwanted activity
absorbs network resources intended for other uses, and it nearly always
threatens the security of the network and/or its data. Properly designing and
deploying a network intrusion detection system will help detect and block
intruders.
Types of attacks:
Asymmetric Routing: In this method, the attacker attempts to use more
than one route to the targeted network device. The idea is to have the overall
attack evade detection by having a significant portion of the offending
packets bypass certain network segments and their network intrusion
sensors. Networks that are not set up for asymmetric routing are impervious
to this attack methodology.
Buffer Overflow Attacks: This approach attempts to overwrite specific
sections of computer memory within a network, replacing normal data in
those memory locations with a set of commands that will later be executed
as part of the attack. In most cases, the goal is to initiate a denial of service
(DoS) situation, or to set up a channel through which the attacker can gain
remote access to the network. Such attacks are more difficult to accomplish
when network designers keep buffer sizes relatively small and/or install
boundary-checking logic that identifies executable code or lengthy URL
strings before they can be written to the buffer.
Common Gateway Interface Scripts: The Common Gateway Interface
(CGI) is routinely used in networks to support interaction between servers
and clients on the Web. But it also provides easy openings, such as
"backtracking", through which attackers can access supposedly secure
network system files. When systems fail to include input verification or
check for backtrack characters, a covert CGI script can easily add the
directory label ".." or the pipe "|" character to any file path name and thereby
access files that should not be available via the Web.
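The input verification mentioned above can be sketched as a path check run before a CGI script opens a file. This is a minimal Python sketch, assuming the script receives a relative path from the client and should only serve files under a single web root; the function name `resolve_cgi_path` is hypothetical. Besides rejecting `..` components and the pipe character, it resolves the final path and confirms that it still lies under the root, which also catches escapes through symbolic links.

```python
import os


def resolve_cgi_path(web_root, requested):
    """Map a client-supplied relative path to a file under web_root, or raise ValueError."""
    # Reject the pipe character: some CGI environments (e.g. Perl's two-argument
    # open) treat a name containing "|" as a command to run.
    if "|" in requested:
        raise ValueError("pipe character not allowed")
    # Reject any ".." component, splitting on both separators, to block backtracking.
    parts = requested.replace("\\", "/").split("/")
    if ".." in parts:
        raise ValueError("parent-directory component not allowed")
    # Resolve symbolic links and confirm the final path is still inside the root.
    root = os.path.realpath(web_root)
    candidate = os.path.realpath(os.path.join(root, requested.lstrip("/")))
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError("path escapes the web root")
    return candidate
```

For example, `"docs/index.html"` resolves to a path inside the root, while `"../etc/passwd"` and `"report.txt|ls"` raise `ValueError`. Checking the resolved path, not only the raw string, is the more robust of the two defences.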
Protocol-Specific Attacks: When performing network activities, devices
obey specific rules and procedures. These protocols (such as ARP, IP, TCP,
UDP, ICMP, and various application protocols) may inadvertently leave
openings for network intrusions via protocol impersonation ("spoofing") or
malformed protocol messages. For example, the Address Resolution Protocol
(ARP) does not authenticate messages, allowing attackers to
execute "man-in-the-middle" attacks. Protocol-specific attacks can easily
compromise or even crash targeted devices on a network.
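Because ARP replies are not authenticated, one common passive countermeasure is to remember which MAC address each IP address has announced and to raise an alert when that binding changes. The sketch below assumes that (IP, MAC) pairs have already been extracted from captured ARP replies by a sniffer; the function name `find_arp_conflicts` is hypothetical, and treating the first-seen binding as trusted is a simplifying assumption (legitimate hardware replacements or DHCP reassignments would also trigger alerts).

```python
def find_arp_conflicts(observations):
    """Flag IP addresses whose announced MAC address changes.

    observations: iterable of (ip, mac) pairs taken from captured ARP replies,
                  in arrival order.
    Returns a list of (ip, trusted_mac, new_mac) alerts.
    """
    bindings = {}  # ip -> first MAC address seen for it (treated as trusted)
    alerts = []
    for ip, mac in observations:
        mac = mac.lower()  # normalise so "AA:BB:..." and "aa:bb:..." compare equal
        trusted = bindings.setdefault(ip, mac)
        if trusted != mac:
            alerts.append((ip, trusted, mac))
    return alerts
```

A spoofing host that keeps re-announcing a gateway's IP address produces one alert per forged reply, which is noisy but makes a man-in-the-middle attempt visible.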
Traffic Flooding: An ingenious method of network intrusion simply targets
network intrusion detection systems by creating traffic loads too heavy for
the system to adequately screen. In the resulting congested and chaotic
network environment, attackers can sometimes execute an undetected attack
and even trigger an undetected "fail-open" condition.
Trojans: These programs present themselves as benign and do not replicate
like a virus or a worm. Instead, they instigate DoS attacks, erase stored data,
or open channels to permit system control by outside attackers. Trojans can
be introduced into a network from unsuspected online archives and file
repositories, most particularly including peer-to-peer file exchanges.
Worms: A common form of standalone computer virus, worms are any
computer code intended to replicate itself without altering authorized
program files. Worms often spread through email attachments or the Internet
Relay Chat (IRC) protocol. Undetected worms eventually consume so many
network resources, such as processor cycles or bandwidth, that authorized
activity is simply squeezed out. Some worms actively seek out confidential
information, such as files containing the word "finance" or "SSN", and
communicate such data to attackers lying in wait outside the network. Once
these attack vectors are thoroughly understood, network security teams can
look for opportunities to deploy technologies and strategies that will mitigate
the potential effectiveness of each one.
Application of Engineering Principles
o Security
o Efficiency
o Scalability
Design Problem and Meeting Design Requirements
Far less data is available about anomalies (intrusions) than about normal
traffic, which makes applying classification very difficult. Here, detecting
that an attack is occurring is what matters, not identifying the type of attack.
Understanding what is normal is therefore the key to detecting what is
abnormal.

In the pre-processing stage, we use a packet sniffer built with the Jpcap
library to store network packet information, including the IP, TCP, UDP, and
ICMP headers, from each packet captured in promiscuous mode. The packet
information is then grouped by the connection between each pair of IP
addresses (source IP and destination IP), and all records are collected every
2 seconds.

An intrusion detection system (IDS) is a system that dynamically
monitors system and user actions in the network in order to detect
intrusions. Because an information system can suffer from various kinds of
security vulnerabilities, it is both technically difficult and economically
costly to build and maintain a system that is not susceptible to attack.
Experience has taught us never to rely on a single defensive line or
technique. IDSs have been widely regarded as part of the solution for
protecting today's network systems. Research on IDSs began with a report by
Anderson, followed by Denning's seminal paper, which laid the foundation
for most current intrusion detection prototypes. Since then, many
research efforts have been devoted to wired network IDSs, and numerous
detection techniques and architectures for host machines and wired networks
have been proposed. With the proliferation of wireless networks and mobile
computing applications, new vulnerabilities that do not exist in wired
networks have appeared, and security poses a serious challenge to deploying
wireless networks in practice. Moreover, the vast differences between wired
and wireless networks make traditional intrusion detection techniques
inapplicable. Wireless IDSs, emerging as a new research topic, aim to
develop new architectures and mechanisms to protect wireless
networks.
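The pre-processing step described above (grouping captured packet information by source/destination IP pair and collecting records every 2 seconds) can be sketched as follows. The project itself uses a Jpcap-based sniffer; this Python sketch assumes packets have already been captured and decoded into records, and the record keys (`time`, `src`, `dst`, `length`, `protocol`) and the function name `aggregate_windows` are hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 2.0  # collection interval stated in the text


def aggregate_windows(packets, window=WINDOW_SECONDS):
    """Group decoded packet records by time window and (source IP, destination IP) pair.

    packets: iterable of dicts with hypothetical keys "time" (seconds),
             "src", "dst", "length" (bytes) and "protocol".
    Returns {(window_index, src, dst): {"packets", "bytes", "protocols"}}.
    """
    summary = defaultdict(lambda: {"packets": 0, "bytes": 0, "protocols": set()})
    for pkt in packets:
        # Window index: which 2-second interval the packet timestamp falls into.
        key = (int(pkt["time"] // window), pkt["src"], pkt["dst"])
        record = summary[key]
        record["packets"] += 1
        record["bytes"] += pkt["length"]
        record["protocols"].add(pkt["protocol"])
    return dict(summary)
```

Here the IP pair is directional, so the two directions of a conversation are summarised separately; merging them would require putting the two addresses into a canonical order before building the key.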
Design Process

Data is collected from a network which is free from any intrusion.

Data is then divided into two: training data and testing data.

Training data is used to train the model.

Testing data is used to validate the trained model.
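The division into training and testing data can be sketched with NumPy as a shuffled split. This is a minimal sketch: the function name `split_train_test`, the 20% test fraction, and the fixed random seed are assumptions for illustration, not values taken from the project.

```python
import numpy as np


def split_train_test(data, test_fraction=0.2, seed=0):
    """Shuffle the rows of data and split them into (train, test) arrays."""
    data = np.asarray(data)
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    rng = np.random.default_rng(seed)  # fixed seed makes the split reproducible
    order = rng.permutation(len(data))
    n_test = int(round(len(data) * test_fraction))
    return data[order[n_test:]], data[order[:n_test]]
```

Shuffling before splitting avoids a test set made only of the last-captured traffic, which could differ systematically from the rest.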


Certain attributes in an information system may be redundant and can be
eliminated without losing essential classificatory information. Feature
(attribute) reduction can be viewed as the process of finding a set of
attributes, smaller than the original one, with the same or nearly the same
classificatory power as the original set. Rough sets provide a method to
determine, for a given information system, the most important attributes from
the point of view of classificatory power. The concept of the reduct is
fundamental to rough set theory. A reduct is the essential part of an
information system (a subset of attributes) that can discern all objects
discernible by the original set of attributes. Another important notion is the
core, the part common to all reducts. The core and reducts are important
concepts of rough set theory that can be used for feature selection and data
reduction.
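For a small information system, the reducts and the core defined above can be found by brute force: compare the indiscernibility partition induced by each attribute subset with the one induced by the full attribute set, keep the minimal subsets that match, and intersect them to obtain the core. The sketch below is illustrative only; the function names are hypothetical, and because it enumerates every attribute subset it is exponential in the number of attributes, so practical rough set tools rely on heuristic or discernibility-matrix methods instead.

```python
from itertools import combinations


def partition(table, attrs):
    """Indiscernibility classes: objects (row indices) with equal values on attrs."""
    classes = {}
    for i, row in enumerate(table):
        classes.setdefault(tuple(row[a] for a in attrs), set()).add(i)
    return {frozenset(c) for c in classes.values()}


def reducts(table, attributes):
    """All minimal attribute subsets inducing the same partition as the full set."""
    full = partition(table, attributes)
    found = []
    for size in range(len(attributes) + 1):  # smallest subsets first
        for subset in combinations(attributes, size):
            # A superset of an already-found reduct cannot be minimal.
            if any(set(r) <= set(subset) for r in found):
                continue
            if partition(table, subset) == full:
                found.append(subset)
    return found


def core(reduct_list):
    """The core is the intersection of all reducts."""
    return set.intersection(*(set(r) for r in reduct_list)) if reduct_list else set()


# Toy information system: attributes 0 and 2 always agree.
table = [(1, 0, 1), (1, 1, 1), (0, 0, 0), (0, 1, 0)]
found = reducts(table, (0, 1, 2))
print(found, core(found))  # [(0, 1), (1, 2)] {1}
```

In the toy table either attribute 0 or attribute 2 can be dropped, but attribute 1 appears in every reduct, so it alone forms the core.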
Complexity of Design
In the training phase, feature selection is the most important step.
Features should be chosen with attention to both their number and their quality.
Examples: number of failed logins, duration, protocol type,
number of wrong fragments received.
Using too many features results in high dimensionality, a problem known as the
curse of dimensionality.
If appropriate features are not selected, the results will be erroneous.
Correlation with self study subjects
Software Engineering
o Software process model: Waterfall model
o Functional requirements:
Collect suspicious traffic and information that describes or characterizes the traffic
Detect intrusions specific to a designated area of protection
Automatically record events and incidents
Monitor and scan networks
Detect based on content
Detect intrusions for multiple operating systems
Detect intrusions for multiple platforms (hosts, switches, routers, etc.)
o Non-functional requirements:
Anomalous events or breaches in security should be detected in real time and reported immediately.
The IDS must not place an undue burden on, or interfere with, the normal
operations for which the systems were bought and deployed in the first
place.
The IDS must be scalable. As new computing devices are added to the
network, the IDS must be able to handle the additional computational
and communication load.
SSCD
o Python: an interpreted language.
o Interpreted language: programs are executed 'indirectly' (interpreted) by an
interpreter program, unlike compiled languages, in which the code is first
converted to machine code and then executed.
Selection of appropriate tools, skills and techniques in solving the problem
A multivariate Gaussian fitting algorithm is used.
It belongs to the family of mixture models: a mixture model is a probabilistic
model for representing the presence of subpopulations within an overall
population, without requiring that an observed data set identify the
sub-population to which an individual observation belongs. Formally, a mixture
model corresponds to the mixture distribution that represents the probability
distribution of observations in the overall population. However, while problems
associated with "mixture distributions" relate to deriving the properties of the
overall population from those of the sub-populations, "mixture models" are used
to make statistical inferences about the properties of the sub-populations
given only observations on the pooled population, without sub-population
identity information.
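In symbols, a finite mixture of K multivariate Gaussian components over n-dimensional observations x, with mixing weights π_k, means μ_k and covariance matrices Σ_k, has the density below. The notation is standard but introduced here rather than taken from the project; with K = 1 it reduces to a single multivariate Gaussian.

```latex
p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1

\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})
  = \frac{1}{(2\pi)^{n/2} \, |\boldsymbol{\Sigma}|^{1/2}}
    \exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\top}
    \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)
```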

Some ways of implementing mixture models involve steps that attribute
postulated sub-population identities to individual observations (or weights
towards such sub-populations), in which case these can be regarded as types
of unsupervised learning or clustering procedures. However, not all
inference procedures involve such steps.
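For anomaly detection, the simplest case is a single multivariate Gaussian fitted to normal traffic only: estimate the mean vector and covariance matrix from the training data, then flag any connection whose density under the fitted model falls below a threshold ε. The sketch below works with log-densities for numerical stability. It is a sketch under stated assumptions rather than the project's implementation: the function names are hypothetical, it uses the unbiased sample covariance from `np.cov`, the random data stands in for real feature vectors, and the threshold would in practice need tuning, for example on a labelled validation set.

```python
import numpy as np


def fit_gaussian(X):
    """Estimate the mean vector and covariance matrix of normal training data X (m x n)."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = np.atleast_2d(np.cov(X, rowvar=False))  # unbiased sample covariance
    return mu, sigma


def log_density(X, mu, sigma):
    """Log of the multivariate Gaussian density N(x | mu, sigma) for each row of X."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    n = mu.shape[0]
    diff = X - mu
    sign, logdet = np.linalg.slogdet(sigma)
    if sign <= 0:
        raise ValueError("covariance matrix is singular or has a non-positive determinant")
    # Mahalanobis terms (x - mu)^T sigma^-1 (x - mu), via a linear solve instead of an inverse.
    mahalanobis = np.einsum("ij,ji->i", diff, np.linalg.solve(sigma, diff.T))
    return -0.5 * (n * np.log(2 * np.pi) + logdet + mahalanobis)


def is_anomaly(X, mu, sigma, log_epsilon):
    """True for rows whose log-density falls below the threshold log(epsilon)."""
    return log_density(X, mu, sigma) < log_epsilon


rng = np.random.default_rng(0)
normal_traffic = rng.normal(size=(500, 2))  # stand-in for normal feature vectors
mu, sigma = fit_gaussian(normal_traffic)
# The point near the mean is accepted; the distant point is flagged.
print(is_anomaly([[0.0, 0.0], [10.0, 10.0]], mu, sigma, log_epsilon=-10.0))
```

Using the full covariance matrix, rather than independent per-feature Gaussians, lets the model catch connections whose features are individually plausible but unusual in combination.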
NumPy is used for scientific computing.
NumPy is an extension to the Python programming language, adding
support for large, multi-dimensional arrays and matrices, along with a large
library of high-level mathematical functions to operate on these arrays. The
ancestor of NumPy, Numeric, was originally created by Jim Hugunin with
contributions from several other developers. In 2005, Travis Oliphant
created NumPy by incorporating features of the competing Numarray into
Numeric, with extensive modifications. NumPy is open source and has many
contributors.
The KDD Cup 1999 dataset is used.
This is the data set used for the Third International Knowledge Discovery
and Data Mining Tools Competition, which was held in conjunction with
KDD-99, the Fifth International Conference on Knowledge Discovery and
Data Mining. The competition task was to build a network intrusion detector,
a predictive model capable of distinguishing between "bad" connections,
called intrusions or attacks, and "good" normal connections. This database
contains a standard set of data to be audited, which includes a wide variety
of intrusions simulated in a military network environment.
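If the model is trained only on normal traffic, the KDD records first have to be filtered by label and reduced to numeric features. The sketch below assumes the usual KDD Cup 1999 text layout: 41 comma-separated features per record, with protocol type, service, and flag as the symbolic columns 1 to 3 (zero-based), followed by a label such as `normal.` with a trailing period. The function names are hypothetical, and simply dropping the symbolic columns (rather than encoding them) is a simplifying assumption that leaves 38 numeric values per record.

```python
N_FEATURES = 41               # features per KDD Cup 1999 record (assumed layout)
SYMBOLIC_COLUMNS = {1, 2, 3}  # protocol_type, service, flag


def parse_kdd_record(line):
    """Split one comma-separated record into (numeric feature list, label)."""
    fields = line.strip().split(",")
    if len(fields) != N_FEATURES + 1:
        raise ValueError(f"expected {N_FEATURES + 1} fields, got {len(fields)}")
    label = fields[-1].rstrip(".")  # labels end with a period, e.g. "normal."
    numeric = [float(v) for i, v in enumerate(fields[:-1]) if i not in SYMBOLIC_COLUMNS]
    return numeric, label


def load_normal_records(lines):
    """Keep only records labelled normal, returning their numeric features."""
    rows = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        features, label = parse_kdd_record(line)
        if label == "normal":
            rows.append(features)
    return rows
```

Because `load_normal_records` accepts any iterable of lines, an open file handle for a local copy of the data can be passed to it directly.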
Future work for phase 2

Analysing the dataset.

Feature selection.

Implementing the multivariate Gaussian fitting algorithm to train the model

Testing the model.

References
I. K. Scarfone and P. Mell, Guide to Intrusion Detection and Prevention
Systems (IDPS), NIST Special Publication 800-94, 2014.
II. Conry-Murray, Anomaly Detection On the Rise, June 2013.
III. Ahmed Youssef and Ahmed Emam, Network Intrusion Detection Using Data
Mining and Network Behaviour Analysis, 2014.
