Dynamic File Analysis Using Ensemble of RNN and SVM

Tanu Tomar, Nikhil Anil Kumar, Aarshi Dwivedi, Darshan Sheth
September 23, 2019

A First Project Evaluation Report on:
Dynamic File Analysis Using Ensemble Of RNN And SVM
Towards partial fulfilment for Undergraduate Degree Level
Programme Bachelor of Technology in Computer Engineering

Prepared by:
Admission No.  Student Name
U16CO031       Tanu Tomar
U16CO029       Darshan Sheth
U16CO033       Aarshi Dwivedi
U16CO016       Nikhil Anil Kumar

Class : B.TECH. IV (Computer Engineering) 7th Semester
Year : 2019-2020
Guided by : Dr. Dipti P. Rana

DEPARTMENT OF COMPUTER ENGINEERING
SARDAR VALLABHBHAI NATIONAL INSTITUTE OF TECHNOLOGY,
SURAT – 395 007 (GUJARAT, INDIA)
Student Declaration
This is to certify that the work described in this project report has actually been carried out and
implemented by our project team consisting of

Sr.  Admission No.  Student Name
1    U16CO031       Tanu Tomar
2    U16CO033       Aarshi Dwivedi
3    U16CO029       Darshan Sheth
4    U16CO016       Nikhil Anil Kumar

Neither the source code therein nor the content of the project report has been copied or downloaded from
any other source. We understand that our result grades would be revoked if it is later found to be so.

Signature of the Students:
Sr.  Student Name        Signature of the Student
1    Tanu Tomar
2    Aarshi Dwivedi
3    Darshan Sheth
4    Nikhil Anil Kumar
Certificate
This is to certify that the project report entitled Dynamic File Analysis Using
Ensemble Of RNN And SVM is prepared and presented by

Sr.  Admission No.  Student Name
1    U16CO031       Tanu Tomar
2    U16CO033       Aarshi Dwivedi
3    U16CO029       Darshan Sheth
4    U16CO016       Nikhil Anil Kumar

of the Final Year of Computer Engineering, and that their work is satisfactory.

SIGNATURE:
GUIDE          JURY          HEAD OF DEPT.

List of Figures
3.1 Proposed model
3.2 Flowchart
3.3 LSTM vs GRU [26]
3.4 SVM [26]
List of Abbreviations
σ Sigmoid Function
API Application Programming Interface
CNN Convolutional Neural Networks
FTP File Transfer Protocol
GRU Gated Recurrent Unit
HTTP Hypertext Transfer Protocol
IDS Intrusion Detection System
IOC Indicator Of Compromise
IP Internet Protocol
IPS Intrusion Prevention System
KNN K Nearest Neighbour
LSTM Long Short-Term Memory
NAT Network Address Translation
PCAP Packet Capture
RF Random Forest
RNN Recurrent Neural Networks
SVM Support Vector Machine
VM Virtual Machine
Abstract
It is undeniable that social media has gained immense popularity over the years. While
the rise of social media has made the world more connected, it has also raised concerns
about security and privacy. Social media sites act as breeding grounds for malware. Every
day, a substantial number of internet users are affected by malicious files sent to them
via the internet. Hence, efficient intrusion detection and prevention systems have also
gained importance. A variety of intrusion detection systems have been developed using
different machine learning algorithms, with varying accuracy. In this paper, we aim to
develop an intrusion detection and prevention system that differentiates between malicious
and benign files sent over a chat server. Recurrent neural networks (RNN) and Support
Vector Machines (SVM) will be used, and attempts will be made to make the system more
robust through honeypot and sandboxing techniques. The DFRWS 2007 challenge dataset,
which includes a variety of file types (e.g. .pdf, .exe), is used for training and testing.
Finally, the performance will be evaluated using various performance evaluation metrics.
Keywords: Intrusion detection, machine learning, neural networks, honeypot, sandboxing
Contents
1 Introduction
1.1 Applications
1.2 Motivation
1.3 Objectives
1.4 Project Organisation
2 Literature Review
3 Proposed Framework
4 Conclusion
A Definitions
Chapter 1
Introduction
Networking is the interlinking of two or more computers so that they can operate
interactively. The abundance of data and information available over networks has led to
a sharp increase in data theft, and users are adversely affected by the malicious data
they come across on the network.
Malware is software devised to execute malicious activities on victims' machines
without their consent. Attackers use malware to spy on victims, control their machines,
and steal their sensitive information. It is also employed to make victims' machines act
as bridges to internal networks. Unauthorized access by a system or user constitutes a
network intrusion and can result in the manipulation of data and information. This calls
for a system for intrusion detection and prevention.
An IDS is a hardware or software monitor that scrutinizes data to detect an attack
initiated on a system or over a network. IDSs can be classified into three categories by
detection approach: signature-based, anomaly-based, and hybrid. Signature-based detection
is designed to detect known attacks by using their respective signatures. Because the
signatures of newer types of attack are unknown, such attacks remain undetected by this
approach. This is where anomaly-based detection comes into the picture: current user
activity is compared against pre-defined profiles to detect anomalous behaviour that may
indicate an intrusion. However, this method generally has a high false-positive rate. To
overcome the shortcomings of any single method, hybrid detection combines two or more
intrusion detection methods to obtain the advantages of each.
An IDS is complemented by an IPS, which actively monitors the incoming traffic of a
system and weeds out malicious requests. An IPS averts attacks by blocking offending IP
addresses, dropping malicious packets, notifying security personnel of potential threats,
and resetting the connection. Since the IPS sits directly in the communication path
between source and destination, it must work fast and efficiently, both to avoid degrading
network performance and because exploits can occur in near real time. The IPS must also
detect and respond accurately, so that threats are removed and legitimate packets are not
misread as threats (false positives).
A honeypot is a mechanism set up to detect, deflect, or hinder attempts at unauthorized
use of information systems. It is an emerging technology in which a farm of virtual
machines running different operating systems is created, and this farm sees only the
malicious traffic. It reduces false positives by processing various types of files and
URLs in the corresponding virtual machines, checking their behaviour, and finally
generating a score. A honeynet is a group of honeypots that acts as a point of diversion
for inbound traffic from an attacker. The outbound traffic is restricted from the
non-honeypot nodes to prevent the attacker from tracking the route of the traffic. The
honeypots in the honeynet behave like the victim's machine, running services such as HTTP
and FTP. Source NAT is used in these honeypots to alter the outbound traffic from the
honeynet. Shadow honeypots have an anomaly-based sensor, as in the case of Snort IDS, that
re-routes traffic to a honeynet when an anomaly is encountered in network traffic. While
the goal of the honeypot is to entice attackers so that their attacks are diverted away
from real systems, sandboxing focuses on assessing likely infections that may already have
affected the system and running them in isolation so that they do not affect the rest of
the production environment. A sandbox is an isolated test environment, similar to
production, that is used for testing code before deploying it into a production system.
1.1 Applications
As the behaviour of files sent over the chat server network goes unnoticed, it is not
unheard of for the security of the system running that chat server to be compromised. By
combining various machine learning tools, one can detect and block such malicious files
before they cause serious damage to one's machine. The applications include:
1. Monitoring and scrutinising both user and system activities.
2. Categorising anomalous behaviour that existing antivirus software generally does not
   recognise because the signature is unknown.
3. Analysing system configurations as well as their vulnerabilities.
1.2 Motivation
With social media gaining popularity by the day, it has become very common to send
files back and forth over chats. As of now, minimal effort has been made to monitor
malicious files sent over the chat server network. This poses a huge threat to the
victim's machine. The main motive of this project is to detect malicious files using an
IDS and prevent them from causing harm to the innocent receiver. Yet another problem with
existing antivirus software is that it usually relies on signature-based detection rather
than analysing the behaviour of the file. The current scenario calls for an intrusion
detection and prevention system to monitor the files being transferred over chat servers.
This project adds an extra layer of security by combining machine learning tools with the
common IDS.
1.3 Objectives
The primary goal is to come up with a different approach to classifying data files, using
a mix of RNN and SVM. The aim is to obtain better accuracy than conventional approaches
that employ only a single machine learning algorithm. The implementation will be done on
a dummy chat server, mimicking a real-life scenario, to verify the accuracy of the
proposed model.
1.4 Project Organisation
Chapter 1 of the report gives a brief introduction, the applications of the chosen project
in the real world, the motivation and objectives behind the work, and the organisation of
the report. Chapter 2 comprises the literature review and the theoretical background
related to the project. Chapter 3 comprises the proposed algorithm and flowcharts of the
project. Finally, Chapter 4 recapitulates the report and discusses possible future work
for the project.
Chapter 2
Literature Review
Sandboxing is a way to automate the analysis of the file being sandboxed. This approach
proves to be more efficient than the general approach followed by a generic antivirus,
which just cross-checks signatures. Cuckoo reports are easier to analyse for
non-executable files than for executable files. Bazzi et al. [1] studied the feasibility
of detecting malicious non-executable files, using 6052 benign files and 10852 malicious
files obtained from Contagio Dump. All benign non-executable files leave the same traces.
The Cuckoo sandbox reports are based on Win32 API calls. They also cover the files created
and downloaded by the malware, the network trace in PCAP format, and memory dumps. The
reports are automatically analysed by an SVM algorithm based on values such as the number
of dropped files, the number of accessed files, and the number of accessed registry
entries.
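The three report values named above can be turned into a feature vector along the following lines. This is only a minimal sketch: the dictionary layout and the key names `dropped`, `behavior`, `summary`, `files` and `keys` are assumptions about a simplified report, not a confirmed Cuckoo schema, and no classifier is trained here.

```python
from typing import Any, Dict, List


def extract_features(report: Dict[str, Any]) -> List[int]:
    """Return [dropped files, accessed files, accessed registry keys].

    The key names are hypothetical; adapt them to the report format in use.
    """
    summary = report.get("behavior", {}).get("summary", {})
    return [
        len(report.get("dropped", [])),   # files dropped by the sample
        len(summary.get("files", [])),    # files the sample accessed
        len(summary.get("keys", [])),     # registry entries the sample accessed
    ]


# Hand-made report used purely for illustration.
sample_report = {
    "dropped": [{"name": "a.tmp"}, {"name": "b.dll"}],
    "behavior": {
        "summary": {
            "files": ["C:\\Users\\test\\doc.pdf"],
            "keys": ["HKLM\\Run", "HKCU\\Run", "HKCU\\Software"],
        }
    },
}

features = extract_features(sample_report)
print(features)
```

A vector of this kind is what would be handed to an SVM as one training or test sample.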
Traditional IDS focus on detection from static signatures or on anomaly-based detection,
both of which have many challenges. Rene et al. [2] implemented a honeypot to collect
intrusion logs, analysed them, extracted IOCs from the data, applied clustering techniques
to the extracted IOCs, and developed IOC rules, which improved the detection rate of the
IDS. The high detection accuracy rests on dynamic analysis, with the IOC rule base as its
foundation. The approach clusters the behaviour of the malware rather than relying on the
string matching that traditional signature-based IDS use. Features include a decision
engine to filter incoming traffic, a control engine to restrict outgoing traffic and
thereby limit attacks, and a redirection engine. Hierarchical clustering and K-means
clustering are performed on the IOC reports extracted from the CTU and DARPA datasets, and
the IOC rules are converted from XML to Snort format and written into the IDS rule base.
The paper by Kateryna Chumachenko [3] aims to check which algorithm gives the fewest
false negatives and to determine the feature extraction, feature representation, and
classification methods that result in the best accuracy when used on top of the Cuckoo
sandbox. The malware families used are Dridex, Locky, TeslaCrypt, Vawtrak, Zeus,
DarkComet, CyberGate, Xtreme, and CTB-Locker. The families were taken from VirusTotal,
identified using static analysis and code checking by string extraction, file format
inspection, hash computation and AV scanning, which is essentially signature-based
analysis. In the paper itself, dynamic analysis was used: the dataset was inspected in
the Cuckoo sandbox and the generated reports were analysed by different machine learning
algorithms such as SVM, decision tree, and Random Forest. The report generated by the
Cuckoo sandbox includes a score, called the Cuckoo score, which is used to classify the
malware based on a threshold. SVM gave almost zero false positives, whereas the decision
tree gave a higher accuracy than SVM.
Countless papers in the field of IDS have incorporated KDD99 datasets, but only a few
used the actual KDD99 dataset, so the reported results may not have been accurate. Another
problem with the KDD99 dataset was that the majority of the records were duplicated in
both the training set and the test set. The large number of redundant records in the
training set biased the learning algorithms towards the more frequent records and thus
prevented them from learning infrequent records. To solve this issue, Hasan et al. [4]
came up with a new dataset that eliminated all the redundant records in the original KDD
dataset; the derived sets were called KDD99Train+ and KDD99Test+. Two models were
constructed using the new dataset, one using SVM and the other using Random Forest (RF).
The results showed that while SVM was better at detecting DoS, R2L and U2R attacks, RF
provided excellent test accuracy in terms of precision and false negative rate. Overall,
SVM performed better than the RF classifier.
The primary goal of this paper was to compare two machine learning algorithms, SVM and a
neural network. Mukkamala et al. [5] trained an SVM using the KDD dataset, which was
partitioned into two classes, normal and attack, where attack represents a collection of
22 different attacks. The SVM was tested using 6980 data points with 41 features.
Meanwhile, neural networks were set up with the following architectures: Network A:
4-layer, 41-20-20-20-1; Network B: 3-layer, 41-40-40-1; Network C: 3-layer, 41-25-20-1.
The neural networks were trained using a feed-forward backpropagation algorithm that
employed scaled conjugate gradient descent for learning. The accuracies of Networks A, B
and C were compared. Finally, it was concluded that SVMs had greater potential than neural
networks, as they were highly scalable and also showed higher accuracy on large datasets.
The goal of this paper was to reduce the time taken to analyse big data. Othman et al.
[6] proposed an IDS classification method called Spark-Chi-SVM. A pre-processing step
converted categorical data into numerical data to improve the efficiency of the
classification. A Chi-Square selector was then used to reduce the dimensionality of the
dataset and hence the computational time. Finally, SVM with SGD was used for
classification. The dataset used was KDDCUP99. All of this takes place on the Apache Spark
architecture, which uses a master/slave model. The measures used to check the performance
of the model were AUROC (Area Under the Receiver Operating Characteristic Curve), AUPRC
(Area Under the Precision-Recall Curve), and time. Their model showed high performance and
a reduced false positive rate; Chi-SVM came out as the best classifier when compared to
plain SVM and logistic regression.
The paper by C. Yin et al. [7] aims to develop an intrusion detection system using
deep learning techniques of Recurrent Neural Networks (RNN) instead of the traditional
machine learning algorithms. The performance of the model is then analysed in both
binary and multiclass classification. Furthermore, the impact of the number of neurons
and different learning rates on the accuracy is studied. To compare and contrast the
results obtained, the performance of the aforementioned model was compared to the
performance of the traditional machine learning algorithms like naive Bayesian, Random
forest, multi-layer perceptron, support vector machine and other methods in multiclass
classification. The dataset used for the training and testing of the model was the NSL-
KDD, one of the most common datasets used in intrusion detection systems. According
to the results obtained, the RNN model proved to have a higher accuracy in both binary
and multiclass classification as compared to the other machine learning algorithms.
A Host-Based Intrusion Detection System (HIDS) is a type of IDS that monitors the
computer system on which it is installed to detect intrusion and/or misuse, and responds
by logging the activity and notifying the designated authority. The paper by Chawla et
al. [8] describes an anomaly-based IDS using a combination of RNN and CNN. Replacing the
normal LSTM with Gated Recurrent Units reduced the training time, which made this model
more efficient. The model was trained using the ADFA dataset, which contains system call
traces. Intrusion detection is determined by the probability of a particular system call
sequence: sequences with lower probabilities are labelled as anomalies and hence detected
as intrusions. Compared to the LSTM model, this model gave far better results in terms of
training time. However, in comparison with the ensemble models the model was considerably
slower. This created the possibility of future work combining the current model with a
KNN-based model and an encoder-decoder model.
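The idea of flagging low-probability system call sequences can be illustrated with a much simpler stand-in. The sketch below deliberately swaps the CNN/RNN language model of [8] for a smoothed bigram model, so it shows only the scoring principle; the call names, the training traces and the threshold of -1.0 are assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence


class BigramModel:
    """Add-one smoothed bigram model over system call names (stand-in for an RNN)."""

    def __init__(self, traces: Sequence[Sequence[str]]) -> None:
        self.pair_counts: Counter = Counter()
        self.prev_counts: Counter = Counter()
        self.vocab = set()
        for trace in traces:
            self.vocab.update(trace)
            for prev, cur in zip(trace, trace[1:]):
                self.pair_counts[(prev, cur)] += 1
                self.prev_counts[prev] += 1

    def avg_log_prob(self, trace: Sequence[str]) -> float:
        """Average log-probability per transition; lower means more unusual."""
        size = len(self.vocab) + 1  # one extra slot for calls never seen in training
        pairs = list(zip(trace, trace[1:]))
        total = sum(
            math.log((self.pair_counts[p] + 1) / (self.prev_counts[p[0]] + size))
            for p in pairs
        )
        return total / max(len(pairs), 1)


def is_anomalous(model: BigramModel, trace: Sequence[str], threshold: float = -1.0) -> bool:
    # The threshold is an illustrative assumption, not a tuned value.
    return model.avg_log_prob(trace) < threshold


normal_traces: List[List[str]] = [["open", "read", "close"]] * 3
model = BigramModel(normal_traces)

usual = ["open", "read", "close"]
unusual = ["open", "execve", "socket"]
print(model.avg_log_prob(usual), model.avg_log_prob(unusual))
print(is_anomalous(model, usual), is_anomalous(model, unusual))
```

A recurrent model would replace `avg_log_prob` with the probability it assigns to each next call given the whole prefix, but the thresholding step stays the same.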
Chapter 3
Proposed Framework
The PDF format was created by Adobe in 1993 and has long been, and still is, a popular
attack vector. Most phishing messages carry either a link or a PDF file that, when
opened, may look legitimate because of its contents. In reality it installs a remote
access trojan (RAT), which ultimately creates a backdoor for command-and-control attacks.
A sandbox is a closed environment that we will use to contain the analysis; the reports it
generates are used to derive rules, which are then applied to the IDS. The honeynet is a
network of several honeypots running different VMs and libraries, along with the services
such as FTP and HTTP needed for their functioning. It is a virtual internal network that
mimics the actual internal network so as to trick the attacker, wait for exploits, and
learn from them. The next-generation firewall will then block connectivity from the
attacker's IP address, and the attacker will be contained in a honeypot in the
demilitarized zone. Outgoing traffic is limited by a redirection engine in the honeywall,
which reduces the chance that the attacker detects the honeynet.
Figure 3.1: Proposed model
An RNN works on the principle that it considers the current input together with the
previously received inputs. It can memorize earlier inputs because of its internal memory.
It works very well on sequential data, where the question is: given what has happened so
far, what will happen next? We will use an LSTM to mitigate the vanishing gradient
problem, so that the contribution of earlier steps does not become insignificant.
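The role of the internal memory can be seen in a one-unit recurrent step, h_t = tanh(w_x x_t + w_h h_{t-1} + b). The following is a minimal sketch with scalar weights; the weight values 0.5 and 0.8 and the two input sequences are arbitrary assumptions, and nothing is trained.

```python
import math
from typing import List


def rnn_forward(inputs: List[float], w_x: float = 0.5, w_h: float = 0.8,
                b: float = 0.0) -> List[float]:
    """Run a single-unit RNN over a sequence and return every hidden state."""
    h = 0.0  # initial hidden state (the "memory")
    states = []
    for x in inputs:
        # The new state mixes the current input with the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# Both sequences end with the same input, but their histories differ.
states_a = rnn_forward([1.0, 0.0, 1.0])
states_b = rnn_forward([0.0, 0.0, 1.0])
print(states_a[-1], states_b[-1])
```

Although the last input is identical, the final hidden states differ, which is exactly the dependence on previously received inputs described above.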
Figure 3.2: Flowchart
In the ensemble process, if the error of the SVM is greater than the error of the RNN,
the weights are recalibrated from the RNN to the SVM; if the error of the SVM is less
than the error of the RNN, the weights are recalibrated from the SVM to the RNN. An LSTM
replaces the logistic and tanh hidden units with memory cells that hold an analog value
and are controlled by three gates: the forget gate, the input gate, and the output gate.
The gates use sigmoid functions rather than tanh functions. This is because tanh squeezes
values between −1 and 1, whereas the sigmoid squeezes them between 0 and 1: a value
multiplied by 0 is forgotten, while a value multiplied by 1 is kept. The output gate
decides what the next hidden state should be. We pass the previous hidden state and the
current input into a sigmoid function, σ. Then we pass the newly modified cell state into
a tanh function and multiply the tanh output by the sigmoid output to decide what the
hidden state should carry. The cell state is updated by combining the outputs of the
input gate and the forget gate. The GRU (gated recurrent unit) differs from the LSTM in
that the forget gate and input gate are merged into a single update gate, and a reset gate
decides how much past information to forget. It has fewer tensor operations, making it
faster than the LSTM.
Figure 3.3: LSTM vs GRU [26]
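To make the roles of the three LSTM gates concrete, here is a minimal sketch of one LSTM step for a single scalar unit, following the standard formulation (forget, input and output gates use the sigmoid; the candidate value and the output use tanh). The parameter layout, a dictionary mapping each gate name to a hypothetical `(w_x, w_h, b)` triple, and all numeric values are assumptions for illustration.

```python
import math
from typing import Dict, Tuple

Params = Dict[str, Tuple[float, float, float]]  # gate name -> (w_x, w_h, bias)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x: float, h_prev: float, c_prev: float,
              params: Params) -> Tuple[float, float]:
    """One LSTM step for a scalar unit; returns (new hidden state, new cell state)."""

    def pre(gate: str) -> float:
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # near 0: forget old cell state, near 1: keep it
    i = sigmoid(pre("input"))        # how much of the candidate to write
    g = math.tanh(pre("candidate"))  # candidate value in (-1, 1)
    o = sigmoid(pre("output"))       # how much of the cell state to expose

    c = f * c_prev + i * g           # cell state combines forget and input gates
    h = o * math.tanh(c)             # hidden state: output gate times tanh(cell)
    return h, c


# Forget gate saturated open, input gate saturated closed: the cell keeps its value.
keep_params: Params = {
    "forget": (0.0, 0.0, 20.0),
    "input": (0.0, 0.0, -20.0),
    "candidate": (1.0, 1.0, 0.0),
    "output": (0.0, 0.0, 0.0),
}
h_new, c_new = lstm_step(x=1.0, h_prev=0.5, c_prev=0.7, params=keep_params)
print(h_new, c_new)
```

With the forget gate close to 1 and the input gate close to 0, the cell state passes through almost unchanged, which is how an LSTM lets information from early steps survive across long sequences.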
Precision can be calculated by
$$\text{Precision} = \frac{\text{True positive}}{\text{True positive} + \text{False positive}}$$
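As a small worked example of the precision formula, the sketch below counts true and false positives from two label lists. The labels (1 for malicious, 0 for benign) and the convention of returning 0.0 when nothing is flagged are assumptions made for illustration.

```python
from typing import Sequence


def precision(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Precision = TP / (TP + FP) for the given positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    if tp + fp == 0:
        return 0.0  # nothing was flagged; convention chosen for this sketch
    return tp / (tp + fp)


# 1 = malicious, 0 = benign (illustrative labels only).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
score = precision(y_true, y_pred)
print(score)
```

Here three files are flagged as malicious and two of them really are, so the precision is 2/3.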
Precision is the exactness of the output: the fraction of files flagged as malicious that
are actually malicious.
SVM is a supervised learning algorithm that can be used for both regression and
classification. It is a discriminative classifier that finds the optimal hyperplane
between two classes. The optimal hyperplane is the one that maximizes the perpendicular
distance between the hyperplane and the closest samples.
Figure 3.4: SVM [26]
The distance between the positive and negative hyperplanes is called the margin. The
data points that these two hyperplanes touch on either side are called support vectors.
SVM can be used for linearly separable as well as non-linearly separable data. Linearly
separable data can be classified with a hard margin, whereas data that is not linearly
separable calls for a soft margin.
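For a linear SVM with decision function f(x) = w·x + b, the positive and negative hyperplanes are f(x) = +1 and f(x) = −1, and the margin width is 2/‖w‖. The sketch below checks this on a hand-picked example; the weights, bias and points are assumptions chosen so that the numbers are easy to verify, not the result of training.

```python
import math
from typing import Sequence


def decision(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def margin_width(w: Sequence[float]) -> float:
    """Distance between the hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def hinge_loss(label: int, score: float) -> float:
    """Soft-margin penalty: zero outside the margin, positive inside or misclassified."""
    return max(0.0, 1.0 - label * score)


w, b = [0.5, 0.5], -1.0
positive_sv = [2.0, 2.0]   # lies on the positive hyperplane (score +1)
negative_sv = [0.0, 0.0]   # lies on the negative hyperplane (score -1)
inside_margin = [1.5, 1.5]

print(decision(w, b, positive_sv), decision(w, b, negative_sv))
print(margin_width(w))
print(hinge_loss(+1, decision(w, b, inside_margin)))
```

A hard margin demands zero hinge loss for every training point, while a soft margin tolerates points such as `inside_margin` at the cost of a penalty.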
• SVMs are compatible with semi-supervised learning models: they can be used in areas
  where the data is labeled as well as unlabeled.
• Feature mapping used to place quite a load on the computational complexity of training
  the model. However, with the help of the kernel trick, SVM can carry out the feature
  mapping using a simple dot product.
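The kernel trick mentioned in the last point can be verified numerically. For the degree-2 polynomial kernel k(x, y) = (x·y)², the same value is obtained by mapping both points to the feature space (x₁², √2·x₁x₂, x₂²) and taking an ordinary dot product there. This is a minimal sketch on two-dimensional inputs; the choice of kernel and the sample points are assumptions for illustration.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def poly2_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Degree-2 polynomial kernel computed directly in the input space."""
    return dot(x, y) ** 2


def poly2_features(x: Sequence[float]) -> List[float]:
    """Explicit feature map whose dot product equals poly2_kernel (2-D inputs only)."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


x, y = [1.0, 2.0], [3.0, 4.0]
via_kernel = poly2_kernel(x, y)
via_mapping = dot(poly2_features(x), poly2_features(y))
print(via_kernel, via_mapping)
```

The kernel needs one dot product in two dimensions, whereas the explicit route first builds three-dimensional vectors; the gap grows quickly with the input dimension and the polynomial degree.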
WhatsApp Web is a common web app that we all use to send files and text. Files sent over
it are not subject to any scanning technique, not even signature-based analysis, to check
whether a PDF file being sent is malicious. The project implementation will include
features for sending a file and scanning it with an AV scanner and file scanning tools.
Dynamic analysis will be done in the sandbox.
Chapter 4
Conclusion
In conclusion, the project will proceed as per the network diagram discussed above. First,
every user will be redirected to the honeynet, where many sandboxes will be deployed to
analyse the files sent across the network. Based on the reports generated by the
sandboxes, rules will be generated and loaded into our next-generation firewall, which
decides whether the user is let into our internal network. Next, the IDS, which will
function on an ensemble of RNN and SVM, will be used to classify the files as malicious or
benign. SVM can work with labeled as well as unlabeled data and works fast on linearly
separable data, whereas RNN works well with sequential data. The vanishing gradient
problem is mitigated by using LSTM cells, which can be replaced by GRU cells to reduce
training time. This is the overall layout and complete framework of the project.
Appendix A
Definitions
The following appendix gives a short description of the terms used within the report.
A.1 Unsupervised Learning

Unsupervised learning is the training of an artificial intelligence algorithm using infor-
mation that is neither classified nor labeled and allowing the algorithm to act on that
information without guidance.[9]
A.2 Supervised learning

Supervised learning is the machine learning task of learning a function that maps an input
to an output based on example input-output pairs. It infers a function from labeled
training data consisting of a set of training examples.[10]
A.3 Deep Learning Networks

Deep learning is an artificial intelligence function that imitates the workings of the human
brain in processing data and creating patterns for use in decision making. Deep learning
is a subset of machine learning in artificial intelligence (AI) that has networks capable of
learning unsupervised from data that is unstructured or unlabeled. Also known as deep
neural learning or deep neural network.[11]
A.4 Convolutional Neural Networks (CNN)

In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep
neural networks, most commonly applied to analyzing visual imagery. CNNs are regu-
larized versions of multilayer perceptrons.[12]
A.5 Recurrent Neural Networks (RNN)

A recurrent neural network (RNN) is a class of artificial neural networks where connec-
tions between nodes form a directed graph along a temporal sequence. This allows it to
exhibit temporal dynamic behavior.[13]
A.6 Virtual Machine (VM)
In computing, a virtual machine (VM) is an emulation of a computer system. Virtual
machines are based on computer architectures and provide functionality of a physical
computer. [14]
A.7 Indicators Of Compromise (IOC)

Indicators of Compromise (IOC) are pieces of forensic data, such as data found in system
log entries or files, that identify potentially malicious activity on a system or network.[15]
A.8 Long Short-Term Memory (LSTM)

Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture
used in the field of deep learning. Unlike standard feedforward neural networks, LSTM has
feedback connections.[16]
A.9 Bi-directional Long Short Term Memory (Bi-LSTM)

Bidirectional LSTMs are an extension of traditional LSTMs that can improve model
performance on sequence classification problems.[17]
A.10 Multi-layer Perceptron Model (MLP)

A multilayer perceptron (MLP) is a class of feedforward artificial neural network. An
MLP consists of at least three layers of nodes: an input layer, a hidden layer and an
output layer.[18]
A.11 Naive Bayes Model

A Naive Bayes classifier is a probabilistic machine learning model that is used for
classification tasks. The crux of the classifier is based on Bayes' theorem.[19]
A.12 Decision Tree

A decision tree is a decision support tool that uses a tree-like graph or model of decisions
and their possible consequences, including chance event outcomes, resource costs, and
utility. It is one way to display an algorithm that only contains conditional control
statements.[20]
A.13 Random Forest
Random forests or random decision forests are an ensemble learning method for classifica-
tion, regression and other tasks that operates by constructing a multitude of decision trees
at training time and outputting the class that is the mode of the classes (classification)
or mean prediction (regression) of the individual trees.[21]
A.14 Support Vector Machine (SVM)

Support Vector Machine (SVM) is a supervised machine learning algorithm which can
be used for both classification or regression challenges. However, it is mostly used in
classification problems. [22]
A.15 Ensemble
Ensemble methods are a machine learning technique that combines several base models in
order to produce one optimal predictive model. [23]
A.16 Gated Recurrent Unit (GRU)

The GRU is like a long short-term memory (LSTM) with forget gate but has fewer
parameters than LSTM, as it lacks an output gate. [24]
A.17 Kernel Trick

The kernel trick is a technique in machine learning that avoids some intensive computation
in certain algorithms, making some computations go from infeasible to feasible. [25]
Bibliography
[1] Bazzi, A. and Onozota, Y. Automatic Detection of Malicious PDF Files Using Dynamic
Analysis. Division of Electronics and Informatics, Faculty of Science and Technology,
Gunma University, Japan.
[2] Rene, C. and Abdullah, J. Malicious Code Intrusion Detection using Machine Learning
And Indicators of Compromise. Faculty of Computer Science and Information Technology,
Universiti Malaysia Sarawak.
[3] Chumachenko, K. Cuckoo SandBox. Machine Learning Methods For Malware Detection And
Classification. (2017)
[4] Hasan, M., Nasser, M., Pal, B., Ahmad, S. Support Vector Machine and Random Forest
Modeling for Intrusion Detection System. Journal of Intelligent Learning Systems and
Applications, 6, 45-52. (2014)
[5] Tavallaee, M., Bagheri, E., Lu, W., and Ghorbani, A. A Detailed Analysis of the KDD
CUP 99 Data Set. Proceedings of the 2009 IEEE Symposium on Computational Intelligence in
Security and Defense Applications, Ottawa, 8-10 July 2009, pp. 1-6.
[6] Bradley, A. The use of the area under the ROC curve in the evaluation of machine
learning algorithms. Pattern Recognit. 1997;30(7):1145-1159.
[7] Yin, C., Zhu, Y., Fei, J. and He, X. A Deep Learning Approach for Intrusion Detection
Using Recurrent Neural Networks. State Key Laboratory of Mathematical Engineering and
Advanced Computing, Zhengzhou, China. (2017)
[8] Chawla, A., Lee, B., Fallon, S. and Jacob, P. Host based Intrusion Detection System
with Combined CNN/RNN Model. (2017)
[9] https://whatis.techtarget.com/definition/unsupervised-learning
[10] https://en.wikipedia.org/wiki/Supervised_learning
[11] https://www.investopedia.com/terms/d/deep-learning.asp
[12] https://en.wikipedia.org/wiki/Convolutional_neural_network
[13] https://en.wikipedia.org/wiki/Recurrent_neural_network
[14] https://en.wikipedia.org/wiki/Virtual_machine
[15] https://searchsecurity.techtarget.com/definition/Indicators-of-Compromise-IOC
[16] https://en.wikipedia.org/wiki/Long_short-term_memory
[17] https://machinelearningmastery.com/develop-bidirectional-lstm-sequence-classific
[18] https://en.wikipedia.org/wiki/Multilayer_perceptron
[19] https://towardsdatascience.com/naive-bayes-classifier-81d512f50a7c
[20] https://medium.com/greyatom/decision-trees-a-simple-way-to-visualize-a-decision-
[21] https://en.wikipedia.org/wiki/Random_forest#cite_note-ho1995-1
[22] https://www.analyticsvidhya.com/blog/2017/09/understaing-support-vector-machine-example-code/
[23] https://towardsdatascience.com/ensemble-methods-in-machine-learning-what-are-the
[24] https://en.wikipedia.org/wiki/Gated_recurrent_unit
[25] http://www.chioka.in/explain-to-me-what-is-the-kernel-trick/
[26] https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-st
