
Correlation and Analysis of Security

Threat Information using External Feeds

Javier Gombao Fernández-Calvillo

Bachelor of Science (Honours) In Network Management


and Cloud Infrastructure

Athlone Institute of Technology

2018

1 TABLE OF CONTENTS
2 Acknowledgements .................................................................................................. 3
3 Abstract ....................................................................................................................... 4
4 Introduction ............................................................................................................... 5
5 Related work .............................................................................................................. 7
6 Methodology, Architecture and Design ............................................................. 9
6.1 IntelMQ ........................................................................................................................... 11
6.2 IntelMQ Manager ......................................................................................................... 22
6.3 Scripts ............................................................................................................................. 25
6.4 General overview of the study ................................................................................. 29
7 Evaluation and Results ......................................................................................... 30
7.1 Requirements ................................................................................................................ 30
7.1.1 Hardware requirements .............................................................................................. 30
7.1.2 Software requirements ................................................................................................ 30
7.2 Installation .................................................................................................................... 30
7.3 Utilization of the software ........................................................................................ 33
7.3.1 Command line ................................................................................................................ 33
7.3.2 IntelMQ Manager ......................................................................................................... 38
7.3.3 Management .................................................................................................................. 61
7.3.4 Monitor ............................................................................................................................ 61
7.4 Obtaining File Output: ............................................................................................... 62
7.4.1 Command lines .............................................................................................................. 62
7.4.2 IntelMQ Manager ......................................................................................................... 63
7.5 Statistical Analysis ...................................................................................................... 70
7.5.1 Map of the world ........................................................................................................... 70
7.5.2 Pie charts and tables .................................................................................................... 76
8 Conclusion ................................................................................................................ 88
9 Declaration ............................................................................................................... 90
10 References ............................................................................................................. 91
11 Appendix ................................................................................................................ 93
11.1 Configuration files of IntelMQ: ............................................................................... 93
11.1.1 Runtime.conf .................................................................................................................. 93
11.1.2 Pipeline.conf ................................................................................................................... 99
11.2 GenerateMap.py ......................................................................................................... 101
11.3 ISO3166-1-Alpha-2.txt ............................................................................................... 104
11.4 GenerateChart.py ...................................................................................................... 109

2 ACKNOWLEDGEMENTS
I am using this section to express my warm thanks to the University of Alicante,
which gave me the opportunity to study at Athlone Institute of Technology using an
Erasmus program during the academic year 2017/2018. Moreover, I am thankful to
my whole family because they supported me in my decision to study abroad. In this
sense, I would like to make a special mention of my parents, Antonio and Pilar, and my
little sister, Nuria, for their advice and counsel during this project.

I would also like to express my gratitude to the teachers who supported me


throughout the course Network Management and Cloud Infrastructure at AIT. This
point includes my supervisors, Sohelia, who suggested the topic that I investigate
here (cybersecurity), and Tom, who attended the presentations of this research.
Last but not least, I want to express my thanks to my
classmates and their interest in this study. Actually, I am thankful for their
constructive criticism, guidance, advice and their opinion during the whole academic
year.

3 ABSTRACT
Cybercrime activity has been growing over the years and there is no evidence that
this tendency will stop in the future. Hence, organizations' cybersecurity teams are
obliged to strengthen their defences in order to avoid serious damage in a connected
world. Nowadays, there are external sources which publish large amounts of
up-to-date data related to cyber threats. However, the data allocated
in these resources is quite heterogeneous and is presented to the cyber
analyst in different formats (text files, HTML pages, CSV files…) and structures. This
situation makes the study and the analysis of the threat feeds quite tough. For this
reason, it is necessary to utilize some mechanism to correlate these threat feeds.
Therefore, this paper describes how to integrate and correlate the data obtained
from a few external sources of cyber threat information using a tool called IntelMQ.
Then, we will produce some visualizations using scripts written in the Python 3
programming language with specific frameworks and libraries in order to extract
results, evaluations and conclusions.

4 INTRODUCTION
The connected electronic information network has become an integral part of our
lives. In fact, all kinds of organizations (financial, medical, education institutions,
governments…) use the network for collecting, processing, storing and sharing large
amounts of digital information, which could be bank accounts, passwords, private
documents, contracts or personal identities, among others. The protection and
control of this data is crucial to guarantee the privacy and the safety of the user over
the Internet. In this sense, cybersecurity has an important role because it
investigates ways to protect the systems which are connected over the network
from unauthorized use or harm. Related to the previous topic, Cyber Threat
Intelligence (CTI) is based on services and organised collections of in-depth
information about specific threats, which provide reports and analysis to users
through external feeds or security feeds. This paper therefore presents the study,
analysis and correlation of some of the existing cyber threats using a
platform which allows us to obtain the data from these security feeds.

Nowadays, there are studies which confirm that cyberattacks have increased in
recent years because attackers are finding new ways to target networks in order
to access, change, destroy, extort or interrupt digital data over the
Internet. As an example, the number of ransomware attacks increased 300% in 2016
compared with the previous year, when 1,000 ransomware attacks were seen per day
[1]. As for bot activity, Symantec observed an increase of 6.7 million hosts in 2016 [2].
Moreover, the new attacks are mostly distributed and are reported by different tools
whose alerts may individually look like normal activity. The main problem is that
multiple alerts must be correlated together to raise an alarm for an actual attack.
The interest in this field derives from the recognition that it is impossible to stop
technically advanced adversaries without foreknowledge of their intentions and
methods.

Then, the main goal of the paper is to gather alerts from diverse Threat
Intelligence resources and perform a statistical analysis in which the data
is collected, examined, summarized, manipulated and interpreted to
discover patterns, trends, relationships or underlying causes. The tool that
we are going to focus on in order to obtain the data related to cyber threats is
IntelMQ. It is a solution for CERTs (Computer Emergency Response Teams)
for collecting and processing security feeds using a message queue protocol [3]. After
this, we will create a couple of scripts to analyse and study the data that we have
obtained in a graphical way: a map of the world that indicates where the attacks
come from, tables and charts. The results will depend on the parameters that the user
introduces in these scripts.

The rest of this paper is organized as follows. Section five provides a Related
Work in which we will describe other studies that are working on the correlation of
attacks. Section six explains the Methodology, Architecture and Design where
this work will be described using diagrams of the system components and their
functionality. Section seven discusses the Evaluation and Results, which are
essential to understand the achievements obtained and the proposal itself. We conclude
the whole study in section eight. After that, there is a Declaration chapter
which states that this is original research. Moreover, the document has a
References chapter, which provides the list of links used in this research. The paper
ends with the code that we have used to perform our study in the Appendix section.

5 RELATED WORK
There are a few studies about the correlation of cyber threats. The major part of the
literature provides a discussion of spam features [4]. However, only a few studies
also include other types of malicious traffic. The authors of [5] present an analysis
of alert reports from various detection systems deployed in a local network, the
Czech National Research and Education Network (CESNET NREN), including
honeypots as well as flow-based traffic analysis systems. The study splits the
alerts by their source and attack type (scanning activity, brute force, web accesses
on honeypots, SYN flood attacks…) into individual datasets to make the analysis
from two perspectives. The first concerns the time correlations of alerts, where
the authors ask whether it is usual for the same IP address to be detected and
reported as malicious repeatedly, and how long it takes for such an address to be
reported again. In this sense, the study demonstrates that the observation of more
than one report from the same IP address is probably affected by dynamic address
assignment, which means that a single malicious host may appear under several IP
addresses. The second perspective discusses the correlations between individual
types of alerts, i.e. how many addresses from one dataset group can be found in
another group as well, where datasets are grouped by their type of malicious
traffic. The results obtained show that the characteristics of malicious traffic
from blacklists and other sources remain valid when observing traffic in a local
network.

A work identifying malware using cross-evidence correlation [5] introduces a new
correlation method called deLink, which supports forensic investigations of
malware-related evidence by automating the analysis of datasets from several
computer systems. The main components of this method are data collection,
examination and link mining. Data collection involves making a forensically sound
copy of the original media while preserving the integrity of the malware-related
evidence. Examination consists of applying filtering techniques to limit the
amount of data to be examined. Finally, link mining is used to generate a

structured presentation of interconnected and linked objects in order to reveal
correlations. The output of the program is a filtered, structured dataset, which
is clustered based on common linked patterns from all involved sources. The
results of the study demonstrate that the deLink method facilitates the detection
of correlations in evidence existing on the hard drives of multiple machines.

Most of the related work uses its own mechanisms, such as local networks or
purpose-built methods, to perform correlations. In contrast, this research gathers
alerts from external security feeds and performs data mining to extract correlation
patterns using an open source platform called IntelMQ, which is easy to handle and
manipulate. Since it is open source, the software is available for use and for
modification of its original design. After that, Python scripts will provide us a
better way to analyse and study the results that we have obtained from this
program.

6 METHODOLOGY, ARCHITECTURE AND DESIGN
The collection and storage of cyber threat information is quite important in the
management of cybersecurity. As soon as incidents and vulnerabilities are detected,
a management process is generated which creates plenty of information that must be
understood and processed in the shortest possible time. These data come from
several tracking methods run by cybersecurity organizations (private or public).
Processing and sharing this information is, therefore, a critical aspect of
cybersecurity management. Nowadays, the cybersecurity community offers diverse,
up-to-date information resources related to cyber threats. However, the way this
information is delivered to the user varies widely, which makes it difficult to
integrate into platforms that aim to automate its processing. In this sense, the
sources that we are going to use in our study are the following:
1. http://www.abuse.ch: It offers feeds for monitoring the threats
corresponding to the malicious code families ZeuS, Palevo, SpyEye and Feodo.
1.1. Zeus is a Trojan malware that runs on Microsoft Windows. It is used to
steal banking information by logging keystrokes in the browser and
grabbing form data.
1.2. Palevo is a worm-type malware that affects computers with a Windows
operating system. Once a computer has been infected, it becomes part
of a network of bots, which are controlled remotely by a central node. It
can be used to carry out a multitude of criminal activities, for example
in denial of service (DoS) attacks. Another feature is the ability to block
security software.
1.3. SpyEye is a banking Trojan that allows an attacker to create a botnet
very easily and collect sensitive data from its victims.
1.4. Feodo is another banking Trojan which can record sensitive user data
such as bank access credentials, card details and credentials for additional
services such as PayPal or Amazon. When the victim accesses the online
banking site, and before the data is transmitted over HTTPS, the Trojan
saves it in plain text, which is then collected and sent to the
attacker.
2. http://malwaredomains.lehigh.edu/: It is promoted by the North American
university Lehigh and it provides information related to various kinds of
malicious code.
3. http://www.malwaredomainlist.com: It is a non-commercial initiative that
provides lists of domains related to harmful code: phishing, fraud, Trojan,
ransomware…
4. http://www.phishtank.com: Open initiative for URL reporting of phishing
sites.
5. http://malc0de.com/dashboard/: It provides a large database of malware,
malicious websites and spam in text files.
6. https://www.spamhaus.org/: The Spamhaus Project is a non-profit
organization that tracks spam, phishing, malware and botnets, providing
real-time, highly accurate, actionable threat intelligence.

In order to provide a better way to analyse the data allocated on each website, we
are going to use a platform whose functionality is based on graphs. A graph is a
model of interconnected data in which the connections are just as important as the
data elements themselves. Graphs are modelled as nodes, edges and properties, and
many technologies exist to work with them, including graph databases, graph
analytics and graph visualization libraries. As a visual model of data, graphs are
accessible to non-specialists and can convey a deeper understanding of the
information. In fact, graphs are used in cybersecurity and cyber intelligence,
anti-fraud, government and intelligence contexts because the data involved is very
complex, for the following reasons:
1. Large: for big organizations, storing years of raw data means a huge number
of pieces of information.

2. Unstructured: the data comes from different sources, may be incomplete,
and evolves over time. Therefore, it is hard to employ a structured data
model.
3. Dynamic: the IT systems generate new data constantly.
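To make the node/edge/property model concrete, the following Python 3 sketch (Python 3 being the language of the scripts in this study) represents a tiny threat graph with plain dictionaries and tuples. The IP address, domain and feed labels are illustrative values, not data taken from the real feeds:

```python
# Minimal sketch of a graph: nodes carry properties, edges connect node
# identifiers and carry properties of their own. All values are illustrative.
nodes = {
    "198.51.100.7": {"type": "ip", "feed": "Spamhaus"},
    "malware.example.org": {"type": "domain", "feed": "Abuse.ch"},
}
edges = [
    ("198.51.100.7", "malware.example.org", {"relation": "hosts"}),
]

def neighbours(node_id):
    """Return the identifiers of nodes reachable from node_id."""
    return [dst for src, dst, _ in edges if src == node_id]

print(neighbours("198.51.100.7"))  # ['malware.example.org']
```

Graph databases and visualization libraries elaborate on exactly this structure, adding indexing and traversal operations on top of it.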

As an example, the next two images correspond to two screenshots of data
representing malicious sites. The data is allocated on two different websites,
Spamhaus and Abuse.ch. As we can see, the data is given in different formats
(text file and URL) and with different values (IP and website):

The security teams use graphs to extract insights from complex data in order to
provide distinct points of view. From the analytical point of view, it helps to analyse
large datasets to find interesting data. From the visualization point of view, it helps
users to interpret the data and, therefore, make smart decisions. In this context, the
name of the platform that we are going to use in this research is IntelMQ.

6.1 INTELMQ
It is an open source program, which means that the user can modify its source
code without license restrictions. Moreover, it is a solution for Information
Technology (IT) security teams, such as CSIRTs, CERTs and abuse departments, to
collect and process security feeds using a message queueing system that properly
handles the different external sources. In our case, we are going to use Redis,
an in-memory database engine based on key/value hash-table storage. Basically,
IntelMQ processes the data mostly automatically, ensures its accuracy, enriches
it (AS, geolocation) and filters it for collecting and processing threat
intelligence.

The design of this platform was influenced by AbuseHelper, a tool that allows
receiving and redistributing threat and abuse feeds as well; however, IntelMQ
was coded from scratch to reduce the complexity and the losses when the tool is
performing the correlation. Moreover, it provides an easy way to store the
results, to create your own blacklists, and to communicate with other systems,
for example through a RESTful API. Its configuration files are written in JSON
format so that they are easy to understand.

As we said previously, the platform is based on graphs and therefore has a
modular structure using nodes, which are called bots, and edges, which provide
the relationships between them. The whole graph is called a botnet. There are
four types of bots, each with a specific functionality:
1. Collectors: they obtain data from threat feeds. In our case, these are the
six threat feeds which we have described above.
2. Parsers: they split the data into individual elements (log lines) with a
determined structure (data harmonization).
3. Experts: they enrich the data with additional information such as country
code (geographic location), reverse DNS record, domain name…
4. Outputs: they write the events to a text file.

Each bot owns a source queue and can have several destination queues; the output
bots, however, have no destination queues. Moreover, multiple bots can write to
the same queue, in which case there will be multiple inputs for the next bot.
Every bot runs in a separate process, and each process is identified by a unique
number called a bot-id. We can currently execute multiple processes of the same
bot in parallel with different bot-ids.
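The collector → parser flow through queues can be illustrated with a toy Python 3 simulation. The real platform exchanges messages through Redis lists, whereas this sketch uses in-memory deques, and the bot names, queue names and feed lines are all illustrative:

```python
from collections import deque

# Toy simulation of IntelMQ's bot/queue model. Each queue is an in-memory
# deque standing in for a Redis list; all names and data are illustrative.
queues = {"parser-queue": deque(), "output-queue": deque()}

def collector(destination):
    """Collector bot: fetches raw feed lines and pushes them downstream."""
    for line in ["203.0.113.5 # c&c server", "192.0.2.9 # c&c server"]:
        queues[destination].append(line)

def parser(source, destination):
    """Parser bot: turns raw lines into structured events."""
    while queues[source]:
        raw = queues[source].popleft()
        ip = raw.split()[0]
        queues[destination].append({"source.ip": ip, "classification.type": "c&c"})

collector("parser-queue")
parser("parser-queue", "output-queue")
print(list(queues["output-queue"]))
```

An expert bot would sit between the parser and the output, reading the same structured events and adding fields such as the country code before forwarding them.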

The program uses JSON configuration files: defaults.conf, runtime.conf,
pipeline.conf, harmonization.conf and BOTS. The contents of each file are
allocated in a remote repository on GitHub (in the folder IntelMQ):
https://github.com/jgfc1/ThesisRepository and they are included in the appendix
section as well. These files are stored in the /etc/ directory when we access
the program through the terminal. In fact, the following picture shows this
structure:

/opt/intelmq
+-- /etc
|   +-- BOTS
|   +-- defaults.conf
|   +-- harmonization.conf
|   +-- pipeline.conf
|   +-- runtime.conf
+-- /var
    +-- /lib
    |   +-- /bots
    |       +-- /file-output
    |           +-- events.txt
    +-- /log
    |   +-- .log and .dump files
    +-- /run
        +-- .pid files
The folder /var is used by the program to provide the file output where the
cyber threats are allocated; the log files and dump files are stored in the /log
directory; and finally the pid number for each bot is stored in the /run folder.
We will describe these files and folders below:

/etc/ directory

Defaults.conf

It contains the default values for all bots: their behaviour, error handling
and logging options. The next table shows these values:
{
"accuracy": 100,
"broker": "redis",
"destination_pipeline_db": 2,
"destination_pipeline_host": "127.0.0.1",
"destination_pipeline_password": null,
"destination_pipeline_port": 6379,
"error_dump_message": true,
"error_log_exception": true,
"error_log_message": true,
"error_max_retries": 3,
"error_procedure": "pass",
"error_retry_delay": 15,
"http_proxy": null,
"http_timeout_max_tries": 3,
"http_timeout_sec": 30,
"http_user_agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36
(KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36",
"http_verify_cert": true,
"https_proxy": null,
"load_balance": false,
"log_processed_messages_count": 500,

"log_processed_messages_seconds": 900,
"logging_handler": "file",
"logging_level": "DEBUG",
"logging_path": "/opt/intelmq/var/log/",
"logging_syslog": "/dev/log",
"proccess_manager": "intelmq",
"rate_limit": 0,
"source_pipeline_db": 2,
"source_pipeline_host": "127.0.0.1",
"source_pipeline_password": null,
"source_pipeline_port": 6379
}

Value: Description

broker: It allows selecting which broker you want to use. In this case, the value is
"redis".

destination_pipeline_db / source_pipeline_db: The broker database which the bot
will use to connect and exchange messages. This is a requirement for the Redis
broker. The value that we used is 2 for both attributes.

destination_pipeline_host / source_pipeline_host: The broker IP address, FQDN or
Unix socket that the bot will use to connect and send messages. The value is
127.0.0.1.

destination_pipeline_password / source_pipeline_password: The password of the
broker that the bot will use to connect and exchange messages. It can be null for
an unprotected broker. The value is null.

destination_pipeline_port / source_pipeline_port: The broker port that the bot
will use to connect and exchange messages. Its value can be null for a Unix
socket. The value is 6379.

error_dump_message: If true, the bot will write queued-up messages to its dump
file. The dump file is used to see the possible errors when we run the botnet.

error_log_exception: If true, exceptions raised while the botnet runs are written
as error reports to the log file.

error_log_message: If true, errors raised while the botnet runs are written as
error reports to the log file.

error_max_retries: If there is an error, the bot will retry processing the
current message the number of times defined here. In this case, the value is 3.

error_procedure: If there is an error, this option defines the procedure that the
bot will adopt. The value here is pass.

error_retry_delay: An integer value which defines the number of seconds to wait
between subsequent retries if there is an error. The value in this case is 15.

http_proxy: The HTTP proxy that the bot will use when performing HTTP requests,
for instance in bots/collectors/collector_http.py. Since this field is null, the
parameter does not affect our results.

http_timeout_max_tries: The number of times that the bot will try to connect when
there is a timeout. In this case, the value is 3.

http_timeout_sec: The length of the timeout in seconds. In this case, the value
is 30.

http_user_agent: The user-agent string that the bot will use when performing
HTTP/HTTPS requests. The value selected identifies as Mozilla, AppleWebKit,
Chrome and Safari.

http_verify_cert: If true, the bot will verify SSL certificates when performing
HTTPS requests.

https_proxy: The HTTPS proxy that the bot will use when performing secure HTTPS
requests. Since this field is null, the parameter does not affect our results.

load_balance: It chooses the behaviour of the queue. If true, messages are split
across several queues without duplication; if false, each message is duplicated
into every queue.

log_processed_messages_count: The number of processed messages after which a log
entry is written, 500 in this case.

log_processed_messages_seconds: The number of seconds after which processed
messages are logged, 900 in this case.

logging_handler: There are two options: "file" or "syslog".

logging_level: The system-wide log level that will be used by all bots and the
intelmqctl tool. The possible values are "CRITICAL", "ERROR", "WARNING", "INFO"
and "DEBUG".

logging_path: It only applies when the logging_handler property is file. It
defines the system-wide log folder that will be used by all bots and the
intelmqctl tool. The default value is /opt/intelmq/var/log/.

logging_syslog: It only applies when the logging_handler property is syslog.
Either a list with hostname and UDP port of the syslog service, or a device such
as the default "/dev/log".

rate_limit: An integer which indicates the time interval (in seconds) between
message processing. In our study, the time interval is 0.
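Since defaults.conf is plain JSON, it can also be inspected programmatically. The following sketch parses a small inline fragment mirroring the values above (rather than reading the real file at /opt/intelmq/etc/defaults.conf, so that the example is self-contained) and assembles a human-readable broker address from the pipeline settings; the broker_url helper is illustrative, not part of IntelMQ:

```python
import json

# Inline fragment standing in for /opt/intelmq/etc/defaults.conf.
fragment = """{
    "broker": "redis",
    "source_pipeline_host": "127.0.0.1",
    "source_pipeline_port": 6379,
    "source_pipeline_db": 2,
    "error_max_retries": 3
}"""

defaults = json.loads(fragment)

def broker_url(conf):
    """Build a readable broker address from the pipeline settings."""
    return "{}://{}:{}/{}".format(
        conf["broker"], conf["source_pipeline_host"],
        conf["source_pipeline_port"], conf["source_pipeline_db"])

print(broker_url(defaults))  # redis://127.0.0.1:6379/2
```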

Runtime.conf
It contains the configuration for the individual bots, specified field by field.
Each bot defined here corresponds to a node in the graph.

Structure:
"<bot ID>": {
    "group": "<bot type (Collector, Parser, Expert, Output)>",
    "name": "<human-readable bot name>",
    "module": "<bot code (python module)>",
    "description": "<generic description of the bot>",
    "parameters": {
        "<parameter 1>": "<value 1>",
        "<parameter 2>": "<value 2>",
        "<parameter 3>": "<value 3>"
    }
},
Examples:
"abusech-feodo-ip-collector": {
"parameters": {
"feed": "Abuse.ch Feodo IP",
"provider": "Abuse.ch",
"http_url":
"https://feodotracker.abuse.ch/blocklist/?download=ipblocklist",
"http_url_formatting": false,
"http_username": null,

"http_password": null,
"ssl_client_certificate": null,
"rate_limit": 129600
},
"name": "Generic URL Fetcher",
"group": "Collector",
"module": "intelmq.bots.collectors.http.collector_http",
"description": "Abuse.ch Feodo IP",
"enabled": true,
"run_mode": "continuous"
},

"Abusech-IP-Parser": {
"parameters": {},
"name": "Abuse.ch IP",
"group": "Parser",
"module": "intelmq.bots.parsers.abusech.parser_ip",
"description": "Abuse.ch IP Parser is the bot responsible to parse the
report and sanitize the information.",
"enabled": true,
"run_mode": "continuous"
},

The previous listings show the values that each bot can have. Firstly, the group
attribute indicates whether the bot is a Collector, Parser, Expert or Output.
The name field provides a human-readable name and the module field indicates the
bot code (its Python module), so each bot can be identified. Moreover, there can
be a description providing details of the bot.

After that, we can add additional parameters such as SSL certificates, an HTTP
username or password, or URL formatting options. In some cases, the bots are
configured in continuous run mode so that they are always running and take in
data constantly. In addition, if the value of enabled is true, the bot is
started when we start the whole botnet; to disable a bot, we change the value of
this attribute to false. This file is included in the appendix section and in the
remote repository:
https://github.com/jgfc1/ThesisRepository/blob/master/IntelMQ/runtime.conf
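Because runtime.conf is ordinary JSON, the enabled flag can also be flipped programmatically instead of by hand. This is only a sketch: it edits an inline fragment mirroring the collector example above rather than the real file, and the set_enabled helper is illustrative:

```python
import json

# Inline fragment standing in for runtime.conf (one bot shown).
runtime = json.loads("""{
    "abusech-feodo-ip-collector": {
        "group": "Collector",
        "enabled": true,
        "run_mode": "continuous"
    }
}""")

def set_enabled(conf, bot_id, value):
    """Enable or disable a single bot without touching the rest of the file."""
    conf[bot_id]["enabled"] = value
    return conf

set_enabled(runtime, "abusech-feodo-ip-collector", False)
print(runtime["abusech-feodo-ip-collector"]["enabled"])  # False
```

Writing the result back with json.dump would persist the change; in practice the same effect is achieved through the IntelMQ Manager interface described later.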

Pipeline.conf
It defines the source and the destination queues per bot (the edges of the graph).

Structure:
"<bot ID>": {
    "source-queue": "<source pipeline name>",
    "destination-queues": [
        "<first destination pipeline name>",
        "<second destination pipeline name>",
        ...
    ]
},

Example:
"abusech-feodo-ip-collector": {
"destination-queues": [
"Abusech-IP-Parser-queue"
]
}
"Abusech-IP-Parser": {
"source-queue": "Abusech-IP-Parser-queue",
}

In the previous case, there is a relationship, called abusech-ip-parser-queue,
between a bot called abusech-feodo-ip-collector (which gets data from a specific
website) and abusech-ip-parser (which parses the data). Note that the bot
abusech-ip-parser also has a relationship to another bot (through
deduplicator-expert-queue). Later in this chapter, we will explain all the bots
that we have defined in this study. In a graphical way, this picture illustrates
the relationship between these two bots:

We can observe the contents of the whole file in the Appendix section and through
this website:
https://github.com/jgfc1/ThesisRepository/blob/master/IntelMQ/pipeline.conf
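The edges of the botnet graph can be recovered from pipeline.conf by matching each destination queue to the bot that declares it as its source queue. The sketch below does this on an inline fragment mirroring the example above; the deduplicator bot's entry is outside the fragment, so its queue name is returned as-is, and the edges helper is illustrative rather than part of IntelMQ:

```python
import json

# Inline fragment standing in for pipeline.conf (two bots shown).
pipeline = json.loads("""{
    "abusech-feodo-ip-collector": {
        "destination-queues": ["Abusech-IP-Parser-queue"]
    },
    "Abusech-IP-Parser": {
        "source-queue": "Abusech-IP-Parser-queue",
        "destination-queues": ["deduplicator-expert-queue"]
    }
}""")

def edges(conf):
    """Match each destination queue to the bot that consumes it."""
    consumers = {v.get("source-queue"): bot for bot, v in conf.items()}
    result = []
    for bot, v in conf.items():
        for q in v.get("destination-queues", []):
            # Fall back to the queue name when its consumer is not defined here.
            result.append((bot, consumers.get(q, q)))
    return result

print(edges(pipeline))
```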

Harmonization.conf

This file contains the configuration that specifies the fields for all the message
types. In fact, the harmonization library loads this configuration to check
whether the values conform to the harmonization format. This file is maintained
by the IntelMQ platform.

Structure:
{
"<message type>": {
"<field 1>": {
"description": "<field 1 description>",
"type": "<field value type>"
},
"<field 2>": {
"description": "<field 2 description>",
"type": "<field value type>"
}
},
}
Example:
"feed.accuracy": {
"description": "A float between 0 and 100 that represents how
accurate the data in the feed is",
"type": "Accuracy"
},
"feed.name": {
"description": "Name for the feed, usually found in collector bot configuration.",
"type": "String"
},

The contents of this file are allocated in this website:
https://github.com/jgfc1/ThesisRepository/blob/master/IntelMQ/harmonization.conf
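A simplified check against this schema can be sketched in a few lines: load a fragment of the harmonization definition and flag any event keys it does not know about. The fragment mirrors the example above; the unknown_fields helper and the deliberately invalid field "feed.colour" are illustrative, and the real harmonization library performs far stricter type validation:

```python
import json

# Inline fragment standing in for harmonization.conf (two fields shown).
harmonization = json.loads("""{
    "feed.accuracy": {"description": "A float between 0 and 100", "type": "Accuracy"},
    "feed.name": {"description": "Name for the feed", "type": "String"}
}""")

def unknown_fields(event, schema):
    """Return the event keys that are not defined in the harmonization schema."""
    return [k for k in event if k not in schema]

event = {"feed.name": "Abuse.ch Feodo IP", "feed.colour": "red"}
print(unknown_fields(event, harmonization))  # ['feed.colour']
```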

BOTS
It contains the configuration hints for all the bots. This file can be accessed through
this URL:
https://github.com/jgfc1/ThesisRepository/blob/master/IntelMQ/BOTS

/var/ directory

events.txt
This is the file into which the correlated cyber threat events are written, and
on which we will perform some visualizations to extract an analysis of the data.
We will observe an example of this file in the next chapter.
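A first idea of how such a file can be consumed by the analysis scripts is sketched below. It assumes, as IntelMQ's file output typically produces, one JSON event per line; the two sample events and their field values are illustrative, not real feed data:

```python
import json
from collections import Counter

# Two sample lines standing in for the contents of events.txt.
sample = [
    '{"source.ip": "203.0.113.5", "source.geolocation.cc": "US"}',
    '{"source.ip": "198.51.100.7", "source.geolocation.cc": "DE"}',
]

# Count events per source country, as the map and chart scripts will do later.
counts = Counter(
    json.loads(line).get("source.geolocation.cc", "unknown") for line in sample
)
print(counts.most_common())
```

Reading the real file would only change the first step (iterating over open("events.txt") instead of the sample list); the counting logic stays the same.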

log files and .dump


Log files show information about the activity related to a specific bot: the last
time the bot was initialized, the name of the bot and debug information
indicating that each parameter of the bot executed properly. Errors, in
contrast, are reported in dump files. We will explain these files with examples
in the next chapter as well.

.pid files
They are files containing a number that identifies each bot's active process.

6.2 INTELMQ MANAGER


IntelMQ ships with a tool, called IntelMQ Manager, that makes it possible to
configure the bots and the relationships between them directly in a graphical
mode. IntelMQ Manager generates the configuration of the files
runtime.conf and pipeline.conf. The picture on the next page illustrates the
main graph that we are going to study in this research. Note that each type of
bot is represented using different colours: the red nodes represent the collector
bots, the green nodes are parser bots, the blue nodes are expert bots and the
yellow one is the file output:

Each bot has a distinct functionality; let's walk through each of them:
Collector bots

Each collector bot obtains data from an external source related with threat feeds:

Spamhaus-Drop-Collector: https://www.spamhaus.org/drop/drop.txt
Abusech-Feodo-Ip-Collector: https://feodotracker.abuse.ch/blocklist/?download=ipblocklist
Abusech-Zeus-Baddomains-Collector: https://zeustracker.abuse.ch/blocklist.php?download=baddomains
Abusech-Zeus-Domainblocklist-Collector: https://zeustracker.abuse.ch/blocklist.php?download=domainblocklist
PhishTank-Collector: http://data.phishtank.com/data/online-valid.csv
Malware-Domain-List-Collector: http://www.malwaredomainlist.com/hostslist/mdlcsv.php
Malc0de-Windows-Format-Collector: http://malc0de.com/bl/BOOT

Parser bots

Spamhaus-Drop-Parser: it is the bot responsible for parsing the DROP reports and sanitizing the information.
Abusech-IP-Parser, Abusech-Domain-Parser, PhishTank-Parser, Malware-Domain-List-Parser and Malc0de-parser: they are the bots responsible for parsing their respective reports and sanitizing the information.

24
Expert bots

Deduplicator-Expert: it is the bot responsible for the detection and removal of duplicate messages. Messages get cached for <redis_cache_ttl> seconds; if a message is found in the cache, it is assumed to be a duplicate.
Taxonomy-Expert: it is the bot responsible for applying the eCSIRT taxonomy to all events.
Url2fqdn-expert: it is the bot responsible for parsing the FQDN from the URL.
GetHostByName-1-expert and GetHostByName-2-expert: they are the bots responsible for parsing the IP from the FQDN.
Cymru-Whois-Expert: it is the bot responsible for adding network information to the events (BGP, ASN, AS name, country, etc.).
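The cache-based deduplication idea can be sketched as follows. This is a simplified, in-memory stand-in we wrote for illustration; the real Deduplicator-Expert uses a Redis cache, and the ttl parameter here plays the role of <redis_cache_ttl>.

```python
# Simplified sketch of the Deduplicator-Expert idea: messages are cached for a
# TTL, and a message already present in the cache is treated as a duplicate.
# IntelMQ uses Redis for the cache; a plain dict of timestamps stands in here.
import hashlib
import time

class DedupCache:
    def __init__(self, ttl=86400):
        self.ttl = ttl
        self._seen = {}              # message hash -> time first seen

    def is_duplicate(self, message, now=None):
        now = time.time() if now is None else now
        key = hashlib.sha256(message.encode()).hexdigest()
        first_seen = self._seen.get(key)
        if first_seen is not None and now - first_seen < self.ttl:
            return True              # still cached: assumed duplicate
        self._seen[key] = now        # (re)cache and let the message pass
        return False

cache = DedupCache(ttl=86400)
print(cache.is_duplicate('{"source.ip": "198.51.100.7"}'))  # first time: False
print(cache.is_duplicate('{"source.ip": "198.51.100.7"}'))  # cached: True
```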

Output bots
File-output: it is the bot responsible for sending events to a file.

The files that we have discussed before are related with the graph in the sense
that each bot (node) is defined in the runtime.conf file and each relationship
between bots (edge) is defined in pipeline.conf.

6.3 SCRIPTS
Once we have obtained the file output, the next step is to provide some
visualizations, using charts or maps of the world, that indicate the number of
attacks we have identified, in order to carry out a better analysis of the
external threat feeds selected for this research. For that, we have created two
scripts in Python using object-oriented programming. These scripts are hosted in
a remote repository (GitHub) that can be accessed here:
https://github.com/jgfc1/ThesisRepository, and they are included in the appendix
section as well. The scripts use the file output that we have generated with
IntelMQ as the input of the program.

1. generateChart.py: this script counts attributes such as the feed name,
   classification taxonomy, classification type or type of malware identified,
   in order to provide a graphical visualization of our study. With these
   values, the program computes percentages and represents them in a pie chart.
   Moreover, this script produces a text file,
   output_classification_malware.txt, which contains the counts of the
   attributes that we have indicated in the program.

To develop these diagrams, we have used the library matplotlib, which
produces publication-quality figures in a variety of hardcopy formats and
interactive environments across platforms [7], such as plots, histograms and bar
charts. In addition, we have used the library pandas, which provides easy-to-use
data structures and data analysis tools [8]. The variables, methods and the
relationships between the classes are detailed below:

First of all, we have a class called 'Struct', which provides two variables: the
taxonomy and an integer (count). There is a constructor, and functions which
are used to obtain the values of these variables (getTaxonomy and
getCount).
The "Chart" class is used to generate the graph itself. The methods that this
class uses are:

a. getFileEvents: it returns the name of the fileEvents.
b. obtainDistinctTaxonomy: it eliminates duplicates from the list of the
taxonomy.
c. countTaxonomy: this function will count the taxonomy from the list.
d. printTaxonomy: this function will print each taxonomy and its
occurrence count.
e. getOcurrences: this function will count the times that a specific
attribute appears in the file events.txt.
f. loadData: it stores the data generated.
g. createChart: it generates the graph itself with the bubbles around it.
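The core of generateChart.py can be sketched as follows. This is our simplified version, not the script itself: the events list, field names and the radius of the sketch are illustrative, the events file is assumed to hold one JSON event per line, and matplotlib is assumed to be installed for the (optional) plotting step.

```python
# Simplified sketch of what generateChart.py does: count an attribute (e.g.
# classification.taxonomy) across the events, turn the counts into
# percentages, and optionally draw them as a pie chart.
import json
from collections import Counter

def count_attribute(lines, field="classification.taxonomy"):
    """Return {value: occurrences} for one attribute over all events."""
    counts = Counter()
    for line in lines:
        event = json.loads(line)
        if field in event:
            counts[event[field]] += 1
    return counts

def to_percentages(counts):
    total = sum(counts.values())
    return {k: 100.0 * v / total for k, v in counts.items()}

def plot_pie(counts, outfile="chart.png"):
    import matplotlib                  # assumed installed
    matplotlib.use("Agg")              # render without a display
    import matplotlib.pyplot as plt
    plt.pie(counts.values(), labels=counts.keys(), autopct="%1.1f%%")
    plt.savefig(outfile)

# Illustrative events, one JSON object per line as in events.txt:
events = [
    '{"classification.taxonomy": "malicious code"}',
    '{"classification.taxonomy": "malicious code"}',
    '{"classification.taxonomy": "fraud"}',
]
counts = count_attribute(events)
print(counts)
print(to_percentages(counts))
```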

2. generateMap.py: this script creates a map of the world with circles of
   varying sizes. The program takes the country codes, in iso3166-1-alpha-2
   format (i.e. IE is the country code of Ireland and ES is the country code of
   Spain), that IntelMQ has identified for each cyber threat resource in the
   file output, and compares each code with another file,
   iso3166-1-alpha-2.txt, which lists the code of the country, its latitude,
   its longitude and its name. After that, it takes the latitude and longitude
   of each country identified and represents it on the map of the world with
   circles. The size of each circle indicates the number of cyber threats in
   that country (the bigger the circle, the more attacks). Some countries may
   have no attacks at all.

For that, we have used folium, which allows us to add bubbles to a map, where
each bubble has a size related to a specific value. In addition to this, the
program provides another output file, named output.txt, which lists the
number of attacks per country in descending order. The following picture
illustrates the class diagram that the script follows in order to perform its
functionality and produce the map of the world with the points:

In this case, we have a class named "Country" with the attributes name,
count, latitude and longitude, plus its constructor. The class "Map" has the
attributes "fileEvents" and "fileCodeISOCountries", both strings. Its
methods are:
1. getFileEvents: it returns the name of the fileEvents.
2. getFileCodeISOCountries: it gets the file with the name of the countries
and their longitude and latitude (geolocation).
3. obtainDistinctCountries: it returns a new list without duplicates of
the countries.
4. countCountries: this function will count the countries from the list.
5. printCountriesCount: this function will print out the name of the
countries and their occurrence counts in the command prompt.
6. getOcurrences: this function will count the times that each country
appears in the file event.txt.
7. loadData: this function makes a data frame (data table) with the points to
show on the map.
8. createMap: it generates the map itself with the bubbles around it in red
colour.
9. obtainLongitudeLatitude: it obtains the longitude and latitude of each
country identified using the file iso3166-1-alpha-2.txt. This file is
allocated in the Appendix section.
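The core logic of generateMap.py can be sketched as follows. This is our simplified stand-in: the two-entry coordinate table replaces the full iso3166-1-alpha-2.txt file, the radius formula is illustrative, and the folium drawing step is shown commented out since it requires the folium package to be installed.

```python
# Simplified sketch of generateMap.py's core logic: count how often each
# ISO 3166-1 alpha-2 country code appears among the events, look up its
# coordinates, and scale a map bubble by the count.
from collections import Counter

ISO_COORDS = {                    # code -> (name, latitude, longitude)
    "IE": ("Ireland", 53.0, -8.0),
    "ES": ("Spain", 40.0, -4.0),
}

def count_countries(codes):
    return Counter(codes)

def bubbles(counts):
    """One (name, lat, lon, radius) bubble per country, biggest first."""
    out = []
    for code, n in counts.most_common():
        name, lat, lon = ISO_COORDS[code]
        out.append((name, lat, lon, 5 + n))    # illustrative radius formula
    return out

# def draw(bubble_list):                       # requires folium installed
#     import folium
#     world = folium.Map(location=[20, 0], zoom_start=2)
#     for name, lat, lon, radius in bubble_list:
#         folium.CircleMarker([lat, lon], radius=radius, color="red",
#                             popup=name).add_to(world)
#     world.save("map.html")

counts = count_countries(["ES", "IE", "ES", "ES"])
print(bubbles(counts))
```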

6.4 GENERAL OVERVIEW OF THE STUDY
Summarizing, the following picture represents the general schema that we are going
to follow in this study. First of all, we take diverse external sources related
with threat feeds. Then, we use the platform IntelMQ, which is based on graphs,
in order to correlate the external feeds that we have selected. Next, we use the
output of the platform (events.txt), which contains all the correlated threat
feeds, as the input of two Python scripts in order to perform a statistical
analysis and some data visualization. Finally, we provide an evaluation of the
results and a conclusion of the study:

7 EVALUATION AND RESULTS

7.1 REQUIREMENTS
IntelMQ and the Python scripts have specific hardware and software requirements
for installation and proper operation.

7.1.1 Hardware requirements


1. 2 GHz dual core processor or higher.
2. 2 GB system memory.
3. 25 GB of free hard drive space.
4. Either a DVD drive or a USB port for the installer media.
5. Internet access.

7.1.2 Software requirements


IntelMQ can be installed in any of the following operating systems:
1. CentOS 7.
2. Debian 8 and 9.
3. OpenSUSE Leap 42.2 and 42.3.
4. Ubuntu: 14.04, 16.04, 17.10, 18.04 or higher versions.

In this context, we are going to use Ubuntu 16.04 LTS to install the platform.

7.2 INSTALLATION
Once the requirements have been met, we should install the necessary
dependencies by typing the following commands in the command prompt:
apt-get install python3 python3-pip
apt-get install git build-essential libffi-dev
apt-get install python3-dev
apt-get install redis-server
apt install python3-pip python3-dnspython python3-psutil python3-redis python3-
requests python3-termstyle python3-tz
apt install git redis-server

After that, we execute the following lines:

sudo sh -c "echo 'deb
http://download.opensuse.org/repositories/home:/sebix:/intelmq/xUbuntu_18.04/ /' >
/etc/apt/sources.list.d/home:sebix:intelmq.list"

wget -nv
https://download.opensuse.org/repositories/home:sebix:intelmq/xUbuntu_18.04/Release.k
ey -O Release.key

sudo apt-key add - < Release.key

sudo apt-get update

sudo apt-get install intelmq

Note that the previous commands can take some time. At this point, the files and
folders that IntelMQ needs (/etc and /var) have been downloaded and installed
successfully. However, it is necessary to run extra commands to install the
graphical interface (IntelMQ Manager). For that, we have to install the
following dependencies:

apt-get install git apache2 php libapache2-mod-php7.0

After this, we should type the following lines in the command prompt:
sudo sh -c "echo 'deb
http://download.opensuse.org/repositories/home:/sebix:/intelmq/xUbuntu_18.04/ /' >
/etc/apt/sources.list.d/home:sebix:intelmq.list"

wget -nv
https://download.opensuse.org/repositories/home:sebix:intelmq/xUbuntu_18.04/Release.k
ey -O Release.key

sudo apt-key add - < Release.key

sudo apt-get update

sudo apt-get install intelmq-manager

We will be asked for a username and a password during the installation. After
this, IntelMQ Manager is installed on our computer. At this point, we will be
able to access each directory using the command prompt.

Moreover, we can access the platform in a web browser by navigating to localhost:

7.3 UTILIZATION OF THE SOFTWARE

7.3.1 Command line


There are two tools for handling the IntelMQ platform: intelmqctl and
intelmqdump.

7.3.1.1 Intelmqctl

Intelmqctl is the main tool for managing the platform. We will focus on the basic
activities that this command can provide us (start, stop, status, restart,
reload and list):

7.3.1.1.1 Manage individual bots


• Start: it takes the bot-id and initializes the bot. Example: intelmqctl start
Spamhaus-Drop-Collector.
• Stop: If the bot is running, this action stops it. If the bot was not running, it
gives a message indicating that the bot is already stopped. Example intelmqctl
stop Spamhaus-Drop-Collector.
• Status: Checks for the PID (process identification) file and if the process with
the given PID is alive. intelmqctl status Spamhaus-Drop-Collector.
• Restart: Stop the bot and then start consecutively. Example intelmqctl
restart Spamhaus-Drop-Collector.
• Reload: it reloads the configuration if we have made any change to the
bot.
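The status check described above (read the bot's PID file, then test whether a process with that PID is alive) can be sketched as follows. This is our own illustration, not intelmqctl's code, and the temporary file stands in for a real .pid file under /opt/intelmq.

```python
# Illustrative sketch of a status check like intelmqctl's: read the bot's
# .pid file and probe whether a process with that PID is alive.
import os
import tempfile

def bot_status(pid_file):
    if not os.path.exists(pid_file):
        return "stopped"             # no PID file: the bot was not started
    with open(pid_file) as fh:
        pid = int(fh.read().strip())
    try:
        os.kill(pid, 0)              # signal 0: existence check only
        return "running"
    except ProcessLookupError:
        return "stopped"             # stale PID file, process is gone
    except PermissionError:
        return "running"             # process exists but belongs to another user

# Demo with our own process so the sketch is self-contained:
with tempfile.NamedTemporaryFile("w", suffix=".pid", delete=False) as fh:
    fh.write(str(os.getpid()))
print(bot_status(fh.name))           # our own PID is alive: "running"
```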

7.3.1.1.2 Manage the botnet


• Start: initializes the whole botnet. Example intelmqctl start.
• Stop: stops all the bots of the graph. Example: intelmqctl stop.
• Status: we can observe the status of all configured bots. Example: intelmqctl
status.
• Restart: Stop the bot and then start consecutively each bot of the graph.
Example: intelmqctl restart.

7.3.1.1.3 List bots
• Intelmqctl list bots: it shows the id of the bots which are in the current
botnet.

7.3.1.1.4 List queues


• Intelmqctl list queues: it shows the queues and the number of events
that IntelMQ still has to process in each of them.

7.3.1.1.5 Help
There is an option, intelmqctl -h, which provides us with a list of the commands
that we can perform:

$ intelmqctl -h
usage: intelmqctl [-h] [-v] [--type {text,json}] [--quiet]

{list,check,clear,log,run,help,start,stop,restart,reload,status,enable,disable}
...

description: intelmqctl is the tool to control intelmq system.

Outputs are logged to /opt/intelmq/var/log/intelmqctl

optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
--type {text,json}, -t {text,json}
choose if it should return regular text or other
machine-readable
--quiet, -q Quiet mode, useful for reloads initiated scripts like
logrotate

subcommands:
{list,check,clear,log,run,help,start,stop,restart,reload,status,enable,disable}
list Listing bots or queues
check Check installation and configuration
clear Clear a queue
log Get last log lines of a bot
run Run a bot interactively
check Check installation and configuration
help Show the help
start Start a bot or botnet
stop Stop a bot or botnet
restart Restart a bot or botnet

reload Reload a bot or botnet
status Status of a bot or botnet
enable Enable a bot
disable Disable a bot

intelmqctl [start|stop|restart|status|reload] --group


[collectors|parsers|experts|outputs]
intelmqctl [start|stop|restart|status|reload] bot-id
intelmqctl [start|stop|restart|status|reload]
intelmqctl list [bots|queues|queues-and-status]
intelmqctl log bot-id [number-of-lines [log-level]]
intelmqctl run bot-id message [get|pop|send]
intelmqctl run bot-id process [--msg|--dryrun]
intelmqctl run bot-id console
intelmqctl clear queue-id
intelmqctl check

Starting a bot:
intelmqctl start bot-id
Stopping a bot:
intelmqctl stop bot-id
Reloading a bot:
intelmqctl reload bot-id
Restarting a bot:
intelmqctl restart bot-id
Get status of a bot:
intelmqctl status bot-id

Run a bot directly for debugging purpose and temporarily leverage the logging level
to DEBUG:
intelmqctl run bot-id
Get a pdb (or ipdb if installed) live console.
intelmqctl run bot-id console
See the message that waits in the input queue.
intelmqctl run bot-id message get
See additional help for further explanation.
intelmqctl run bot-id --help

Starting the botnet (all bots):


intelmqctl start
etc.

Starting a group of bots:


intelmqctl start --group experts
etc.

Get a list of all configured bots:


intelmqctl list bots

Get a list of all queues:
intelmqctl list queues
If -q is given, only queues with more than one item are listed.

Get a list of all queues and status of the bots:


intelmqctl list queues-and-status

Clear a queue:
intelmqctl clear queue-id

Get logs of a bot:


intelmqctl log bot-id number-of-lines log-level
Reads the last lines from bot log.
Log level should be one of DEBUG, INFO, ERROR or CRITICAL.
Default is INFO. Number of lines defaults to 10, -1 gives all. Result
can be longer due to our logging format!

Outputs are additionally logged to /opt/intelmq/var/log/intelmqctl

7.3.1.2 Intelmqdump
When bots fail due to programming errors or bad input, they can write the
problematic message to a .dump file. As we have explained previously, these dump
files are saved in the directory /opt/intelmq/var/log/[botid].dump in
JSON format. In this context, intelmqdump is an interactive tool that shows these
dumped files and the number of dumps per file as well. The following screenshot
represents an example of the functionality of this tool:

The number indicates how many dumps were recorded when we executed the program.
In this example, there are 1287 bad input entries for the bot abusech-domain-parser.
In particular, most of the errors occur because the program cannot obtain, from
the data that we are using, a parameter defined in the expert bots.

7.3.1.2.1 Help
A list of the actions that this tool can perform can be obtained by typing
intelmqdump -h:

$ intelmqdump -h
usage:
intelmqdump [botid]
intelmqdump [-h|--help]

intelmqdump can inspect dumped messages, show, delete or reinject them into
the pipeline. It's an interactive tool, directly start it to get a list of
available dumps or call it with a known bot id as parameter.

positional arguments:
botid botid to inspect dumps of

optional arguments:
-h, --help show this help message and exit

Interactive actions after a file has been selected:


- r, Recover by IDs
> r id{,id} [queue name]
> r 3,4,6
> r 3,7,90 modify-expert-queue
The messages identified by a consecutive numbering will be stored in the
original queue or the given one and removed from the file.
- a, Recover all
> a [queue name]
> a
> a modify-expert-queue
All messages in the opened file will be recovered to the stored or given
queue and removed from the file.
- e, Delete entries by IDs
> e id{,id}
> e 3,5
The entries will be deleted from the dump file.
- d, Delete file
> d
Delete the opened file as a whole.
- s, Show by IDs
> s id{,id}
> s 0,4,5
Show the selected IP in a readable format. It's still a raw format from
repr, but with newlines for message and traceback.
- q, Quit
> q

$ intelmqdump
id: name (bot id) content
0: abusech-domain-parser 1287 dumps

7.3.2 IntelMQ Manager


IntelMQ Manager runs on a server and we can access it using a web browser. The
initial screen shows the possibilities that we have with this tool:
configuration, management, monitor and about:

1. Configuration: to either change the currently deployed configuration or to


create a new one in a graphical fashion.
2. Management: this is the site where the user can start/stop the bots or check
their status.
3. Monitor: this option is meant to allow the user to check the overall status
of the botnet. We can read the bot logs, see how the queues are behaving, and
use other features that give the user a better overview of the overall health
of the system.

4. About: this place is used to learn and read more about the project’s goals and
the contributions.

7.3.2.1 Configuration
This view allows us to perform operations (add, edit, clear, delete) on the nodes
and edges. Moreover, we can redraw the botnet, clear the configuration and save
all the changes that we have made:

7.3.2.1.1 Add:

We are going to add nodes (collector, parser, expert and output) in the IntelMQ
platform using IntelMQ Manager in order to create the whole botnet that we are
going to use in our study.

7.3.2.1.1.1 Adding collector bots:


Here, we are going to add all the collectors that we will use in this research.
The process is the same for each of them; the only difference is that we should
change the id of the bot, its name and the URL. For that, we have to click on
the "Add node" section and then select the "Generic URL Fetcher" type. Once
this operation is done, a window appears on the screen and we have to fill in the fields:

7.3.2.1.1.1.1 Spamhaus-Drop-Collector

The runtime.conf file is also updated:


"spamhaus-drop-collector": {
"description": "",
"group": "Collector",
"module": "intelmq.bots.collectors.http.collector_http",
"name": "Spamhaus Drop",
"parameters": {
"feed": "Spamhaus Drop",
"http_password": null,
"http_url": "https://www.spamhaus.org/drop/drop.txt",
"http_username": null,

"provider": "Spamhaus",
"rate_limit": 3600,
"ssl_client_certificate": null
},
"enabled": true,
"run_mode": "continuous"
},

7.3.2.1.1.1.2 Abusech-Feodo-Ip-Collector
The only attributes that we should change are the id, description, feed,
provider and http_url. Therefore, we will provide the following table:

Generic Id: abusech-feodo-ip-collector
Description: Abuse.ch Feodo IP
Runtime Feed: Abuse.ch Feodo IP
Provider: Abuse.ch
Http_url: https://feodotracker.abuse.ch/blocklist/?download=ipblocklist

Lines added to the runtime.conf file:


"abusech-feodo-ip-collector": {
"parameters": {
"feed": "Abuse.ch Feodo IP",
"provider": "Abuse.ch",
"http_url":
"https://feodotracker.abuse.ch/blocklist/?download=ipblocklist",
"http_url_formatting": false,
"http_username": null,
"http_password": null,
"ssl_client_certificate": null,
"rate_limit": 129600
},
"name": "Generic URL Fetcher",
"group": "Collector",
"module": "intelmq.bots.collectors.http.collector_http",
"description": "Abuse.ch Feodo IP",
"enabled": true,
"run_mode": "continuous"
},

7.3.2.1.1.1.3 Abusech-Zeus-Baddomains-Collector
Generic Id: abusech-zeus-baddomains-collector
Description: Generic URL Fetcher is the bot responsible to get the report from an URL.
Runtime Feed: Abuse.ch Zeus Collector
Provider: Abuse.ch
Http_url: https://zeustracker.abuse.ch/blocklist.php?download=baddomains

New lines in runtime.conf file:


"abusech-zeus-baddomains-collector": {
"parameters": {
"feed": "Abuse.ch Zeus Bad Domains",
"provider": "Abuse.ch",
"http_url":
"https://zeustracker.abuse.ch/blocklist.php?download=baddomains",
"http_url_formatting": false,
"http_username": null,
"http_password": null,
"ssl_client_certificate": null,
"rate_limit": 129600
},
"name": "Generic URL Fetcher",
"group": "Collector",
"module": "intelmq.bots.collectors.http.collector_http",
"description": "Generic URL Fetcher is the bot responsible to get the report
from an URL.",
"enabled": true,
"run_mode": "continuous"
},

7.3.2.1.1.1.4 Abusech-Zeus-Domainblocklist-Collector
Generic Id: abusech-zeus-domainblocklist
Description: Zeus Tracker
Runtime Feed: Abuse.ch Zeus Domain
Provider: Abuse.ch
Http_url: https://zeustracker.abuse.ch/blocklist.php?do

The runtime.conf file is updated:
"abusech-zeus-domainblocklist-collector": {
"parameters": {
"feed": "Abuse.ch Zeus Domain Block List",
"provider": "Abuse.ch",
"http_url":
"https://zeustracker.abuse.ch/blocklist.php?download=domainblocklist",
"http_url_formatting": false,
"http_username": null,
"http_password": null,
"ssl_client_certificate": null,
"rate_limit": 129600
},
"name": "Generic URL Fetcher",
"group": "Collector",
"module": "intelmq.bots.collectors.http.collector_http",
"description": "Zeus Tracker",
"enabled": true,
"run_mode": "continuous"
},

7.3.2.1.1.1.5 PhishTank-Collector
Generic Id: Phishtank-collector
Description: Generic URL Fetcher is the bot responsible to get the report from an URL.
Runtime Feed: Phishtank
Provider: Phishtank-Collector
Http_url: https://www.phishtank.com/developer_info.php

In this case, we have the following lines in the runtime.conf file:


"phishtank-collector": {
"parameters": {
"feed": "Phishtank csv",
"provider": "Phishtank ",
"http_url": "http://data.phishtank.com/data/online-valid.csv",
"http_url_formatting": false,
"http_username": null,
"http_password": null,
"ssl_client_certificate": null,

"rate_limit": 129600
},
"name": "Generic URL Fetcher",
"group": "Collector",
"module": "intelmq.bots.collectors.http.collector_http",
"description": "Generic URL Fetcher is the bot responsible to get the
report from an URL.",
"enabled": true,
"run_mode": "continuous"
},

7.3.2.1.1.1.6 Malware-Domain-List-Collector
Generic Id: Malware-domain-list-collector
Description: Generic URL Fetcher
Runtime Feed: Malware Domain List
Provider: Malware Domain List
Http_url: http://www.malwaredomainlist.com/mdl.php

The new lines added are the following:


"malware-domain-list-collector": {
"parameters": {
"feed": "Malware Domain List",
"http_url": "http://www.malwaredomainlist.com/mdlcsv.php",
"provider": "Malware Domain List",
"rate_limit": 3600
},
"description": "Malware Domain List Collector is the bot responsible to get
the report from source of information.",
"group": "Collector",
"module": "intelmq.bots.collectors.http.collector_http",
"name": "Malware Domain List",
"enabled": true,
"run_mode": "continuous"
}

7.3.2.1.1.1.7 Malc0de-Windows-Format-Collector
Generic Id: Malc0de-windows-format-collector
Description: Generic URL Fetcher is the bot responsible to get the report from an URL
Runtime Feed: Generic Fetcher
Provider: Malc0de
Http_url: http://malc0de.com/bl/BOOT

New lines in runtime.conf file:


"malc0de-windows-format-collector": {
"description": "",
"group": "Collector",
"module": "intelmq.bots.collectors.http.collector_http",
"name": "Malc0de Windows Format",
"parameters": {
"feed": "Generic URL Fetcher is the bot responsible to get the report
from an URL.",
"http_password": null,
"http_url": "https://malc0de.com/bl/BOOT",
"http_username": null,
"provider": "Malc0de",
"rate_limit": 10800,
"ssl_client_certificate": null
},
"enabled": true,
"run_mode": "continuous"
},

The main screen of IntelMQ Manager now looks as follows:

7.3.2.1.1.2 Adding parser bots:
We have to create a parser bot for each collector bot. For that, we click on the
Add button and we generate the parser bot using the graphical interface. The
last step is to create a relationship between the two nodes by clicking "Add edge".

7.3.2.1.1.2.1 Spamhaus-Drop-Parser

File runtime.conf updated:


"spamhaus-drop-parser": {
"description": "Spamhaus Drop Parser is the bot responsible to parse the
DROP, EDROP, DROPv6, and ASN-DROP reports and sanitize the information.",
"group": "Parser",
"module": "intelmq.bots.parsers.spamhaus.parser_drop",
"name": "Spamhaus Drop",
"parameters": {},
"enabled": true,
"run_mode": "continuous"
},

7.3.2.1.1.2.2 Abusech-Ip-Parser

The file runtime.conf was updated:


"Abusech-IP-Parser": {
"parameters": {},
"name": "Abuse.ch IP",
"group": "Parser",
"module": "intelmq.bots.parsers.abusech.parser_ip",
"description": "Abuse.ch IP Parser is the bot responsible to parse the
report and sanitize the information.",
"enabled": true,
"run_mode": "continuous"
},

7.3.2.1.1.2.3 Abusech-Domain-Parser

File runtime.conf updated:


"abusech-domain-parser": {
"description": "Abuse.ch Domain Parser is the bot responsible to parse
the report and sanitize the information.",
"group": "Parser",
"module": "intelmq.bots.parsers.abusech.parser_domain",
"name": "Abuse.ch Domain",
"parameters": {},
"enabled": true,
"run_mode": "continuous" },

7.3.2.1.1.2.4 PhishTank-Parser

File runtime.conf updated:


"PhishTank-Parser": {
"parameters": {},
"name": "PhishTank",
"group": "Parser",
"module": "intelmq.bots.parsers.phishtank.parser",
"description": "PhishTank Parser is the bot responsible to parse the
report and sanitize the information.",
"enabled": true,
"run_mode": "continuous"
},

7.3.2.1.1.2.5 Malware-Domain-List-Parser

File runtime.conf updated:


"malware-domain-list-collector": {
"parameters": {
"feed": "Malware Domain List",
"http_url": "http://www.malwaredomainlist.com/mdlcsv.php",
"provider": "Malware Domain List",
"rate_limit": 3600
},
"description": "Malware Domain List Collector is the bot responsible to
get the report from source of information.",
"group": "Collector",
"module": "intelmq.bots.collectors.http.collector_http",
"name": "Malware Domain List",
"enabled": true,
"run_mode": "continuous" },

7.3.2.1.1.2.6 Malc0de-parser

File runtime.conf updated:


"malc0de-parser": {
"description": "Malc0de Parser is the bot responsible to parse the IP
Blacklist and either Windows Format or Bind Format reports and sanitize the
information.",
"group": "Parser",
"module": "intelmq.bots.parsers.malc0de.parser",
"name": "Malc0de",
"parameters": {},
"enabled": true,
"run_mode": "continuous" },

The next image represents the bots that we have created so far:

7.3.2.1.1.3 Adding expert bots:
At this point, we are going to add the deduplicator-expert bot. For that, we
click on the menu on the left and we select Experts, then Deduplicator expert.
Once we have clicked, we accept the default configuration and we add
relationships between the parser nodes and the expert nodes that we have created:

7.3.2.1.1.3.1 Deduplicator-Expert

If we look at the file runtime.conf, we can observe that it was updated:


"deduplicator-expert": {
"description": "Deduplicator is the bot responsible for detection and removal
of duplicate messages. Messages get cached for <redis_cache_ttl> seconds. If found in
the cache, it is assumed to be a duplicate.",
"group": "Expert",
"module": "intelmq.bots.experts.deduplicator.expert",
"name": "Deduplicator",
"parameters": {
"filter_keys": "raw,time.observation",
"filter_type": "blacklist",
"redis_cache_db": 6,
"redis_cache_host": "127.0.0.1",
"redis_cache_password": null,
"redis_cache_port": 6379,
"redis_cache_ttl": 86400
},
"enabled": true,
"run_mode": "continuous" },

7.3.2.1.1.3.2 Taxonomy-Expert

In this case, the file runtime.conf is also updated:


"taxonomy-expert": {
"description": "Taxonomy is the bot responsible to apply the eCSIRT Taxonomy
to all events.",
"group": "Expert",
"module": "intelmq.bots.experts.taxonomy.expert",
"name": "Taxonomy",
"parameters": {},
"enabled": true,
"run_mode": "continuous"
},
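Conceptually, this bot fills in classification.taxonomy from classification.type. A minimal sketch of such a mapping is shown below; the table is an illustrative excerpt of our own, not the bot's actual eCSIRT table.

```python
# Illustrative excerpt of an eCSIRT-style mapping from classification.type to
# classification.taxonomy, in the spirit of the Taxonomy-Expert. The entries
# here are examples, not the bot's full table.
TYPE_TO_TAXONOMY = {
    "phishing": "fraud",
    "malware": "malicious code",
    "c&c": "malicious code",
    "blacklist": "other",
}

def apply_taxonomy(event):
    """Set classification.taxonomy on an event based on its type."""
    etype = event.get("classification.type")
    if etype in TYPE_TO_TAXONOMY:
        event["classification.taxonomy"] = TYPE_TO_TAXONOMY[etype]
    return event

event = apply_taxonomy({"classification.type": "phishing"})
print(event["classification.taxonomy"])   # "fraud"
```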

7.3.2.1.1.3.3 Url2fqdn-expert

With the previous change, the file was updated:


"url2fqdn-expert": {
"parameters": {
"overwrite": false
},
"name": "url2fqdn",
"group": "Expert",
"module": "intelmq.bots.experts.url2fqdn.expert",
"description": "url2fqdn is the bot responsible to parsing the fqdn from the
url.",
"enabled": true,
"run_mode": "continuous"
},
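The behaviour this bot implements, deriving the FQDN from a URL field, can be sketched with the Python standard library (our own illustration, including the overwrite flag shown in the configuration above):

```python
# Sketch of the url2fqdn idea: extract the fully qualified domain name from a
# URL, as the url2fqdn expert does for the source.url/destination.url fields.
from urllib.parse import urlparse

def url_to_fqdn(url, overwrite=False, current=None):
    """Return the FQDN for url; keep an existing value unless overwrite is set."""
    if current and not overwrite:
        return current
    return urlparse(url).hostname

print(url_to_fqdn("http://example.com/malware/dropper.exe"))  # example.com
```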

7.3.2.1.1.3.4 GethostbyName-2-expert and GethostbyName-1-expert

File runtime.conf updated:


"gethostbyname-1-expert": {
"parameters": {},
"name": "Gethostbyname",
"group": "Expert",
"module": "intelmq.bots.experts.gethostbyname.expert",
"description": "fqdn2ip is the bot responsible to parsing the ip from the
fqdn.",
"enabled": true,
"run_mode": "continuous"
},
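What a gethostbyname expert does can be sketched with the standard resolver (our own illustration; the real bot fills source.ip from source.fqdn, and destination.ip from destination.fqdn):

```python
# Sketch of the gethostbyname idea: resolve an IPv4 address from a FQDN using
# the system resolver, leaving the field empty when the name does not resolve.
import socket

def fqdn_to_ip(fqdn):
    try:
        return socket.gethostbyname(fqdn)
    except socket.gaierror:
        return None                  # unresolvable name: leave the field empty

print(fqdn_to_ip("localhost"))       # typically 127.0.0.1
```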

7.3.2.1.1.3.5 Cymru-Whois-Expert

File runtime.conf updated:


"cymru-whois-expert": {
"description": "Cymry Whois (IP to ASN) is the bot responsible to add network
information to the events (BGP, ASN, AS Name, Country, etc..).",
"group": "Expert",
"module": "intelmq.bots.experts.cymru_whois.expert",
"name": "Cymru Whois",
"parameters": {
"redis_cache_db": 5,
"redis_cache_host": "127.0.0.1",
"redis_cache_password": null,
"redis_cache_port": 6379,
"redis_cache_ttl": 86400
},
"enabled": true,
"run_mode": "continuous" },

The following picture represents the nodes that we have created so far:

7.3.2.1.1.4 Adding an output bot


The next step is to add an output bot in order to write the results to a file and
thereby correlate the cyber threats obtained from the different websites. In this
case, we accept the default values.

The file runtime.conf is updated:
"file-output": {
"description": "File is the bot responsible to send events to a file.",
"group": "Output",
"module": "intelmq.bots.outputs.file.output",
"name": "File",
"parameters": {
"file": "/opt/intelmq/var/lib/bots/file-output/events.txt",
"hierarchical_output": false
},
"enabled": true,
"run_mode": "continuous"
},

The picture below corresponds to the graph that we have generated so far:

7.3.2.1.2 Add edge


At this point, we should add edges to our botnet. For that, we need to click
"Add Edge" in the menu at the top of the screen. Once we have finished, this
will be the final aspect of the whole botnet:

The changes are also applied in the pipeline.conf file (its contents can be seen in
the Appendix section).
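For illustration, each edge added in the graphical editor links one bot's source queue to the destination queues of the next bots in pipeline.conf. A minimal fragment might look like the following (the bot IDs follow the naming used in this chapter, but the exact queue names are whatever IntelMQ generates for a given setup):

```json
{
    "spamhaus-drop-collector": {
        "destination-queues": ["spamhaus-drop-parser-queue"]
    },
    "spamhaus-drop-parser": {
        "source-queue": "spamhaus-drop-parser-queue",
        "destination-queues": ["deduplicator-expert-queue"]
    }
}
```

Collectors have no source queue, since they fetch their input from the external feeds directly.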

7.3.2.1.3 Edit, Clear and Delete


In order to edit, clear or delete, we have to select a node. Then, we can perform
these operations and see the changes in the botnet graphically:

7.3.3 Management
It offers the possibility to manage individual bots or the whole botnet using the
operations described before: start, stop, restart and reload. In addition, it lists
all the current bots with their states. Initially, all the bots (the whole botnet)
are stopped:

7.3.4 Monitor
It shows graphically (as queues) the number of cyber threats that IntelMQ still has
to process:

7.4 OBTAINING THE FILE OUTPUT
There are two ways to get the file output: via the command line or via the IntelMQ Manager.

7.4.1 Command line


Firstly, we have to type sudo su and then su - intelmq to access the directories
where IntelMQ keeps its configuration files and the folders that the platform uses
in order to work properly. Then, we can start the whole botnet using
intelmqctl start:

We can see the status of each bot by typing intelmqctl status:

We can also use the command prompt to watch the queues. In fact, the following
image shows a piece of the output that appears in the command prompt:

The file output that we are looking for is located in /var/lib/bots/file-output.

7.4.2 IntelMQ Manager

Having said that, let’s run the whole botnet in order to obtain the file output and
perform some visualizations later on. For that, we click the button on the
management main screen. Note that all the bots are shown in green, which indicates
that each bot is running:

If we click the monitor button, we will see the queues per bot. Note that the
number indicates how many lines each bot has to process in order to perform the
correlation:

This operation takes some time, because we are processing large amounts of data
from several external sources. Moreover, if we click on one bot, we can observe its
log messages. The picture below represents an example of this; it shows the log
messages of abuse-domain-parser-queue:

These messages are also stored in the directory /opt/intelmq/var/log. In this
case, we are visualizing the file /opt/intelmq/var/log/abusech-domain-parser.log.

As we can observe, there are no errors. After this, we have obtained the file in
the folder /var/lib/bots/file-output, where we can look at each correlated cyber
threat. The file is also available on the remote repository at:
https://github.com/jgfc1/ThesisRepository/blob/master/Map%20World/events.txt
The following image represents an example of the file, whose name is events.txt:

{"feed.accuracy": 100.0, "feed.name": "Abuse.ch Zeus Bad Domains", "feed.provider":
"Abuse.ch", "feed.url":
"https://zeustracker.abuse.ch/blocklist.php?download=baddomains", "time.observation":
"2018-04-15T16:37:27+00:00", "classification.taxonomy": "malicious code",
"classification.type": "c&c", "source.fqdn": "afobal.cl", "raw": "YWZvYmFsLmNs",
"malware.name": "zeus", "source.ip": "66.7.198.165", "source.asn": 33182,
"source.network": "66.7.192.0/19", "source.geolocation.cc": "US", "source.registry":
"ARIN", "source.allocated": "2006-05-18T00:00:00+00:00", "source.as_name": "DIMENOC -
HostDime.com, Inc., US"}

{"feed.accuracy": 100.0, "feed.name": "Phishtank csv", "feed.provider": "Phishtank",


"feed.url": "http://data.phishtank.com/", "time.observation": "2018-04-
15T16:37:31+00:00", "source.url": "https://dk-
media.s3.amazonaws.com/media/1od86/downloads/319542/gtwd.html",
"event_description.url": "http://www.phishtank.com/phish_detail.php?phish_id=4778533",
"time.source": "2017-01-30T18:55:20+00:00", "event_description.target": "Microsoft",
"classification.type": "phishing", "raw":
"NDc3ODUzMyxodHRwczovL2RrLW1lZGlhLnMzLmFtYXpvbmF3cy5jb20vbWVkaWEvMW9kODYvZG93bmxvYWRzLz
MxOTU0Mi9ndHdkLmh0bWwsaHR0cDovL3d3dy5waGlzaHRhbmsuY29tL3BoaXNoX2RldGFpbC5waHA/cGhpc2hfa
WQ9NDc3ODUzMywyMDE3LTAxLTMwVDE4OjU1OjIwKzAwOjAwLHllcywyMDE3LTA1LTA5VDA5OjM1OjIxKzAwOjAw
LHllcyxNaWNyb3NvZnQ=", "classification.taxonomy": "fraud", "source.fqdn": "dk-
media.s3.amazonaws.com", "source.ip": "52.216.97.11", "source.asn": 16509,
"source.network": "52.216.97.0/24", "source.geolocation.cc": "US", "source.registry":
"ARIN", "source.allocated": "2015-09-02T00:00:00+00:00", "source.as_name": "AMAZON-02 -
Amazon.com, Inc., US"}

In the previous example, we can observe the following parameters with their
specific values:
• Feed.accuracy: A decimal number between 0 and 100 that represents how accurate
the information obtained from the external sources is.
• Feed.name: The name of the feed.
• Feed.provider: The principal name of the provider.
• Feed.url: The link for each specific feed.
• Time.observation: The time at which the source bot saw the event (threat).
• Classification.taxonomy: Cyber threats can be grouped using a specific
classification established by the European Union Agency for Network and
Information Security.

• Classification.type: Once the program has classified the threat feed using its
taxonomy, it defines the type. The following table illustrates these two levels of
classification:

Classification taxonomy | Classification type | Description and examples
Abusive Content | Spam | Unsolicited email, usually with advertising content, which is sent many times.
Abusive Content | Harmful Speech | Discrimination against somebody, for instance threats against people or cyber stalking.
Abusive Content | Child/sexual/violence | Child pornography, glorification of violence.
Malicious Code | Virus; Trojan; Worm; Spyware; Dialler; Rootkit | Software that is included or inserted in a computer system to harm or damage the final user.
Information Gathering | Scanning | Sending requests to a system to discover weak points and then perform the attack.
Information Gathering | Sniffing | Recording and observing network traffic.
Information Gathering | Social Engineering | Gathering data from users in a non-technical way, for instance bribes, tricks or lies.
Intrusion Attempts | Exploiting known vulnerabilities | Trying to disrupt a service or compromise a system by exploiting vulnerabilities with a specific identifier.
Intrusion Attempts | Login attempts | Trying to access the account of a user by guessing, cracking passwords or brute force.
Intrusion Attempts | New attack signature | Using unknown exploits in order to perform the intrusion.
Intrusions | Privileged account compromise; Unprivileged account compromise; Application compromise; Bot | A successful compromise of a system or application (service). It could have been caused remotely by a new or known vulnerability, by unauthorized local access, or includes being part of a botnet.
Availability | DoS; DDoS; Sabotage; Outage (no malice) | A system is bombarded with packets; as a result, operations can be delayed or the system can crash.
Information Content Security | Unauthorized access to information; Unauthorized modification of resources | Attacks that intercept and access information during transmission (wiretapping, spoofing or hijacking).
Fraud | Unauthorized use of resources | Using resources for unauthorized purposes, for example profit-making ventures.
Fraud | Copyright | Selling and installing copies of copyright-protected material.
Fraud | Masquerade | An entity illegitimately assumes the identity of another in order to benefit from it.
Fraud | Phishing | An entity tries to persuade the user to reveal private credentials.
Vulnerable | Open for abuse | Open resolvers, world-readable printers, viruses, etc.
Other | Other | Incidents which are not listed in one of the previous classes.
Test | Test | Meant for testing.

• Source.fqdn: The DNS name of the host from which the connection originated.
• Raw: A line of the event, encoded in base64.
• Malware.name: The malware family name of the threat.
• Source.ip: The IP address observed.
• Source.asn: The autonomous system number from which the connection originated.
• Source.network: The source network, expressed as a CIDR (BGP prefix).
• Source.geolocation.cc: The country code, in ISO 3166-1 alpha-2 form, for the IP
which IntelMQ has identified.
• Source.registry: The IP registry in which the given IP address is allocated.
• Source.allocated: The allocation date corresponding to the BGP prefix.
• Source.as_name: The name of the autonomous system from which the connection
originated.
• Event_description.target: The target (organization) of an attack.

In fact, the file harmonization.conf lists all the possible attributes that
IntelMQ can correlate. We can access this file, and observe the possible
attributes that the file output could have, through this website:
https://github.com/jgfc1/ThesisRepository/blob/master/IntelMQ/harmonization.conf
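Since each line of events.txt is a single JSON object, loading the correlated events for further processing is straightforward. The following sketch illustrates this (the helper name is ours, and the path is the output location configured earlier in this chapter):

```python
import json

def load_events(path="/opt/intelmq/var/lib/bots/file-output/events.txt"):
    """Read the IntelMQ file output: one JSON event per non-empty line."""
    events = []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip the blank separator lines
                events.append(json.loads(line))
    return events

# Example with an inline event instead of the real file:
sample = '{"feed.name": "Phishtank csv", "source.geolocation.cc": "US"}'
event = json.loads(sample)
print(event["source.geolocation.cc"])  # -> US
```

Each decoded event is then a plain dictionary keyed by the harmonization attributes listed above.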

7.5 STATISTICAL ANALYSIS


Once we have obtained the file output using IntelMQ, the next step is to provide
some visualizations for interpreting the information and extracting some patterns.
Basically, we are going to perform three types of visualizations. On the one hand, a
map of the world, which indicates where the attacks come from. On the other hand,
we are going to study the data by creating frequency tables and, in some cases, pie
charts. A distribution or frequency table is a table of the statistical data with
its corresponding frequencies. In this study, we focus on the absolute frequency,
which is the number of times that a value appears (in our case, the value will be a
cyber threat). It is normally represented by f_i, where i indexes each of the
values. The total number of data points, N, is the sum of the absolute frequencies.
Furthermore, the tables will provide the relative frequency, which is the result of
dividing f_i by N. With this number, we can compute the percentage (%) of each
threat identified.
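The frequency computations just described can be sketched in a few lines of Python (the sample values are hypothetical stand-ins for an attribute extracted from events.txt):

```python
from collections import Counter

# Hypothetical sample: one country code per correlated event.
values = ["US", "US", "NL", "US", "DE"]

counts = Counter(values)   # absolute frequency f_i of each value
N = sum(counts.values())   # total number of events

for value, f_i in counts.most_common():
    rel = f_i / N          # relative frequency f_i / N
    print(f"{value}: f_i={f_i}, f_i/N={rel:.3f}, %={rel * 100:.1f}")
```

By construction, the relative frequencies sum to 1 and the percentages to 100, which is the check used in the totals row of each table below.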

7.5.1 Map of the world


The map of the world is generated when we execute the script generateMap.py using
the following syntax in the command line: python3 generateMap.py

The result of the program is an HTML file which contains the map of the world,
using bubbles or circles with a specific size. The file is called mymap.html and we
can view its contents through this link:
https://github.com/jgfc1/ThesisRepository/blob/master/Map%20World/mymap.html
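One common way to produce such a bubble map in Python is the folium library; whether generateMap.py uses it is not shown here, so the following is only an illustrative sketch. The coordinates are approximate country centroids chosen for the example, and rendering is skipped if folium is not installed:

```python
# Hypothetical per-country attack counts (two rows from the table below).
attack_counts = {"United States": 43205, "Netherlands": 5140}
coords = {"United States": (39.8, -98.6), "Netherlands": (52.1, 5.3)}

try:
    import folium
    m = folium.Map(location=[20, 0], zoom_start=2)
    for country, count in attack_counts.items():
        lat, lon = coords[country]
        # Scale the bubble radius with the number of attacks.
        folium.CircleMarker(location=[lat, lon],
                            radius=max(3, count / 2000),
                            popup=f"{country}: {count}").add_to(m)
    m.save("mymap.html")
except ImportError:
    pass  # folium not installed; skip rendering

print(max(attack_counts, key=attack_counts.get))  # -> United States
```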

The above image illustrates that North America is the place where most attacks
originate. In fact, the United States is the country that produces the largest
amount of cyber threats in our study (56,041%), followed by Canada (2,156%). In
contrast, South America generates fewer attacks, and there Brazil (1,257%) is the
country that produces the major part of the cyber threats.

In second place we have Europe, where the Netherlands (6,667%), Germany (2,898%),
France (2,874%), Italy (1,995%) and Poland (1,411%) produce the most cyber threats.
Other countries such as Romania (1,147%), the United Kingdom (1,131%), Ukraine
(0,848%), Spain (0,457%) and Ireland (0,387%) produce fewer attacks in comparison
with the previous ones.

In addition to this, the Russian Federation (3,362%) produces a large quantity of
cyber threats, as do other Asian countries such as China (0,968%) and India
(0,877%). In fourth place we have Oceania, where Australia (2,005%) and Indonesia
(1,061%) produce the major part of the attacks. South Africa (0,454%) is the
country that generates the most attacks in Africa. The next table summarizes where
the attacks come from, with their respective frequencies:

Position | Country (where the attack comes from) | Count (f_i) | Relative frequency (f_i / N) | Percentage (%) = 100 · f_i / N
1 United States 43205 0,56041 56,041
2 Netherlands 5140 0,06667 6,667
3 Russian Federation 2592 0,03362 3,362
4 Germany 2234 0,02898 2,898
5 France 2216 0,02874 2,874
6 Canada 1662 0,02156 2,156
7 Australia 1546 0,02005 2,005
8 Italy 1538 0,01995 1,995
9 Poland 1088 0,01411 1,411
10 Brazil 969 0,01257 1,257
11 Romania 884 0,01147 1,147
12 United Kingdom 872 0,01131 1,131
13 Indonesia 818 0,01061 1,061
14 Turkey 810 0,01051 1,051
15 China 746 0,00968 0,968
16 Bulgaria 732 0,00949 0,949
17 India 676 0,00877 0,877
18 Ukraine 654 0,00848 0,848
19 Singapore 514 0,00667 0,667
20 Hong Kong 508 0,00659 0,659
21 Chile 432 0,00560 0,560

22 Vietnam 394 0,00511 0,511
23 Taiwan 372 0,00483 0,483
24 Spain 352 0,00457 0,457
25 South Africa 350 0,00454 0,454
26 Czech Republic 322 0,00418 0,418
27 Republic of Korea 322 0,00418 0,418
28 Sweden 308 0,00400 0,400
29 Ireland 298 0,00387 0,387
30 Switzerland 272 0,00353 0,353
31 Portugal 262 0,00340 0,340
32 Belarus 246 0,00319 0,319
33 Argentina 222 0,00288 0,288
34 Hungary 212 0,00275 0,275
35 Thailand 210 0,00272 0,272
36 Lithuania 198 0,00257 0,257
37 Japan 180 0,00233 0,233
38 Israel 172 0,00223 0,223
39 Bangladesh 166 0,00215 0,215
40 Malaysia 166 0,00215 0,215
41 Georgia 136 0,00176 0,176
42 Serbia 136 0,00176 0,176
43 Finland 124 0,00161 0,161
44 Peru 118 0,00153 0,153
45 Iran 108 0,00140 0,140
46 Latvia 106 0,00137 0,137
47 Denmark 100 0,00130 0,130
48 Iceland 98 0,00127 0,127
49 New Zealand 86 0,00112 0,112
50 Greece 78 0,00101 0,101
51 Slovenia 72 0,00093 0,093

52 Kenya 62 0,00080 0,080
53 Austria 62 0,00080 0,080
54 Luxembourg 62 0,00080 0,080
55 Kazakhstan 62 0,00080 0,080
56 Norway 56 0,00073 0,073
57 Croatia 54 0,00070 0,070
58 Mongolia 52 0,00067 0,067
59 Nigeria 50 0,00065 0,065
60 Tanzania 48 0,00062 0,062
61 Colombia 46 0,00060 0,060
62 United Arab Emirates 46 0,00060 0,060
63 Slovakia 44 0,00057 0,057
64 Belgium 44 0,00057 0,057
65 Panama 44 0,00057 0,057
66 Mexico 38 0,00049 0,049
67 Macedonia 24 0,00031 0,031
68 Estonia 24 0,00031 0,031
69 Mauritius 24 0,00031 0,031
70 Cyprus 22 0,00029 0,029
71 Gambia 18 0,00023 0,023
72 Moldova 18 0,00023 0,023
73 Egypt 16 0,00021 0,021
74 Morocco 14 0,00018 0,018
75 Ecuador 14 0,00018 0,018
76 Bosnia and Herzegovina 12 0,00016 0,016
77 Costa Rica 10 0,00013 0,013
78 Uruguay 8 0,00010 0,010
79 Saudi Arabia 8 0,00010 0,010
80 Pakistan 6 0,00008 0,008

81 Uzbekistan 6 0,00008 0,008
82 Azerbaijan 6 0,00008 0,008
83 El Salvador 6 0,00008 0,008
84 Sri Lanka 6 0,00008 0,008
85 Venezuela 4 0,00005 0,005
86 Albania 4 0,00005 0,005
87 Philippines 4 0,00005 0,005
89 Paraguay 4 0,00005 0,005
90 Palestinian Territory 4 0,00005 0,005
91 Mali 4 0,00005 0,005
92 Libyan Arab Jamahiriya 4 0,00005 0,005
93 Iraq 4 0,00005 0,005
94 Seychelles 2 0,00003 0,003
95 French Polynesia 2 0,00003 0,003
96 Barbados 2 0,00003 0,003
97 Belize 2 0,00003 0,003
98 Côte d'Ivoire 2 0,00003 0,003
99 Cuba 2 0,00003 0,003
100 Dominican Republic 2 0,00003 0,003
101 Antigua and Barbuda 2 0,00003 0,003
102 Guatemala 2 0,00003 0,003
103 Honduras 2 0,00003 0,003
104 Senegal 2 0,00003 0,003
105 Nepal 2 0,00003 0,003
106 Sudan 2 0,00003 0,003
107 Puerto Rico 2 0,00003 0,003
108 Kuwait 2 0,00003 0,003
Total: Σ f_i = f_1 + f_2 + ⋯ + f_k = N = 77096 | 1 | 100

7.5.2 Pie charts and tables
The other script is used to generate pie charts and their tables. For that, we need
to execute the program in this way: python3 generateChart.py <parameter>
In order to obtain distinct results, we should indicate to the program the
parameter that we want from the file output (feed provider, feed name,
classification taxonomy, classification type or the target of the attacks). If we
introduce a parameter that does not exist in the events.txt file, the program gives
an error to indicate that we have written a wrong parameter. Moreover, if we do not
write a parameter at all, the program gives an error indicating that we should
introduce one.
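The argument handling just described can be sketched as follows. This is a hypothetical reimplementation (the real generateChart.py is available on the repository) that validates against a fixed set of supported attributes rather than scanning events.txt:

```python
import sys

# Attributes the chart script accepts (per the list above).
VALID = {"feed.provider", "feed.name", "classification.taxonomy",
         "classification.type", "event_description.target"}

def pick_parameter(argv):
    """Return the requested attribute, or exit with an error message."""
    if len(argv) < 2:
        raise SystemExit("error: a parameter is required, e.g. feed.provider")
    if argv[1] not in VALID:
        raise SystemExit(f"error: unknown parameter {argv[1]!r}")
    return argv[1]

print(pick_parameter(["generateChart.py", "feed.provider"]))  # -> feed.provider
```

Invoked as `python3 generateChart.py feed.provider`, `sys.argv` would supply the list shown in the example call.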

7.5.2.1 Feed provider utilized


In order to obtain a pie chart with this attribute, we should type the following
command in the command prompt:
python3 generateChart.py feed.provider
The next table provides an overview of the feed providers utilized in this
research. As we can see, the major part of the sources comes from Phishtank
(88.2%). Then we have the other sources: Malware Domain List, which contains 5.6%
of the data that we are analysing, Abuse.ch, which represents 4.1% of the data, and
finally Spamhaus (2.0%) and Malc0de (0.1%).

Feed provider | Count (f_i) | f_i / N | %
Phishtank | 72664 | 0,882 | 88,2
Malware Domain List | 4578 | 0,056 | 5,6
Abuse.ch | 3418 | 0,041 | 4,1
Spamhaus | 1642 | 0,020 | 2,0
Malc0de | 114 | 0,001 | 0,1
Total: Σ f_i = N = 82416 | 1 | 100,0

The image below represents a pie chart built from the previous values for each feed
provider:

7.5.2.2 Feed name utilized


In this case, we should type:
python3 generateChart.py feed.name
In this table, we can observe that Phishtank csv (88.2%) is the most used feed in
this study. The other values are: Malware Domain List (5.6%), Abuse.ch Feodo IP
(3.1%), Spamhaus Drop (2.0%), Abuse.ch Zeus Domain Block List (0.9%), Generic URL
Fetcher (0.1%) and Abuse.ch Zeus Bad Domains (0.1%):

Feed name | Count (f_i) | f_i / N | %
Phishtank csv | 72664 | 0,882 | 88,2
Malware Domain List | 4578 | 0,056 | 5,6
Abuse.ch Feodo IP | 2582 | 0,031 | 3,1
Spamhaus Drop | 1642 | 0,020 | 2,0
Abuse.ch Zeus Domain Block List | 748 | 0,009 | 0,9
Generic URL Fetcher | 114 | 0,0014 | 0,1
Abuse.ch Zeus Bad Domains | 88 | 0,0011 | 0,1
Total: Σ f_i = N = 82416 | 1 | 100

The following chart indicates the values that we have obtained from the previous
table:

7.5.2.3 Classification taxonomy


In order to observe the classification taxonomy pie chart, we should type:
python3 generateChart.py classification.taxonomy
In this case, 88.2% of the data that we have obtained is fraud. The rest of the
percentages are 9.8%, which corresponds to malicious code, and 2.0%, which is
related to abusive content.

Taxonomy | Count (f_i) | f_i / N | %
Fraud | 72664 | 0,882 | 88,2
Malicious code | 8110 | 0,098 | 9,8
Abusive content | 1642 | 0,020 | 2,0
Total: Σ f_i = N = 82416 | 1 | 100
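A pie chart like the ones produced by generateChart.py can be sketched with matplotlib, the plotting library referenced by this thesis. The output file name below is our own choice, and the plotting step is skipped if matplotlib is unavailable so that the percentage computation still runs:

```python
# Taxonomy frequencies from the table above.
labels = ["Fraud", "Malicious code", "Abusive content"]
counts = [72664, 8110, 1642]
N = sum(counts)
percentages = [100 * c / N for c in counts]

try:
    import matplotlib
    matplotlib.use("Agg")  # render without a display
    import matplotlib.pyplot as plt
    plt.pie(counts, labels=labels, autopct="%1.1f%%")
    plt.savefig("taxonomy_pie.png")
except ImportError:
    pass  # matplotlib not installed; percentages are still computed

print([round(p, 1) for p in percentages])  # -> [88.2, 9.8, 2.0]
```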

7.5.2.4 Event description

As we have seen before, malicious code can be divided into virus, Trojan, worm,
spyware, dialler and rootkit. The next table represents the count of some of the
types of malicious code that we have obtained. Note that the major part of the
malware identified corresponds to Trojans. To generate this output, we should write
the following line in the command prompt:
python3 generateChart.py event_description.text

Position Name of malware Count
1 Trojan.Ransom 332
2 Trojan 298
3 Script.Explot 238
4 Trojan.Zbot 234
5 Win32/FirseriaInstaller.C 198
6 VBS.Trojan.Downloader 156
7 Gateway to EK 132
8 Directs to exploits 126
9 Fake av 112
10 Trojan.FakeAlert 80
11 Leads to Trojan.Banload 68
12 Leads to exploit at jolygoestobeinvester.ru 68
13 Trojan 64
14 iframe on compromised site leads to EK 58
15 exploit kit 58
16 RFI 56
17 Exploit 56
18 Compromised site directs to exploits 52
19 Compromised site (DHL malspam campaign) 48
20 Leads to exploit 46
21 Trojan.FakeFlash 40
22 malware calls home 36
23 Trojan.Downloader 32
24 iFrame.Exploit 32
25 Leads to ransomware 32
26 Trojan.Extension.Exploit 30
27 Win32/Trojan.Spy 30
28 Used by malspam to lead victims to Trojan.Banload 28
29 redirects to exploit kit 28
30 Trojan.Backdoor 26
31 Spyware.Zbot 24
32 Compromised site (Natwest malspam campaign) 24
33 IE exploit 24
34 trojan OnlineGames 24
35 compromised site leads to exploit kit 22
36 P2PZeus.WebInject 22
37 trojan downloader 22
38 directs to rogue 22
39 Malvertisin 22
40 trojan Banker 20
41 obfuscated script directs to exploits 20
42 VBScript.Drive-b 20

43 Leads to Trojan.Zbot 20
44 Trojan.Zeus.GameOver 18
45 exploit 18
46 Trojan.Banker 18
47 SpyEye C&C 16
48 Trojan.Zeus.GO 16
49 Ransom WindowsSecurity 16
50 Worm.Autorun 16

We can see the entire document on the remote repository on GitHub at this URL:
https://github.com/jgfc1/ThesisRepository/blob/master/Pie%20Charts/output_classifi
cation_malware.txt

7.5.2.5 Classification type
In this case, it is necessary to type the following parameter:
python3 generateChart.py classification.type
In the same way, we can classify the cyber threats depending on their type. As we
can observe in the table below, 88.2% of the cyber threats identified are
associated with phishing attacks, 5.7% are related to malware, 4.1% are associated
with C&C attacks and 2.0% are related to spam.
Type | Count (f_i) | f_i / N | %
Phishing | 72664 | 0,882 | 88,2
Malware | 4692 | 0,057 | 5,7
C&C | 3418 | 0,041 | 4,1
Spam | 1642 | 0,020 | 2,0
Total: Σ f_i = N = 82416 | 1 | 100

7.5.2.6 Target (organization) of the attacks
The parameter that we have to include in this case is:
python3 generateChart.py event_description.target
The next table represents the organizations that have suffered the threats
identified in this research. Most of them are important companies that people use
day by day, such as bank websites (PayPal), social networks (Facebook), cloud
platforms (Dropbox), electronic commerce (eBay), email services and so on. Note
that the attribute “Other” can correspond to firms which are not listed, i.e. small
or medium companies or organizations.

Name of the organization Number of attacks suffered


Other 61532
PayPal 3136
Facebook 2842
Microsoft 1174
Google 610
DropBox 268
eBay 248
Adobe 228
Banco de Brasil 180
AOL 166
Apple 150
Itau 118
Yahoo 118
JPMorgan Chase and Co. 110
Amazon.com 102
Alibaba.com 102
Internal Revenue Service 96
Santander UK 88
Bradesco 86

Netflix 84
United Services Automobile Association 70
Steam 70
DHL 60
ASB Bank Limited 54
Wells Fargo 50
Bank of America Corporation 50
ABSA Bank 48
LinkedIn 44

Allegro 36

WhatsApp 34

Orange 34

Barclays Bank PLC 32

American Express 32

Cartasi 28

WalMart 26

Blockchain 26

Capitec Bank 24

NatWest Bank 24

Sulake Corporation 22

Caixa 22

Cielo 18

Visa 18

Allied Bank Limited 18

Hotmail 16

PNC Bank 16

Poste Italiane 16

National Australia Bank 14

HSBC Group 14

Australia and New Zealand Banking Group Limited 12
US Bank 12

Royal Bank of Canada 12

RuneScape 12

Mastercard 10

Citibank 10

Twitter 10

Her Majesty's Revenue and Customs 10

Centurylink 10

Volksbanken Raiffeisenbanken 8

Branch Banking and Trust Company 8

TD Canada Trust 8

Discover Bank 8

MyEtherWallet 6

Capital One 6

Citizens Bank 6

Vodafone 6

TAM Fidelidade 6

Orkut 6

Accurint 6

ING Direct 6

CareerBuilder 6

GitHub 6

American Greetings 6

Lloyds Bank 4

Tesco 4

Key Bank 4

Delta Air Lines 4

Suncorp 4

Westpac 4

Standard Bank Ltd. 4

PKO Polish Bank 4

First National Bank (South Africa) 4

Metro Bank 4

CIMB Bank 4

PagSeguro 2

Western Union 2

British Telecom 2

Sky Financial 2

Live 2

Nordea Bank 2

Nets 2
Aetna Health Plans & Dental Coverage 2
Deutsche Bank 2

Halifax 2

Compass Bank 2

ArenaNet 2

Rackspace 2

BMO Financial 2

Discover Card 2

Craigslist 2

Binance 2

US Airways 2

Development Bank of Singapore 2

ABN AMRO Bank 2

UniCredit 2

World of Warcraft 2

Washington Mutual 2

Wachovia 2

EPPICard 2

American Airlines 2

Groupon 2

Alliance Bank 2

TSB 2

Salesforce 2

ZML 2

Smile Bank 2

Bitfinex 2

Royal Bank of Scotland 2

Total (N) 72662

8 CONCLUSION
The Internet is present in daily life. In fact, much of the information related to
people, organizations or companies is stored on the Internet: bank accounts,
financial and health records and so on. In this interconnected world, cyberattacks
have increased in recent years because attackers keep finding new ways to target
networks in order to access, change, destroy, extort or interrupt digital
information over the Internet. Cyber Threat Intelligence, which is a field of
cybersecurity, provides resources over the Internet that list malicious software,
bad domains or IPs, among others, giving cyber analysts a way to know which attacks
are currently being produced. Nevertheless, the data is provided in quite a
heterogeneous way, because the information is stored in different digital formats
(CSV files, HTML pages, text files…) and structures. In this context, it is quite
useful to develop correlations between them in order to extract some meaning from
the data. For that, we have used an open-source tool called IntelMQ for collecting
and processing external resources. The platform is based on a graph (botnet) of
nodes (bots) with relationships between them in order to process each threat feed.

Once we have executed the whole botnet, we have obtained a file called
“events.txt”, corresponding to the output of the program, in which the cyber
threats are correlated using specific attributes with concrete values. After that,
we have performed some visualizations using a couple of scripts written in Python:
a map of the world with points in each country indicating the volume of attacks,
pie charts and tables. Looking at the results obtained, we can conclude that the
major part of the attacks originated from North America and Europe, and that the
most common malware is the Trojan (in fact, we have shown that there were plenty of
types of Trojans). Moreover, since the major part of the data is obtained from the
external resource Phishtank, the most common threat types that we have analysed are
phishing and fraud.

This thesis provides an excellent learning opportunity to expand knowledge of
cybersecurity by using a platform that cyber analysts employ to track cyber threats
from external sources over the Internet. Cyberattack activity has been growing over
the years and there is no evidence that this tendency will stop, so experience in
the field of cybersecurity will be a great boon in the future.

As future work, there is the possibility of integrating more bots into the botnet.
For that, collector bots, parser bots and expert bots should be created and
configured. For instance, the integration of AlienVault into IntelMQ could be a
baseline to detect more types of attacks. AlienVault is a digital security
management platform that provides unified and coordinated security monitoring,
security event management, intelligence against continued security threats and
multiple security features in a single console.

9 DECLARATION
I hereby certify that the material contained in this thesis, which has been
submitted at Athlone Institute of Technology (Network Management and Cloud
Infrastructure), is entirely my own work and has not been submitted for any other
academic assessment. Future students may read and use this thesis to learn about
the topic or for future research.

Date: 16/05/2018 Signed:

10 REFERENCES
[1] Briana Gammons. 6 must-know cybersecurity statistics for 2017 from
https://blog.barkly.com/cyber-security-statistics-2017 [online: accessed 10 January
2018]

[2] Symantec. (2017). Internet Security Threat Report (ISTR) Government, vol. 22.
Retrieved September 17, 2017, from
https://www.symantec.com/content/dam/symantec/docs/reports/gistr22- government-
report.pdf. [online: accessed 22 January 2018]

[3] IntelMQ, from https://github.com/certtools/intelmq [online: accessed 17 October


2017]

[4] A. Ramachandran and N. Feamster. Understanding the network-level behavior of


spammers. SIGCOMM Comput. Commun. Rev., 36(4):291–302, Aug. 2006 from
https://www.cc.gatech.edu/classes/AY2007/cs7260_spring/papers/p396-
ramachandran.pdf [online: accessed 21 February 2018]
[5] Vaclav Bartos and Martin Zadnik. An Analysis of Correlations of Intrusion Alerts
in a NREN from
http://www.fit.vutbr.cz/research/pubs/conpa.php.cs?file=%2Fpub%2F10526%2Fcama
d14_alert_correlations.pdf&id=10526 [online: accessed 21 February 2018]

[6] Anders Flaglien, Katrin Franke and Andre Arnes. Identifying Malware using
Cross-Evidence Correlation from
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.666.4180&rep=rep1&type=
pdf [online: accessed 5 March 2018]

[7] MatplotLib, from https://matplotlib.org/ [online: accessed 14 March 2018]

[8] Pandas, from https://pandas.pydata.org [online: accessed 23 April 2018]

[9] European Union Agency for Network and Information Security, from
https://www.enisa.europa.eu/topics/csirt-cert-services/community-projects/existing-
taxonomies [online: accessed 27 April 2018]
[10] Raywood, Dan (April 24, 2015). "HP partner with AlienVault on Cyber Threat-
Sharing Initiative". ITPortal.com. Retrieved November 8, 2015 from
https://www.itproportal.com/2015/04/22/hp-partner-alienvault-cyber-threat-sharing-
initiative/ [online: accessed 24 April 2018]
[11] FireEye. What is Cybersecurity? Protecting your cyber assets and critical data
from https://www.fireeye.com/current-threats/what-is-cyber-security.html [online:
accessed 28 April 2018]
[12] FireEye. Threat Intelligence: against cyber threats, knowledge is power from
https://www.fireeye.com/solutions/cyber-threat-intelligence.html [online: accessed 29
April 2018]
[13] Gary Hayslip. Cyber Threat Intelligence [CTI] from
https://www.csoonline.com/article/3234714/data-protection/cyber-threat-intelligence-
cti-part-1.html [online: accessed 1 May 2018]
[14] Cyberpunk. Automate Incident Handling Process: IntelMQ from
https://n0where.net/automate-incident-handling-process-intelmq [online: accessed 3
May 2018]

11 APPENDIX
11.1 CONFIGURATION FILES OF INTELMQ
11.1.1 Runtime.conf
{
"abusech-domain-parser": {
"description": "Abuse.ch Domain Parser is the bot responsible to parse the
report and sanitize the information.",
"group": "Parser",
"module": "intelmq.bots.parsers.abusech.parser_domain",
"name": "Abuse.ch Domain",
"parameters": {},
"enabled": true,
"run_mode": "continuous"
},
"cymru-whois-expert": {
"description": "Cymru Whois (IP to ASN) is the bot responsible to add network
information to the events (BGP, ASN, AS Name, Country, etc..).",
"group": "Expert",
"module": "intelmq.bots.experts.cymru_whois.expert",
"name": "Cymru Whois",
"parameters": {
"redis_cache_db": 5,
"redis_cache_host": "127.0.0.1",
"redis_cache_password": null,
"redis_cache_port": 6379,
"redis_cache_ttl": 86400
},
"enabled": true,
"run_mode": "continuous"
},
"deduplicator-expert": {
"description": "Deduplicator is the bot responsible for detection and removal
of duplicate messages. Messages get cached for <redis_cache_ttl> seconds. If found in
the cache, it is assumed to be a duplicate.",
"group": "Expert",
"module": "intelmq.bots.experts.deduplicator.expert",
"name": "Deduplicator",
"parameters": {
"filter_keys": "raw,time.observation",
"filter_type": "blacklist",
"redis_cache_db": 6,
"redis_cache_host": "127.0.0.1",
"redis_cache_password": null,

"redis_cache_port": 6379,
"redis_cache_ttl": 86400
},
"enabled": true,
"run_mode": "continuous"
},
"file-output": {
"description": "File is the bot responsible to send events to a file.",
"group": "Output",
"module": "intelmq.bots.outputs.file.output",
"name": "File",
"parameters": {
"file": "/opt/intelmq/var/lib/bots/file-output/events.txt",
"hierarchical_output": false
},
"enabled": true,
"run_mode": "continuous"
},
"malc0de-parser": {
"description": "Malc0de Parser is the bot responsible to parse the IP Blacklist
and either Windows Format or Bind Format reports and sanitize the information.",
"group": "Parser",
"module": "intelmq.bots.parsers.malc0de.parser",
"name": "Malc0de",
"parameters": {},
"enabled": true,
"run_mode": "continuous"
},
"malc0de-windows-format-collector": {
"description": "",
"group": "Collector",
"module": "intelmq.bots.collectors.http.collector_http",
"name": "Malc0de Windows Format",
"parameters": {
"feed": "Generic URL Fetcher is the bot responsible to get the report from
an URL.",
"http_password": null,
"http_url": "https://malc0de.com/bl/BOOT",
"http_username": null,
"provider": "Malc0de",
"rate_limit": 10800,
"ssl_client_certificate": null
},
"enabled": true,
"run_mode": "continuous"
},
"malware-domain-list-collector": {

"parameters": {
"feed": "Malware Domain List",
"http_url": "http://www.malwaredomainlist.com/mdlcsv.php",
"provider": "Malware Domain List",
"rate_limit": 3600
},
"description": "Malware Domain List Collector is the bot responsible to get the
report from source of information.",
"group": "Collector",
"module": "intelmq.bots.collectors.http.collector_http",
"name": "Malware Domain List",
"enabled": true,
"run_mode": "continuous"
},
"malware-domain-list-parser": {
"description": "Malware Domain List Parser is the bot responsible to parse the
report and sanitize the information.",
"group": "Parser",
"module": "intelmq.bots.parsers.malwaredomainlist.parser",
"name": "Malware Domain List",
"parameters": {},
"enabled": true,
"run_mode": "continuous"
},
"spamhaus-drop-collector": {
"description": "Generic URL Fetcher is the bot responsible to get the report from an URL.",
"group": "Collector",
"module": "intelmq.bots.collectors.http.collector_http",
"name": "Spamhaus Drop",
"parameters": {
"feed": "Spamhaus Drop",
"http_password": null,
"http_url": "https://www.spamhaus.org/drop/drop.txt",
"http_username": null,
"provider": "Spamhaus",
"rate_limit": 3600,
"ssl_client_certificate": null
},
"enabled": true,
"run_mode": "continuous"
},
"spamhaus-drop-parser": {
"description": "Spamhaus Drop Parser is the bot responsible to parse the DROP,
EDROP, DROPv6, and ASN-DROP reports and sanitize the information.",
"group": "Parser",
"module": "intelmq.bots.parsers.spamhaus.parser_drop",
"name": "Spamhaus Drop",

"parameters": {},
"enabled": true,
"run_mode": "continuous"
},
"taxonomy-expert": {
"description": "Taxonomy is the bot responsible to apply the eCSIRT Taxonomy to
all events.",
"group": "Expert",
"module": "intelmq.bots.experts.taxonomy.expert",
"name": "Taxonomy",
"parameters": {},
"enabled": true,
"run_mode": "continuous"
},
"abusech-feodo-ip-collector": {
"parameters": {
"feed": "Abuse.ch Feodo IP",
"provider": "Abuse.ch",
"http_url":
"https://feodotracker.abuse.ch/blocklist/?download=ipblocklist",
"http_url_formatting": false,
"http_username": null,
"http_password": null,
"ssl_client_certificate": null,
"rate_limit": 129600
},
"name": "Generic URL Fetcher",
"group": "Collector",
"module": "intelmq.bots.collectors.http.collector_http",
"description": "Abuse.ch Feodo IP",
"enabled": true,
"run_mode": "continuous"
},
"Abusech-IP-Parser": {
"parameters": {},
"name": "Abuse.ch IP",
"group": "Parser",
"module": "intelmq.bots.parsers.abusech.parser_ip",
"description": "Abuse.ch IP Parser is the bot responsible to parse the report
and sanitize the information.",
"enabled": true,
"run_mode": "continuous"
},
"abusech-zeus-domainblocklist-collector": {
"parameters": {
"feed": "Abuse.ch Zeus Domain Block List",
"provider": "Abuse.ch",

"http_url":
"https://zeustracker.abuse.ch/blocklist.php?download=domainblocklist",
"http_url_formatting": false,
"http_username": null,
"http_password": null,
"ssl_client_certificate": null,
"rate_limit": 129600
},
"name": "Generic URL Fetcher",
"group": "Collector",
"module": "intelmq.bots.collectors.http.collector_http",
"description": "Zeus Tracker",
"enabled": true,
"run_mode": "continuous"
},
"abusech-zeus-baddomains-collector": {
"parameters": {
"feed": "Abuse.ch Zeus Bad Domains",
"provider": "Abuse.ch",
"http_url":
"https://zeustracker.abuse.ch/blocklist.php?download=baddomains",
"http_url_formatting": false,
"http_username": null,
"http_password": null,
"ssl_client_certificate": null,
"rate_limit": 129600
},
"name": "Generic URL Fetcher",
"group": "Collector",
"module": "intelmq.bots.collectors.http.collector_http",
"description": "Generic URL Fetcher is the bot responsible to get the report
from an URL.",
"enabled": true,
"run_mode": "continuous"
},
"PhishTank-Parser": {
"parameters": {},
"name": "PhishTank",
"group": "Parser",
"module": "intelmq.bots.parsers.phishtank.parser",
"description": "PhishTank Parser is the bot responsible to parse the report and
sanitize the information.",
"enabled": true,
"run_mode": "continuous"
},
"phishtank-collector": {
"parameters": {

"feed": "Phishtank csv",
"provider": "Phishtank ",
"http_url": "http://data.phishtank.com/data/online-valid.csv",
"http_url_formatting": false,
"http_username": null,
"http_password": null,
"ssl_client_certificate": null,
"rate_limit": 129600
},
"name": "Generic URL Fetcher",
"group": "Collector",
"module": "intelmq.bots.collectors.http.collector_http",
"description": "Generic URL Fetcher is the bot responsible to get the report
from an URL.",
"enabled": true,
"run_mode": "continuous"
},
"url2fqdn-expert": {
"parameters": {
"overwrite": false
},
"name": "url2fqdn",
"group": "Expert",
"module": "intelmq.bots.experts.url2fqdn.expert",
"description": "url2fqdn is the bot responsible to parsing the fqdn from the
url.",
"enabled": true,
"run_mode": "continuous"
},
"gethostbyname-1-expert": {
"parameters": {},
"name": "Gethostbyname",
"group": "Expert",
"module": "intelmq.bots.experts.gethostbyname.expert",
"description": "fqdn2ip is the bot responsible to parsing the ip from the
fqdn.",
"enabled": true,
"run_mode": "continuous"
},
"gethostbyname-2-expert": {
"parameters": {},
"name": "Gethostbyname",
"group": "Expert",
"module": "intelmq.bots.experts.gethostbyname.expert",
"description": "fqdn2ip is the bot responsible to parsing the ip from the
fqdn.",
"enabled": true,

"run_mode": "continuous"
}
}
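Each entry in runtime.conf binds a bot id to a Python module, a group (Collector, Parser, Expert or Output) and its parameters. As a quick sanity check, the enabled bots can be grouped by their "group" field; the sketch below inlines a small fragment of the configuration so it does not depend on the real file under /opt/intelmq/etc/.

```python
import json

# A self-contained fragment of the runtime.conf shown above.
RUNTIME_FRAGMENT = """
{
  "spamhaus-drop-collector": {"group": "Collector", "enabled": true},
  "spamhaus-drop-parser": {"group": "Parser", "enabled": true},
  "taxonomy-expert": {"group": "Expert", "enabled": true},
  "file-output": {"group": "Output", "enabled": true}
}
"""

def bots_by_group(runtime_json):
    """Return a mapping group -> sorted list of enabled bot ids."""
    groups = {}
    for bot_id, conf in json.loads(runtime_json).items():
        if conf.get("enabled"):
            groups.setdefault(conf["group"], []).append(bot_id)
    return {g: sorted(ids) for g, ids in groups.items()}

print(bots_by_group(RUNTIME_FRAGMENT))
```

The same grouping is what IntelMQ Manager displays when it colours the nodes of the botnet graph by group.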

11.1.2 Pipeline.conf
{
"Abusech-IP-Parser": {
"source-queue": "Abusech-IP-Parser-queue",
"destination-queues": [
"deduplicator-expert-queue"
]
},
"PhishTank-Parser": {
"source-queue": "PhishTank-Parser-queue",
"destination-queues": [
"deduplicator-expert-queue"
]
},
"abusech-domain-parser": {
"source-queue": "abusech-domain-parser-queue",
"destination-queues": [
"deduplicator-expert-queue"
]
},
"abusech-feodo-ip-collector": {
"destination-queues": [
"Abusech-IP-Parser-queue"
]
},
"abusech-zeus-baddomains-collector": {
"destination-queues": [
"abusech-domain-parser-queue"
]
},
"abusech-zeus-domainblocklist-collector": {
"destination-queues": [
"abusech-domain-parser-queue"
]
},
"cymru-whois-expert": {
"source-queue": "cymru-whois-expert-queue",
"destination-queues": [
"file-output-queue"
]
},
"deduplicator-expert": {
"source-queue": "deduplicator-expert-queue",
"destination-queues": [
"taxonomy-expert-queue"
]

},
"file-output": {
"source-queue": "file-output-queue"
},
"gethostbyname-1-expert": {
"source-queue": "gethostbyname-1-expert-queue",
"destination-queues": [
"cymru-whois-expert-queue"
]
},
"gethostbyname-2-expert": {
"source-queue": "gethostbyname-2-expert-queue",
"destination-queues": [
"cymru-whois-expert-queue"
]
},
"malc0de-parser": {
"source-queue": "malc0de-parser-queue",
"destination-queues": [
"deduplicator-expert-queue"
]
},
"malc0de-windows-format-collector": {
"destination-queues": [
"malc0de-parser-queue"
]
},
"malware-domain-list-collector": {
"destination-queues": [
"malware-domain-list-parser-queue"
]
},
"malware-domain-list-parser": {
"source-queue": "malware-domain-list-parser-queue",
"destination-queues": [
"deduplicator-expert-queue"
]
},
"phishtank-collector": {
"destination-queues": [
"PhishTank-Parser-queue"
]
},
"spamhaus-drop-collector": {
"destination-queues": [
"spamhaus-drop-parser-queue"
]
},
"spamhaus-drop-parser": {
"source-queue": "spamhaus-drop-parser-queue",
"destination-queues": [
"deduplicator-expert-queue"
]
},

"taxonomy-expert": {
"source-queue": "taxonomy-expert-queue",
"destination-queues": [
"url2fqdn-expert-queue"
]
},
"url2fqdn-expert": {
"source-queue": "url2fqdn-expert-queue",
"destination-queues": [
"gethostbyname-1-expert-queue",
"gethostbyname-2-expert-queue"
]
}
}
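pipeline.conf chains the bots: every collector pushes into a parser queue, parsers feed the deduplicator, and the expert chain ends in the file output. A small sketch (over an inlined subset of the configuration above) can verify the wiring, i.e. that every destination queue is consumed by some bot as its source queue:

```python
import json

# Inlined subset of the pipeline.conf shown above.
PIPELINE = json.loads("""
{
  "spamhaus-drop-collector": {"destination-queues": ["spamhaus-drop-parser-queue"]},
  "spamhaus-drop-parser": {"source-queue": "spamhaus-drop-parser-queue",
                           "destination-queues": ["deduplicator-expert-queue"]},
  "deduplicator-expert": {"source-queue": "deduplicator-expert-queue",
                          "destination-queues": ["file-output-queue"]},
  "file-output": {"source-queue": "file-output-queue"}
}
""")

def dangling_queues(pipeline):
    """Return destination queues that no bot consumes as its source queue."""
    sources = {conf.get("source-queue") for conf in pipeline.values()}
    dangling = set()
    for conf in pipeline.values():
        for queue in conf.get("destination-queues", []):
            if queue not in sources:
                dangling.add(queue)
    return dangling

print(dangling_queues(PIPELINE))  # an empty set means every queue has a consumer
```

A non-empty result would mean events pile up in a Redis queue that no bot ever drains.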

11.2 GENERATEMAP.PY
"""
Author: "Javier Gombao Fernandez-Calvillo"
eMail: a00248414@student.ait.ie
College: Athlone Institute of Technology (AIT)
Subject: Final Project
File: generateMap.py

The purpose of this script is to obtain the names of all the countries in which
IntelMQ has detected a cyber threat.
Variables (lists):
nameCountries: it provides the name of the country. Examples: Ireland, Spain,
England, France...
values: it defines the number of times the country appears in the result
file produced by IntelMQ
latitude: it defines the latitude value
longitude: it defines the longitude value
"""

# Import modules
import folium
import pandas as pd
import matplotlib.pyplot as plt

# Variables which we are going to use in the program


FILEOUTPUT = "output.txt"
countries = []
nameCountries =[]
values = []
latitude = []
longitude = []

# Set pandas' max rows display


pd.set_option('display.max_rows', 245)

"""This is a class which holds the code of a country and its occurrences"""
class Country():
def __init__(self, country, count):
self.country = country
self.count = count

class Map(object):

"""Constructor of the class"""


def __init__(self, fileEvents, fileCodeISOCountries):
self._fileEvents=fileEvents
self._fileCodeISOCountries=fileCodeISOCountries

"""It returns the name of the fileEvents"""


def getFileEvents(self):
return self._fileEvents

"""It gets the file with the name of the countries and their longitude and
latitude (geolocation)"""
def getFileCodeISOCountries(self):
return self._fileCodeISOCountries

"""It eliminates duplicates from the list of countries"""


def obtainDistinctCountries(self, listCountries):
distinctCountries = []
for i in listCountries:
if i not in distinctCountries:
distinctCountries.append(i)
return distinctCountries

"""This function counts how many times a country appears in the list"""


def countCountries(self, country, listCountries):
times = 0
for i in listCountries:
if country == i:
times += 1
return times

"""This function prints the name of each country and its occurrences"""
def printCountriesCount(self, countriesCountList):
for i in countriesCountList:
print(i.country, i.count)

"""This function counts the occurrences that the program reads from the file
events.txt"""
def getOcurrencesCountry(self):
search_name = "source.geolocation.cc"
country_list = []
try:
with open(self.getFileEvents()) as attacks:
for attack in attacks:
# We check that a geolocation is available in the line
if search_name in attack:
attributes = attack.split(", ")
i = 0
while i < len(attributes):
g = attributes[i].split(": ")
# We are looking for the attribute "source.geolocation.cc" in each line:
for a in g:
if "\"source.geolocation.cc\"" == a:
"""We obtain the code of the country: """
temp = len(g[1])
s1 = g[1][:temp - 1]
s2 = s1[1:]
country_list.append(s2)
i += 1

distinct_countries = self.obtainDistinctCountries(country_list)

# We insert the countries and their occurrences into Country objects
for c in distinct_countries:
countries.append(Country(c, self.countCountries(c, country_list)))

except FileNotFoundError:
print("Error: File not found.")

"""This function makes a data frame with points to show on the map"""
def loadData(self):
data = pd.DataFrame({
'name':nameCountries,
'nº attacks':values,
'lat':latitude,
'lon':longitude
})
return data

"""It generates the map itself with the bubbles around it"""
def createMap(self):
# Sort the dataframe's rows by number of attacks, in descending order:
data = self.loadData().sort_values(by='nº attacks', ascending=False)

# In the output file we can see the country, the occurrences and the values of
# latitude and longitude
file = open(FILEOUTPUT, "w")
file.write(str(data))
file.close()

print("Loading the map...")

# Create an empty map


m = folium.Map(location=[0, 0], tiles="Mapbox Bright", zoom_start=2)

# We will add markers on the map


for i in range(0, len(data)):
folium.Circle(
location=[data.iloc[i]['lon'], data.iloc[i]['lat']],
popup=data.iloc[i]['name'],
radius=data.iloc[i]['nº attacks'] * 50.5,
color='crimson',
fill=True,
fill_color='crimson'
).add_to(m)

# Save it as html
m.save('mymap.html')

"""It obtains the longitude and latitude using the file iso3166-1-alpha-2.txt"""
def obtainLongitudeLatitude(self):
try:
with open(self.getFileCodeISOCountries()) as lines:
for line in lines:
attributes = line.split(",")

temp = len(attributes[0])
s1 = attributes[0][:temp - 1]
code = s1[1:]
lon = attributes[1]  # NB: the file stores latitude first, so this is really the latitude
lat = attributes[2]  # ... and this the longitude; createMap() passes [lon, lat] to folium, restoring the order
temp_aux = len(attributes[3])
a = attributes[3][:temp_aux - 2]
name = a[1:]

# We insert the data on the lists


i = 0
while i < len(countries):
if code == countries[i].country:
nameCountries.append(name)
values.append(int(countries[i].count))
longitude.append(float(lon))
latitude.append(float(lat))
i += 1

except FileNotFoundError:
print("Error: File not found.")

s = Map("events.txt", "iso3166-1-alpha-2.txt")
s.getOcurrencesCountry()
s.obtainLongitudeLatitude()
s.createMap()
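The extraction logic in getOcurrencesCountry splits each event line on ", " and then on ": ", and strips the surrounding quotes by slicing. It can be exercised in isolation; the sample event line below is invented, following the key/value format the file-output bot writes to events.txt:

```python
def extract_country_code(line, key='"source.geolocation.cc"'):
    """Apply the same split-and-slice logic as getOcurrencesCountry."""
    for field in line.split(", "):
        parts = field.split(": ")
        if parts[0] == key:
            value = parts[1]                 # e.g. '"ES"'
            return value[1:len(value) - 1]   # strip the surrounding quotes
    return None

# A hypothetical event line in the format written by the file-output bot.
sample = ('{"source.ip": "203.0.113.7", "source.geolocation.cc": "ES", '
          '"classification.taxonomy": "fraud"}')
print(extract_country_code(sample))  # ES
```

Note that this string surgery works only because the attribute of interest is never the last field of the line; a JSON parser would be the more robust choice.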

11.3 ISO3166-1-ALPHA-2.TXT
'AF',33.93911,67.709953,'Afghanistan'
'AX',37.0625,-95.677068,'Åland Islands'
'AL',41.153332,20.168331,'Albania'
'DZ',28.033886,1.659626,'Algeria'
'AS',-14.270972,-170.132217,'American Samoa'
'AD',42.546245,1.601554,'Andorra'
'AO',-11.202692,17.873887,'Angola'
'AI',18.220554,-63.068615,'Anguilla'
'AQ',-75.250973,-0.071389,'Antarctica'
'AG',17.060816,-61.796428,'Antigua and Barbuda'
'AR',-38.416097,-63.616672,'Argentina'
'AM',40.069099,45.038189,'Armenia'
'AW',12.52111,-69.968338,'Aruba'
'AU',-25.274398,133.775136,'Australia'
'AT',47.516231,14.550072,'Austria'
'AZ',40.143105,47.576927,'Azerbaijan'
'BS',25.03428,-77.39628,'Bahamas'
'BH',25.930414,50.637772,'Bahrain'
'BD',23.684994,90.356331,'Bangladesh'
'BB',13.193887,-59.543198,'Barbados'
'BY',53.709807,27.953389,'Belarus'
'BE',50.503887,4.469936,'Belgium'
'BZ',17.189877,-88.49765,'Belize'
'BJ',9.30769,2.315834,'Benin'

'BM',32.321384,-64.75737,'Bermuda'
'BT',27.514162,90.433601,'Bhutan'
'BO',-16.290154,-63.588653,'Bolivia'
'BA',43.915886,17.679076,'Bosnia and Herzegovina'
'BW',-22.328474,24.684866,'Botswana'
'BV',-54.423199,3.413194,'Bouvet Island'
'BR',-14.235004,-51.92528,'Brazil'
'IO',-6.343194,71.876519,'British Indian Ocean Territory'
'BN',4.535277,114.727669,'Brunei Darussalam'
'BG',42.733883,25.48583,'Bulgaria'
'BF',12.238333,-1.561593,'Burkina Faso'
'BI',-3.373056,29.918886,'Burundi'
'KH',12.565679,104.990963,'Cambodia'
'CM',7.369722,12.354722,'Cameroon'
'CA',56.130366,-106.346771,'Canada'
'CV',16.002082,-24.013197,'Cape Verde'
'KY',19.513469,-80.566956,'Cayman Islands'
'CF',6.611111,20.939444,'Central African Republic'
'TD',15.454166,18.732207,'Chad'
'CL',-35.675147,-71.542969,'Chile'
'CN',35.86166,104.195397,'China'
'CX',-10.447525,105.690449,'Christmas Island'
'CC',37.0625,-95.677068,'Cocos (Keeling) Islands'
'CO',4.570868,-74.297333,'Colombia'
'KM',-11.875001,43.872219,'Comoros'
'CG',-0.228021,15.827659,'Congo'
'CD',-4.038333,21.758664,'The Democratic Republic of the Congo'
'CK',-21.236736,-159.777671,'Cook Islands'
'CR',9.748917,-83.753428,'Costa Rica'
'CI',7.539989,-5.54708,'Côte d'Ivoire'
'HR',45.1,15.2,'Croatia'
'CU',21.521757,-77.781167,'Cuba'
'CY',35.126413,33.429859,'Cyprus'
'CZ',49.817492,15.472962,'Czech Republic'
'DK',56.26392,9.501785,'Denmark'
'DJ',11.825138,42.590275,'Djibouti'
'DM',15.414999,-61.370976,'Dominica'
'DO',18.735693,-70.162651,'Dominican Republic'
'EC',-1.831239,-78.183406,'Ecuador'
'EG',26.820553,30.802498,'Egypt'
'SV',13.794185,-88.89653,'El Salvador'
'GQ',1.650801,10.267895,'Equatorial Guinea'
'ER',15.179384,39.782334,'Eritrea'
'EE',58.595272,25.013607,'Estonia'
'ET',9.145,40.489673,'Ethiopia'
'FK',-51.796253,-59.523613,'Falkland Islands (Malvinas)'
'FO',61.892635,-6.911806,'Faroe Islands'
'FJ',-16.578193,179.414413,'Fiji'
'FI',61.92411,25.748151,'Finland'
'FR',46.227638,2.213749,'France'
'GF',3.933889,-53.125782,'French Guiana'

'PF',-17.679742,-149.406843,'French Polynesia'
'TF',37.0625,-95.677068,'French Southern Territories'
'GA',-0.803689,11.609444,'Gabon'
'GM',13.443182,-15.310139,'Gambia'
'GE',42.315407,43.356892,'Georgia'
'DE',51.165691,10.451526,'Germany'
'GH',7.946527,-1.023194,'Ghana'
'GI',36.137741,-5.345374,'Gibraltar'
'GR',39.074208,21.824312,'Greece'
'GL',71.706936,-42.604303,'Greenland'
'GD',12.262776,-61.604171,'Grenada'
'GP',16.995971,-62.067641,'Guadeloupe'
'GU',13.444304,144.793731,'Guam'
'GT',15.783471,-90.230759,'Guatemala'
'GG',49.465691,-2.585278,'Guernsey'
'GN',9.945587,-9.696645,'Guinea'
'GW',11.803749,-15.180413,'Guinea-Bissau'
'GY',4.860416,-58.93018,'Guyana'
'HT',18.971187,-72.285215,'Haiti'
'HM',-53.08181,73.504158,'Heard Island and McDonald Islands'
'VA',37.0625,-95.677068,'Holy See (Vatican City State)'
'HN',15.199999,-86.241905,'Honduras'
'HK',22.396428,114.109497,'Hong Kong'
'HU',47.162494,19.503304,'Hungary'
'IS',64.963051,-19.020835,'Iceland'
'IN',20.593684,78.96288,'India'
'ID',-0.789275,113.921327,'Indonesia'
'IR',32.427908,53.688046,'Iran'
'IQ',33.223191,43.679291,'Iraq'
'IE',53.41291,-8.24389,'Ireland'
'IM',54.236107,-4.548056,'Isle of Man'
'IL',31.046051,34.851612,'Israel'
'IT',41.87194,12.56738,'Italy'
'JM',18.109581,-77.297508,'Jamaica'
'JP',36.204824,138.252924,'Japan'
'JE',49.214439,-2.13125,'Jersey'
'JO',30.585164,36.238414,'Jordan'
'KZ',48.019573,66.923684,'Kazakhstan'
'KE',-0.023559,37.906193,'Kenya'
'KI',-3.370417,-168.734039,'Kiribati'
'KP',40.339852,127.510093,'Democratic People's Republic of Korea'
'KR',35.907757,127.766922,'Republic of Korea'
'KW',29.31166,47.481766,'Kuwait'
'KG',41.20438,74.766098,'Kyrgyzstan'
'LA',19.85627,102.495496,'Lao People's Democratic Republic'
'LV',56.879635,24.603189,'Latvia'
'LB',33.854721,35.862285,'Lebanon'
'LS',-29.609988,28.233608,'Lesotho'
'LR',6.428055,-9.429499,'Liberia'
'LY',37.0625,-95.677068,'Libyan Arab Jamahiriya'
'LI',47.166,9.555373,'Liechtenstein'

'LT',55.169438,23.881275,'Lithuania'
'LU',49.815273,6.129583,'Luxembourg'
'MO',22.198745,113.543873,'Macao'
'MK',41.608635,21.745275,'Macedonia'
'MG',-18.766947,46.869107,'Madagascar'
'MW',-13.254308,34.301525,'Malawi'
'MY',4.210484,101.975766,'Malaysia'
'MV',3.202778,73.22068,'Maldives'
'ML',17.570692,-3.996166,'Mali'
'MT',35.937496,14.375416,'Malta'
'MH',7.131474,171.184478,'Marshall Islands'
'MQ',14.641528,-61.024174,'Martinique'
'MR',21.00789,-10.940835,'Mauritania'
'MU',-20.348404,57.552152,'Mauritius'
'YT',-12.8275,45.166244,'Mayotte'
'MX',23.634501,-102.552784,'Mexico'
'FM',7.425554,150.550812,'Micronesia'
'MD',47.411631,28.369885,'Moldova, Republic of'
'MC',43.750298,7.412841,'Monaco'
'MN',46.862496,103.846656,'Mongolia'
'ME',42.708678,19.37439,'Montenegro'
'MS',16.742498,-62.187366,'Montserrat'
'MA',31.791702,-7.09262,'Morocco'
'MZ',-18.665695,35.529562,'Mozambique'
'MM',21.913965,95.956223,'Myanmar'
'NA',-22.95764,18.49041,'Namibia'
'NR',-0.522778,166.931503,'Nauru'
'NP',28.394857,84.124008,'Nepal'
'NL',52.132633,5.291266,'Netherlands'
'AN',12.226079,-69.060087,'Netherlands Antilles'
'NC',-20.904305,165.618042,'New Caledonia'
'NZ',-40.900557,174.885971,'New Zealand'
'NI',12.865416,-85.207229,'Nicaragua'
'NE',17.607789,8.081666,'Niger'
'NG',9.081999,8.675277,'Nigeria'
'NU',-19.054445,-169.867233,'Niue'
'NF',-29.040835,167.954712,'Norfolk Island'
'MP',17.33083,145.38469,'Northern Mariana Islands'
'NO',60.472024,8.468946,'Norway'
'OM',21.512583,55.923255,'Oman'
'PK',30.375321,69.345116,'Pakistan'
'PW',7.51498,134.58252,'Palau'
'PS',42.094445,17.266614,'Palestinian Territory'
'PA',8.537981,-80.782127,'Panama'
'PG',-6.314993,143.95555,'Papua New Guinea'
'PY',-23.442503,-58.443832,'Paraguay'
'PE',-9.189967,-75.015152,'Peru'
'PH',12.879721,121.774017,'Philippines'
'PN',-24.703615,-127.439308,'Pitcairn'
'PL',51.919438,19.145136,'Poland'
'PT',39.399872,-8.224454,'Portugal'

'PR',18.220833,-66.590149,'Puerto Rico'
'QA',25.354826,51.183884,'Qatar'
'RE',-21.115141,55.536384,'Réunion'
'RO',45.943161,24.96676,'Romania'
'RU',61.52401,105.318756,'Russian Federation'
'RW',-1.940278,29.873888,'Rwanda'
'BL',37.0625,-95.677068,'Saint Barthélemy'
'SH',-24.143474,-10.030696,'Saint Helena, Ascension and Tristan da Cunha'
'KN',17.357822,-62.782998,'Saint Kitts and Nevis'
'LC',13.909444,-60.978893,'Saint Lucia'
'MF',43.589046,5.885031,'Saint Martin (French part)'
'PM',46.941936,-56.27111,'Saint Pierre and Miquelon'
'VC',12.984305,-61.287228,'Saint Vincent and the Grenadines'
'WS',-13.759029,-172.104629,'Samoa'
'SM',43.94236,12.457777,'San Marino'
'ST',0.18636,6.613081,'Sao Tome and Principe'
'SA',23.885942,45.079162,'Saudi Arabia'
'SN',14.497401,-14.452362,'Senegal'
'RS',44.016521,21.005859,'Serbia'
'SC',-4.679574,55.491977,'Seychelles'
'SL',8.460555,-11.779889,'Sierra Leone'
'SG',1.352083,103.819836,'Singapore'
'SK',48.669026,19.699024,'Slovakia'
'SI',46.151241,14.995463,'Slovenia'
'SB',-9.64571,160.156194,'Solomon Islands'
'SO',5.152149,46.199616,'Somalia'
'ZA',-30.559482,22.937506,'South Africa'
'GS',-54.429579,-36.587909,'South Georgia and the South Sandwich Islands'
'ES',40.463667,-3.74922,'Spain'
'LK',7.873054,80.771797,'Sri Lanka'
'SD',12.862807,30.217636,'Sudan'
'SR',3.919305,-56.027783,'Suriname'
'SJ',77.553604,23.670272,'Svalbard and Jan Mayen'
'SZ',-26.522503,31.465866,'Swaziland'
'SE',60.128161,18.643501,'Sweden'
'CH',46.818188,8.227512,'Switzerland'
'SY',34.802075,38.996815,'Syrian Arab Republic'
'TW',23.69781,120.960515,'Taiwan'
'TJ',38.861034,71.276093,'Tajikistan'
'TZ',-6.369028,34.888822,'Tanzania, United Republic of'
'TH',15.870032,100.992541,'Thailand'
'TL',-8.874217,125.727539,'Timor-Leste'
'TG',8.619543,0.824782,'Togo'
'TK',-8.967363,-171.855881,'Tokelau'
'TO',-21.178986,-175.198242,'Tonga'
'TT',10.691803,-61.222503,'Trinidad and Tobago'
'TN',33.886917,9.537499,'Tunisia'
'TR',38.963745,35.243322,'Turkey'
'TM',38.969719,59.556278,'Turkmenistan'
'TC',21.694025,-71.797928,'Turks and Caicos Islands'
'TV',-7.109535,177.64933,'Tuvalu'

'UG',1.373333,32.290275,'Uganda'
'UA',48.379433,31.16558,'Ukraine'
'AE',23.424076,53.847818,'United Arab Emirates'
'GB',55.378051,-3.435973,'United Kingdom'
'US',37.09024,-95.712891,'United States'
'UM',24.747346,-167.594906,'United States Minor Outlying Islands'
'UY',-32.522779,-55.765835,'Uruguay'
'UZ',41.377491,64.585262,'Uzbekistan'
'VU',-15.376706,166.959158,'Vanuatu'
'VE',6.42375,-66.58973,'Venezuela'
'VN',14.058324,108.277199,'VietNam'
'VI',18.335765,-64.896335,'Virgin Islands'
'WF',-13.768752,-177.156097,'Wallis and Futuna'
'EH',24.215527,-12.885834,'Western Sahara'
'YE',15.552727,48.516388,'Yemen'
'ZM',-13.133897,27.849332,'Zambia'
'ZW',-19.015438,29.154857,'Zimbabwe'
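Each line of this file holds the ISO 3166-1 alpha-2 code, the latitude, the longitude and the country name, with single quotes around the text fields. The sketch below parses one line with the same slicing approach as obtainLongitudeLatitude in generateMap.py. Note that names containing a comma (e.g. 'Moldova, Republic of') defeat this naive split, both here and in the original script.

```python
def parse_iso_line(line):
    """Split one line of iso3166-1-alpha-2.txt into (code, lat, lon, name)."""
    fields = line.rstrip("\n").split(",")
    code = fields[0][1:-1]   # drop the surrounding single quotes
    lat = float(fields[1])   # NB: latitude is stored before longitude
    lon = float(fields[2])
    name = fields[3][1:-1]
    return code, lat, lon, name

print(parse_iso_line("'IE',53.41291,-8.24389,'Ireland'\n"))
# ('IE', 53.41291, -8.24389, 'Ireland')
```

A csv.reader with quotechar="'" would handle the comma-in-name case correctly.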

11.4 GENERATECHART.PY
"""
Author: "Javier Gombao Fernandez-Calvillo"
eMail: a00248414@student.ait.ie
College: Athlone Institute of Technology (AIT)
Subject: Final Project
File: generateChart.py

The purpose of this script is to produce charts that represent the data
gathered by IntelMQ:
classificationTaxonomyCount: it holds each taxonomy classification together
with the number of times it appears in the result file produced by IntelMQ
"""

# Import modules
import sys
import matplotlib.pyplot as plt
import pandas as pd

# Variables which we are going to use in the program


classificationTaxonomyCount = []
FILEOUTPUT = "output_classification_malware.txt"

# We configure the size of the table shown in the output file


pd.set_option('display.max_rows', 900)

class Struct():
def __init__(self, taxonomy, count):
self.taxonomy = taxonomy
self.count = count

def getTaxonomy(self):
return self.taxonomy

def getCount(self):
return self.count

class generateChart(object):
"""Constructor of the class"""

def __init__(self, fileEvents):


self._fileEvents = fileEvents

"""It returns the name of the fileEvents"""


def getFileEvents(self):
return self._fileEvents

"""It eliminates duplicates from the list of taxonomies"""


def obtainDistinctTaxonomy(self, classificationTaxonomyList):
distinctTaxonomy = []
for i in classificationTaxonomyList:
if i not in distinctTaxonomy:
distinctTaxonomy.append(i)
return distinctTaxonomy

"""This function counts how many times a taxonomy appears in the list"""


def countTaxonomy(self, taxonomy, classificationTaxonomyList):
times = 0
for i in classificationTaxonomyList:
if taxonomy == i:
times += 1
return times

"""This function prints each taxonomy and its occurrences"""
def printTaxonomy(self, classificationTaxonomyList):
for i in classificationTaxonomyList:
print(i.taxonomy, i.count)

"""This function counts the occurrences that the program reads from the file
events.txt"""

def getOcurrences(self, search_name):


taxonomy_list = []
try:
with open(self.getFileEvents()) as attacks:
for attack in attacks:
# We check that the searched attribute is available in the line
if search_name in attack:
attributes = attack.split(", ")
i = 0
while i < len(attributes):
g = attributes[i].split(": ")
# We are looking for the given attribute in each line:
for a in g:
if "\""+search_name+"\"" == a:
"""We obtain the value of the attribute: """
temp = len(g[1])
s1 = g[1][:temp - 1]
s2 = s1[1:]
if "malicious code" in s2:
taxonomy_list.append("malicious code")
elif "fraud" in s2:
taxonomy_list.append("fraud")
elif "abusive content" in s2:
taxonomy_list.append("abusive content")
else:
taxonomy_list.append(s2)
i += 1
distinct_taxonomy = self.obtainDistinctTaxonomy(taxonomy_list)
# We insert the taxonomies and their occurrences into the class 'Struct'

for c in distinct_taxonomy:
classificationTaxonomyCount.append(Struct(c, self.countTaxonomy(c, taxonomy_list)))

if not classificationTaxonomyCount:
print("No search matches")
else:
file = open(FILEOUTPUT, "w")
data = self.loadData().sort_values(by='Count', ascending=False)
print(str(data))
file.write(str(data))
file.close()

except FileNotFoundError:
print("Error: File not found.")

def loadData(self):
taxonomy = []
count = []
for i in classificationTaxonomyCount:
taxonomy.append(i.taxonomy)
count.append(i.count)
data = pd.DataFrame({
'Taxonomy': taxonomy,
'Count': count
})
return data

"""It generates the pie chart itself"""
def createChart(self):
if classificationTaxonomyCount:
x = []
labels = []
for i in classificationTaxonomyCount:
x.append(i.count)
labels.append(i.taxonomy)
plt.pie(x, labels=labels, autopct='%1.1f%%')
plt.axis('equal')
plt.show()

s = generateChart("events.txt")
s.getOcurrences(str(sys.argv[1]))
s.createChart()
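obtainDistinctTaxonomy and countTaxonomy together scan the list once per distinct value. The same counts can be produced in a single pass with collections.Counter; the sample values below are invented for illustration:

```python
from collections import Counter

# Hypothetical taxonomy values as extracted from events.txt
taxonomy_list = ["malicious code", "fraud", "malicious code",
                 "abusive content", "malicious code"]

# One pass over the list yields the same (taxonomy, count) pairs
counts = Counter(taxonomy_list)
print(counts.most_common())
# [('malicious code', 3), ('fraud', 1), ('abusive content', 1)]
```

Counter.most_common() also returns the pairs in descending order of count, which is the ordering the script otherwise obtains with sort_values.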

Institiúid Teicneolaíochta Bhaile Átha Luain

Athlone Institute of Technology,

Ireland, Éire

Athlone

May 18
