
How to Become a Malware Analyst

Bit Feb 2 · 14 min read

Hi everyone, my name is Bit, and I have always been fascinated by computer security.
After a lot of research and searching, I fell in love with malware analysis, and I
thought that my findings might help someone else as well. To be honest, writing this
essay has been in the back of my mind for two years, but I never had the courage to
do it, because I thought that what I had was not significant enough, or that it might
not help anyone at all. But I finally overcame that and started writing. I hope that
the roadmap and resources I’m sharing with you here will help you in your journey of
becoming a better malware analyst.

So you have decided that you want to become a malware analyst, or you are at least
considering it. I’m sorry to inform you that it’s not going to be easy. Then again, if
you think about it, reaching any goal is hard; it takes dedication and hard work. To
give yourself a better chance of reaching your goal, I recommend reading Ms. Azeria’s
mini-series. It will really help you understand how to set your mind to reaching your
goal. Even if you don’t intend to become a malware analyst and have other goals in
mind, this mini-series will help you immensely:

1- The Importance of Deep Work

2- The Paradox of Choice

3- The Process of Mastering a Skill

After figuring out your goal and getting into the right mindset, you need good
foundations. Why? Because you must have enough background knowledge that you won’t
get stuck or confused when you read a book about malware analysis or analyze a
particular malware sample. It’s like trying to become the best runner without the
endurance or physique for running. I won’t go into detail about why this is so
important, because Mr. Lost explains it very well:

DEF CON 23 1057 RICK ASTLEY

Now shall we play a game? :)

After some thought and consideration, and based on my own experience with the
resources I’ve gathered, I reorganized them into two sections: Malware 101 and
Malware 102. There may be more foundations, but this is all I’ve found. Malware 101
is a necessity for analyzing malware, but if you are in a hurry, you can skip 102.
Just remember to go back to 102 when you have the time. If 101 is the foundation,
then 102 is like fortifying that foundation. You don’t need 102 right now, but you
will eventually.

Malware 101
- Virtualization
You should never analyze malware on your own system, because you risk damaging your
computer. Instead, you need an isolated environment, which VMware Workstation Pro
($$$) or VirtualBox (free) will provide. VMware Workstation Pro has two features that
VirtualBox doesn’t have: a playback feature and multiple snapshot branches.
VirtualBox doesn’t have a playback feature at all and only supports one snapshot
branch, but it is a decent option for anyone who is starting in this field. Both let
you install an operating system from an ISO file (an operating system image) inside
your own OS.

So install one of them and follow an online tutorial on booting up an OS (the guest
OS) within your own OS (the host OS). The How-To Geek website has a short guide on how
to download Windows ISOs legally. For Linux ISOs, it’s probably best to search online
yourself, but for starters, try installing Ubuntu on VirtualBox or VMware. If you
think your system can’t handle Ubuntu, try Xubuntu. It is part of the Ubuntu family,
but it uses fewer resources than many other Linux distributions.

One thing to keep in mind is that virtualization software uses your PC’s or laptop’s
resources (RAM, CPU, storage, etc.) to boot up your virtualized (guest) OS, so you
need a decent PC to virtualize these OSes. The SANS Institute’s Reverse-Engineering
Malware course lists detailed requirements for the kind of system you should have in
its “Laptop Required” section.

- C
First of all, you need to learn the C programming language, because the malware you
are going to analyze uses pointers a lot throughout its code, and being familiar with
pointers will make your job of analyzing malware a lot easier. Malware also relies
heavily on C library functions and C-style APIs. As Mr. Wosar mentions, “Large
portions of all major operating systems these days are still based on C. Therefore, a
lot of API documentation is very C-centric. You will have a way easier time reading
documentation and manuals if you know C.” [1]. So you will gain a lot by learning C.
If you already know a high-level language like Java or C#, you still need to learn C,
because high-level languages such as Java, C#, and Python hide some aspects of
low-level programming, like pointers and memory allocation, and when you are
analyzing malware, you need to deal with exactly that part of low-level programming.
Some people may already be familiar with C++, but I still advise you to learn C,
because C handles some programming concepts completely differently than C++.

The CS50 course (one of the top courses on the edX website) is, in my opinion, the
best resource for learning C, and the book that accompanies the course is also the
best: Programming in C by Stephen G. Kochan. That being said, CS50 is not an easy
course. If you think that CS50 is not for you and you only want to become familiar
with C, then read Programming in C, do its exercises, and only watch the CS50 videos.

The most important aspects of learning C are pointers, structures, memory, and
familiarity with the C library functions and how they operate. You must master them,
because you’re going to need them in malware analysis.
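To see why pointers and structures matter, here is a small sketch that uses Python’s standard ctypes module to mirror a C structure. The struct Record and its fields are hypothetical; the point is that a pointer dereference (ptr->length) and the struct’s in-memory layout, padding included, are exactly what you will later recognize in a disassembler. The sizes in the comments assume a common ABI where uint32_t is 4-byte aligned.

```python
import ctypes
import sys


class Record(ctypes.Structure):
    # Python mirror of a hypothetical C structure:
    #   struct Record { uint16_t id; uint32_t length; };
    _fields_ = [("id", ctypes.c_uint16), ("length", ctypes.c_uint32)]


rec = Record(id=7, length=0x1000)
ptr = ctypes.pointer(rec)        # C: struct Record *ptr = &rec;
ptr.contents.length += 0x10      # C: ptr->length += 0x10;  (writes through to rec)

raw = bytes(rec)                 # rec's raw bytes as they sit in memory, padding included
print(ctypes.sizeof(Record))     # 8 on common ABIs: 2 (id) + 2 (padding) + 4 (length)
print(Record.length.offset)      # 4: padding aligns length to a 4-byte boundary
print(hex(int.from_bytes(raw[4:8], sys.byteorder)))  # 0x1010
```

In a disassembly of the equivalent C code, the ptr->length access shows up as a memory operand at offset 4 from the struct’s base address.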

- Python
Maybe you need to automate some simple tasks in a short amount of time, manipulate
malware to act the way you want it to, or develop an extension for one of your
malware analysis tools [2]. Python is your language of choice for this. There are
other scripting languages like Python, but Python’s integration into other
applications (like IDA Pro) and its “vast amount of libraries aimed explicitly at
reverse engineering” [1] make it the better option.

Programming for Everybody (part 1) and Python Data Structures (part 2), from one of
the top courses on the Coursera website, are good resources for learning Python
programming.
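As an example of the kind of small automation Python is good at, here is a minimal sketch of a strings extractor, similar in spirit to the Unix strings tool. It assumes you only care about printable ASCII runs; the minimum length of 4 and the file name in the usage comment are illustrative choices.

```python
from __future__ import annotations

import re
import sys


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python strings_sketch.py sample.bin   (hypothetical file names)
    with open(sys.argv[1], "rb") as f:
        for s in extract_strings(f.read()):
            print(s)
```

Windows binaries often store text as UTF-16LE, which this ASCII-only sketch will miss; a second pattern for wide strings would be the natural extension.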

- Assembly
After you have learned C, you need to become familiar with Assembly language, because
you are going to spend a lot of your time dealing with Assembly when you’re reverse
engineering malware. So a good knowledge of Assembly language and its instructions is
the key to your success. I really recommend Assembly Language Step-by-Step by Jeff
Duntemann. Mr. Duntemann really takes his time, but by doing so he will make you fall
in love with Assembly language. You need a Linux system (maybe Xubuntu) and either a
debugger with a GUI, like KDbg, or a terminal-based debugger, like GDB (the GNU
Debugger) with GEF. To build your code, you need nasm and ld, and you have to type
these commands in your system’s terminal:

On a 64-bit Linux system (the -m elf_i386 option tells ld to link the 32-bit object
file that nasm -f elf produces):

$> nasm -f elf -g -F dwarf hello.asm
$> ld -m elf_i386 -o hello hello.o

On a 32-bit Linux system:

$> nasm -f elf -g -F dwarf hello.asm
$> ld -o hello hello.o

For editing your code, a simple text editor will suffice: Atom, Visual Studio Code,
Notepad++, Sublime Text, Vim, etc. If you’re not sure, use Visual Studio Code and
install an extension for x86 Assembly syntax highlighting.

“I wanted to give you these caveats, because the book is a little outdated in some
aspects, like the debugging software or how to compile your code to contain debugging
symbols, but when it comes to teaching you Assembly language, it gets a perfect score.”

If you are already a little familiar with Assembly language, or don’t want a gentle
introduction, then the Open Security Training website has a course on x86 Assembly
language: Introductory Intel x86.
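Part of getting comfortable with Assembly is seeing how instructions turn into bytes, because a disassembler works in the opposite direction. The sketch below hand-encodes the two instruction forms used by the classic 32-bit Linux exit(0) sequence, following the opcode tables in Intel’s Software Developer’s Manual. It covers only these two forms and is not a general assembler; you can compare its output with a nasm listing file (nasm -l) for the same instructions.

```python
from __future__ import annotations

# Register numbers used by "+rd" opcode forms (Intel SDM Vol. 2 register encoding).
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def mov_r32_imm32(reg: str, imm: int) -> bytes:
    """MOV r32, imm32: opcode B8+rd followed by a little-endian 32-bit immediate."""
    return bytes([0xB8 + REG32[reg]]) + (imm & 0xFFFFFFFF).to_bytes(4, "little")


def int_imm8(vector: int) -> bytes:
    """INT imm8: opcode CD followed by the interrupt vector byte."""
    return bytes([0xCD, vector & 0xFF])


# 32-bit Linux exit(0): mov eax, 1 / mov ebx, 0 / int 0x80
exit_code = mov_r32_imm32("eax", 1) + mov_r32_imm32("ebx", 0) + int_imm8(0x80)
print(exit_code.hex())  # b801000000bb00000000cd80
```

Note how the immediate is stored least significant byte first; recognizing little-endian constants in a hex dump is a skill you will use constantly.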

- Computer Architecture
You need to know how the CPU and your computer system work, because they are the
building blocks the OS operates on. The Nand to Tetris Part 1 course on the Coursera
website will teach you these concepts in a practical way. Also, having Structured
Computer Organization by Andrew S. Tanenbaum and Todd Austin as a reference will make
your life a lot easier.

If you really want a more hands-on experience, then Building an 8-bit Breadboard
Computer by Ben Eater is for you, but if you can’t afford the parts, Nand to Tetris
will do the job.
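The core idea of Nand to Tetris is that every gate, and eventually a whole CPU, can be built from NAND alone. Here is a minimal Python sketch of that idea, building NOT, AND, OR, and XOR from a nand function and chaining full adders into an 8-bit ripple-carry adder. The gate names and the 8-bit width are illustrative choices, not the course’s exact code.

```python
from __future__ import annotations


def nand(a: int, b: int) -> int:
    return 0 if (a and b) else 1


# Every other gate below is built only from nand().
def not_(a: int) -> int:
    return nand(a, a)


def and_(a: int, b: int) -> int:
    return not_(nand(a, b))


def or_(a: int, b: int) -> int:
    return nand(not_(a), not_(b))


def xor(a: int, b: int) -> int:
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    partial = xor(a, b)
    total = xor(partial, carry_in)
    carry_out = or_(and_(a, b), and_(partial, carry_in))
    return total, carry_out


def add(x: int, y: int, width: int = 8) -> int:
    """Ripple-carry adder: chain full adders bit by bit and drop the final carry."""
    result, carry = 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result


print(add(200, 100))  # 44: 300 does not fit in 8 bits, so it wraps like an 8-bit ALU
```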

- Operating System
Malware uses the same functions the OS provides to communicate with the computer
system, so having a theory of how the operating system does its job will help you
understand how malware operates and interacts with the system. The Nand to Tetris
Part 2 course will help you understand these fundamentals. Also, Modern Operating
Systems by Andrew S. Tanenbaum and Herbert Bos is a very helpful supplementary book.

If you are more of a hands-on person and you are familiar with C++, then Write Your
Own Operating System by Viktor Engelmann will teach you a lot about operating systems
and how they operate.

- Network
Since malware communicates with the Internet in some way or form, and you have to
deal with that when you are analyzing a particular sample, computer networking
knowledge is essential to your skill set. Computer Networks by Andrew S. Tanenbaum
will help you gain that knowledge.
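As a small taste of that knowledge in practice, here is a sketch that decodes a 20-byte IPv4 header (no options) and computes its header checksum using Python’s standard struct and socket modules. The field layout follows the IPv4 specification (RFC 791); the addresses in the usage lines come from the documentation ranges and are purely illustrative.

```python
from __future__ import annotations

import socket
import struct

# Version/IHL, TOS, total length, ID, flags/fragment offset, TTL, protocol,
# checksum, source address, destination address (network byte order).
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")


def parse_ipv4_header(packet: bytes) -> dict:
    (ver_ihl, _tos, total_len, ident, _flags_frag,
     ttl, proto, checksum, src, dst) = IPV4_HEADER.unpack_from(packet)
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,  # IHL counts 32-bit words
        "total_len": total_len,
        "id": ident,
        "ttl": ttl,
        "protocol": proto,                   # e.g. 6 = TCP, 17 = UDP
        "checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


def ipv4_checksum(header: bytes) -> int:
    """Ones'-complement sum of 16-bit words; a header with a valid checksum yields 0."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Usage: build a header with a zero checksum, compute it, then rebuild with it.
fields = [0x45, 0, 20, 1, 0, 64, 6]
addrs = [socket.inet_aton("192.0.2.1"), socket.inet_aton("198.51.100.7")]
raw = IPV4_HEADER.pack(*fields, 0, *addrs)
raw = IPV4_HEADER.pack(*fields, ipv4_checksum(raw), *addrs)
print(parse_ipv4_header(raw))
```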

Malware 101 Diagram (created with draw.io)

Malware 102
When I first started to learn about malware analysis, I thought that by learning
Malware 101 and just deepening my knowledge of C and Microsoft’s operating system, I
had gained everything there is to know about malware analysis. But as I read more
books and did more research on the subject, I started to realize that there is more.
The Practical Reverse Engineering and Practical Malware Analysis books were a major
part of this realization, and they shaped Malware 102.

- C
Now that you have learned the fundamentals of C, you need to go deeper. Expert C
Programming: Deep C Secrets by Peter van der Linden will provide that depth.

- C++
C++ was created to add object-oriented programming (OOP) to C, which doesn’t support
it. Some of the resources that I’m going to introduce here need C++ background
knowledge, so learning C++ is required. Accelerated C++: Practical Programming by
Example by Andrew Koenig is the best choice, since you are already familiar with the
fundamentals of computer programming and with C. But if you think that the pace of
the book is too fast, or you need to start from the very basics, then I recommend
Learn C++ by Codecademy, or C++ Primer Plus by Stephen Prata.

Just a side note regarding C++: I’ve given C++ only a short introduction and sold it
short of what it’s really worth, but it’s one of the top languages in the computer
industry, and one of my favorites.

- Assembly
The Practical Malware Analysis book states, “What if you encounter an instruction you
have never seen before? If you can’t find your answer with a Google search, you can
download the complete x86 architecture manuals from Intel.” [3]

Volume 1 Basic Architecture: This manual describes the architecture and programming
environment. It is useful for helping you understand how memory works, including
registers, memory layout, addressing, and the stack. This manual also contains
details about general instruction groups. [3]

Volume 2A Instruction Set Reference, A–M, and Volume 2B: Instruction Set
Reference, N–Z: These are the most useful manuals for the malware analyst.
They alphabetize the entire instruction set and discuss every aspect of each
instruction, including the format of the instruction, opcode information, and
how the instruction impacts the system. [3]

Volume 3A System Programming Guide, Part 1, and Volume 3B System Programming Guide,
Part 2: In addition to general-purpose registers, x86 has many special-purpose
registers and instructions that impact execution and support the OS, including
debugging, memory management, protection, task management, interrupt and exception
handling, multiprocessor support, and more. If you encounter special-purpose
registers, refer to the System Programming Guide to see how they impact execution. [3]

Optimization Reference Manual: This manual describes code-optimization techniques for
applications. It offers additional insight into the code generated by compilers and
has many good examples of how instructions can be used in unconventional ways. [3]

So basically, the next step in expanding your knowledge of Assembly language is the
Intel manuals: Intel 64 and IA-32 Architectures Software Developer’s Manual (Combined
Volumes) and Intel 64 and IA-32 Architectures Optimization Reference Manual.
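One detail the Volume 2 entries for CALL and JMP spell out, and which trips up many beginners, is that a relative displacement is measured from the address of the next instruction. The sketch below resolves branch targets for three common forms in 32-bit code (E8 CALL rel32, E9 JMP rel32, EB JMP rel8). It assumes no prefixes and 32-bit addresses, and it is not a disassembler.

```python
from __future__ import annotations


def branch_target(address: int, code: bytes) -> int:
    """Resolve the target of a relative CALL/JMP located at `address` (32-bit code)."""
    opcode = code[0]
    if opcode in (0xE8, 0xE9):   # CALL rel32 / JMP rel32: opcode + 4-byte displacement
        length, rel = 5, int.from_bytes(code[1:5], "little", signed=True)
    elif opcode == 0xEB:         # JMP rel8: opcode + 1-byte displacement
        length, rel = 2, int.from_bytes(code[1:2], "little", signed=True)
    else:
        raise ValueError(f"unsupported opcode {opcode:#04x}")
    # The displacement is relative to the address of the *next* instruction.
    return (address + length + rel) & 0xFFFFFFFF


print(hex(branch_target(0x401000, bytes.fromhex("e800000000"))))  # 0x401005
```

The E8 00 00 00 00 case is worth remembering: a call to the very next instruction, usually followed by a pop, is a common way for shellcode to learn its own address.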

- Compiler
When you are reverse engineering malware, you will eventually come upon a section of
code that doesn’t make any sense to you. But if you know how compilers work and how
they translate source code into machine code, you can really make sense of what’s
happening. The Practical Reverse Engineering book recommends these two books:

Compilers: Principles, Techniques, and Tools by Alfred V. Aho, Monica S. Lam, Ravi
Sethi, and Jeffrey D. Ullman

Linkers and Loaders by John R. Levine

And for advanced readers, Practical Reverse Engineering recommends

Advanced Compiler Design and Implementation by Steven Muchnick
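A quick way to see a compiler rewriting your code is constant folding. The sketch below uses CPython’s built-in compile() and the standard dis module, because they are easy to inspect without extra tools; this is Python bytecode, not x86, so treat it as an analogy. Native C compilers fold constants the same way, which is why a disassembly can contain a value like 86400 that never appears literally in the source. The variable name is hypothetical.

```python
import dis

# A hypothetical line of source code containing a constant expression.
source = "SECONDS_PER_DAY = 24 * 60 * 60"
code = compile(source, "<example>", "exec")

# CPython folds 24 * 60 * 60 at compile time, so the bytecode loads the single
# constant 86400 instead of performing two multiplications at run time.
print(code.co_consts)
dis.dis(code)
```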

- Network
You now know the basics of how networks operate. The next step is to be able to use
that knowledge in your malware analysis. Practical Packet Analysis by Chris Sanders
will make you efficient at using Wireshark to analyze malware network activity.
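Wireshark is the right tool for interactive work, but sometimes you want to post-process a capture with a script. Here is a minimal sketch of a reader for the classic .pcap file format (24-byte global header, then a 16-byte header per record), using only the struct module. It assumes the classic microsecond format; recent Wireshark versions save pcapng by default, which this sketch does not handle, so you would need to choose the pcap format when saving.

```python
from __future__ import annotations

import struct


def iter_pcap(data: bytes):
    """Yield (ts_sec, ts_usec, orig_len, packet_bytes) for each record in a classic pcap."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"                          # written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap (pcapng is not handled here)")
    record = struct.Struct(endian + "IIII")   # ts_sec, ts_usec, incl_len, orig_len
    offset = 24                               # skip the global header
    while offset + record.size <= len(data):
        ts_sec, ts_usec, incl_len, orig_len = record.unpack_from(data, offset)
        offset += record.size
        yield ts_sec, ts_usec, orig_len, data[offset:offset + incl_len]
        offset += incl_len


# Usage (hypothetical file name):
#   with open("capture.pcap", "rb") as f:
#       for ts, usec, length, pkt in iter_pcap(f.read()):
#           print(ts, length)
```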

- Obfuscation
Malware uses obfuscation to make analysis harder, so some background knowledge in
this area is essential:

Surreptitious Software: Obfuscation, Watermarking, and Tamperproofing for Software
Protection by Christian Collberg and Jasvir Nagra will provide this background
knowledge.
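One of the simplest obfuscation techniques you will meet is single-byte XOR encoding of strings. Because there are only 256 possible keys, you can brute-force them and keep the results that contain a fragment you expect, such as "http". The sketch below assumes a single-byte key; the URL and key 0x5A are made-up examples, and real samples often use multi-byte keys or layered encodings.

```python
from __future__ import annotations


def xor_bytes(data: bytes, key: int) -> bytes:
    """Single-byte XOR: applying the same key twice restores the original data."""
    return bytes(b ^ key for b in data)


def brute_force_xor(blob: bytes, known: bytes = b"http") -> list[tuple[int, bytes]]:
    """Try all 256 keys and keep decodings that contain a known plaintext fragment."""
    hits = []
    for key in range(256):
        decoded = xor_bytes(blob, key)
        if known in decoded:
            hits.append((key, decoded))
    return hits


encoded = xor_bytes(b"http://example.test/gate.php", 0x5A)  # hypothetical URL and key
for key, text in brute_force_xor(encoded):
    print(hex(key), text)
```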

For the following sections, I think the Practical Malware Analysis book sums up
really well why you need to expand your knowledge in these fields: “Most malware
targets Windows platforms and interacts closely with the OS. A solid understanding of
basic Windows coding concepts will allow you to identify host-based indicators of
malware, follow malware as it uses the OS to execute code without a jump or call
instruction, and determine the malware’s purpose.” [4]
- PE
The Portable Executable (PE) format is the file format Windows uses for executables
such as programs and DLLs. So as a malware analyst, you need to be familiar with this
format.

The PE Header section (4.3.2.1.1) on page 144 of The Art of Computer Virus Research
and Defense by Peter Szor, and the “Headers” section on pages 97–102 of Reversing:
Secrets of Reverse Engineering by Eldad Eilam, are excellent resources for learning
about the PE format as a malware analyst. Also, the picture of the PE structure on
Wikipedia’s Portable Executable page is a great reference if you want a visual
understanding of it. And if you want to study further, I recommend:

An In-Depth Look into the Win32 Portable Executable File Format, Part 1

An In-Depth Look into the Win32 Portable Executable File Format, Part 2
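To connect the reading to something concrete, here is a minimal sketch that walks the first PE headers with the struct module: the MZ signature, e_lfanew at offset 0x3C, the PE\0\0 signature, the 20-byte COFF file header, and the optional header magic. It only reads a handful of fields; a full-featured library such as pefile handles far more. The file name in the usage comment is hypothetical.

```python
from __future__ import annotations

import struct

MACHINES = {0x14C: "x86 (i386)", 0x8664: "x64 (AMD64)"}
OPTIONAL_MAGIC = {0x10B: "PE32", 0x20B: "PE32+"}


def parse_pe_headers(data: bytes) -> dict:
    if data[:2] != b"MZ":
        raise ValueError("missing MZ (DOS header) signature")
    # e_lfanew, the last DOS header field at offset 0x3C, points to the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # COFF file header: Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics (20 bytes in total).
    (machine, n_sections, timestamp, _, _, _opt_size,
     characteristics) = struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    (opt_magic,) = struct.unpack_from("<H", data, e_lfanew + 24)
    return {
        "e_lfanew": e_lfanew,
        "machine": MACHINES.get(machine, hex(machine)),
        "sections": n_sections,
        "timestamp": timestamp,
        "optional_header": OPTIONAL_MAGIC.get(opt_magic, hex(opt_magic)),
        "is_dll": bool(characteristics & 0x2000),  # IMAGE_FILE_DLL
    }


# Usage (hypothetical file name):
#   with open("sample.exe", "rb") as f:
#       print(parse_pe_headers(f.read()))
```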

- Operating System
After learning about operating system concepts in Malware 101, the Windows Internals
books by Mark E. Russinovich, David A. Solomon, and Alex Ionescu (Part 1 and Part 2)
are the next best option for learning about the Windows OS. At the time of writing,
Part 2 of the 7th edition is not published yet, so stick with the 6th edition.

Also, What Makes It Page? by Enrico Martignetti teaches you how the virtual memory
manager works behind the scenes in Windows. [6] In my opinion, it is a supplementary
book to the Windows Internals books.
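The heart of what a virtual memory manager maintains is the translation from virtual to physical addresses. The sketch below models classic 32-bit x86 paging without PAE (10-bit directory index, 10-bit table index, 12-bit offset, 4 KB pages) as described in Intel’s System Programming Guide. It is a toy: real page table entries also carry flags such as present and writable, and modern Windows uses PAE or 4-level paging. The mapping in the example is hypothetical.

```python
from __future__ import annotations

PAGE_SIZE = 4096


def split_va(va: int) -> tuple[int, int, int]:
    """Split a 32-bit virtual address: 10-bit directory index, 10-bit table index, 12-bit offset."""
    return (va >> 22) & 0x3FF, (va >> 12) & 0x3FF, va & 0xFFF


def translate(va: int, page_directory: dict[int, dict[int, int]]) -> int:
    """Walk a toy two-level mapping (directory index -> table index -> physical frame)."""
    dir_index, table_index, offset = split_va(va)
    page_table = page_directory.get(dir_index)
    if page_table is None or table_index not in page_table:
        raise LookupError(f"page fault at {va:#010x}")  # a real MMU would raise #PF
    return page_table[table_index] * PAGE_SIZE + offset


directory = {1: {1: 0x1A2}}  # hypothetical: the page at VA 0x00401000 maps to frame 0x1A2
print(hex(translate(0x00401234, directory)))  # 0x1a2234
```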

- Win32 Programming
When you want to learn about the kernel in Windows, you need some knowledge of Win32
programming to understand how to use the Windows API. Windows System Programming by
Johnson M. Hart and Windows via C/C++ by Jeffrey Richter and Christophe Nasarre are
the best references for learning Win32 programming.

- Kernel
The kernel is the program that controls the communication between software and
hardware in your computer system. Some malware takes advantage of the kernel in order
to be more stealthy and persistent, so kernel knowledge is a must if you want to
advance in your career as a malware analyst. Windows NT Device Driver Development by
Peter G. Viscarola and W. Anthony Mason is a book on driver development, but its
background chapters provide an excellent and concrete introduction to Windows, and it
is also excellent supplementary material for the Windows kernel. [6]

Windows Kernel Programming by Pavel Yosifovich deserves a mention here, because it is
a modern take on Windows kernel programming.

Malware 102 Diagram (created with draw.io)

- Finally!
At last we are here :). The juicy part, where the real fun starts. I’ll keep this
part short and sweet, and I’m only going to show you the first steps. The rest is up
to you, and there are some good resources and materials on how to study further in
this field.
- Malware
1- Practical Malware Analysis by Michael Sikorski and Andrew Honig is the first book
you need to study about malware analysis, because it teaches you everything from the
ground up and familiarizes you with the general techniques and tools you need to know
for analyzing malware.

2- Download some malware samples and try to analyze them. You could try malware.lu,
but there are other websites as well. You just need to look for them (Google is your
friend).

3- Malware Analyst’s Cookbook by Michael Ligh, Steven Adair, Blake Hartstein, and
Matthew Richard is the second book you need to study. Ms. MalwareUnicorn says, “This
book is a great starter for understanding malware from the RE perspective and
creating tools to help you RE.” [7]. In my opinion, Practical Malware Analysis is
more beginner-friendly than this book, even though both of them are introductory
books about malware analysis.
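When you download a sample, a common first step (done inside your analysis VM) is to record its cryptographic hashes, so you can refer to it unambiguously and look it up in public sources. Here is a minimal sketch using Python’s standard hashlib; the choice of MD5, SHA-1, and SHA-256 reflects what sample repositories commonly list, and the 1 MB chunk size is an arbitrary illustrative value.

```python
from __future__ import annotations

import hashlib

ALGORITHMS = ("md5", "sha1", "sha256")


def hash_bytes(data: bytes) -> dict[str, str]:
    """Return the hex digests commonly used to identify a sample."""
    return {name: hashlib.new(name, data).hexdigest() for name in ALGORITHMS}


def hash_file(path: str, chunk_size: int = 1 << 20) -> dict[str, str]:
    """Hash a file in chunks so large samples don't have to fit in memory at once."""
    hashers = {name: hashlib.new(name) for name in ALGORITHMS}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```

MD5 and SHA-1 are no longer collision-resistant, so SHA-256 is the safer identifier, but the older hashes are still widely used for lookups.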

- Reverse Engineering (RE)
You will spend most of your time analyzing binaries (mostly Assembly language), so
you must have reverse engineering skills.

1- begin.re (created by Ms. Harpaz) is a great website for anyone who wants to get
started with reverse engineering.

2- Reversing: Secrets of Reverse Engineering by Eldad Eilam is the next thing you
need to study to get better at RE. Windows XP is required to analyze the binaries
that come with this book. The book’s materials are no longer available on the website
mentioned in the book, but you can download them from here.

3- Ms. MalwareUnicorn’s Reverse Engineering 101 and 102 workshops are great for
practicing your newfound skills.

4- The Flare-On challenge by FireEye, held every year, is a great way to test your
skills. You can also read the solutions from previous years on their website.

- General
The Art of Computer Virus Research and Defense by Peter Szor is a book about virus
threats, defense techniques, and analysis tools. Mr. Wosar says it “is one of the
very few books that looks specifically into how anti-viruses work. While it is a bit
older and slightly outdated, the techniques explained in that book are still in use
today.” [8]

Gray Hat Python by Justin Seitz is the book that teaches you the first steps in using
Python for malware analysis.