Automated Asm Analysis PDF

Software fingerprinting for automated
assembly code analysis

Project overview
P. Charland
DRDC – Valcartier Research Centre
Defence Research and Development Canada

Scientific Report
DRDC-RDDC-2015-R027
March 2015
© Her Majesty the Queen in Right of Canada, as represented by the Minister of National Defence, 2015
© Sa Majesté la Reine (en droit du Canada), telle que représentée par le ministre de la Défense nationale,
2015
Abstract ……..
With the revolution in information technology, the dependence of the Canadian Armed Forces
(CAF) on their information systems continues to grow. While information systems-based assets
confer a distinct advantage, they also make the CAF vulnerable if adversaries interfere with those.
Unfortunately, the technology required to disrupt and damage an information system through
malicious software (malware) is far less sophisticated and expensive than the amount of
investment required to create the system. To understand and mitigate this threat, reverse
engineering has to be performed to analyze malware. However, software reverse engineering is a
manually intensive and time-consuming process. The learning curve to master it is quite steep and
once mastered, the process is hindered when anti-reverse engineering techniques are used. This
results in the very few available reverse engineers being quickly saturated. This Scientific Report
describes new approaches to accelerate the reverse engineering process of malware. The goal is to
reduce redundant analysis efforts by automating the identification of code fragments which
reuse (i) previously analyzed assembly code or (ii) open source code publicly available.
Significance to defence and security
With today’s proliferation of malware, new analysis techniques which go beyond existing best
practices are needed to accelerate the reverse engineering process. The intended audience for this
report are thus software reverse engineers and malware analysts, as well as managers and decision
makers of organizations which have to reverse engineer third party code. The techniques
presented in this report will contribute to provide the CAF with a core capability to respond more
effectively and rapidly to cyber threats and lead to a change in their defence capabilities. To
defend against cyber attacks, the CAF currently rely on antivirus and intrusion detection systems
(IDS). These approaches consist of looking for symptoms such as increased network traffic or
anomalous signals, and try to fight the problems that arise. Although they can be effective in
some cases, they do not help in understanding what the real problem is and are ineffective when
attacks morph. Providing a new capability for malware reverse engineering, as it is proposed in
this report, will allow the CAF to uncover vulnerabilities that malware exploit. This will also give
them the required knowledge to generate defences against future malicious attacks to pre-empt
them, even when they morph. This new capability will introduce a paradigm shift in the practices
of the CAF in the cyber environment. They will move from a strictly defensive approach, which
has shown its limits, to a proactive approach. This proposed approach has also security potential
for the Royal Canadian Mounted Police (RCMP), the Communications Security Establishment
Canada (CSEC), the Canadian Security Intelligence Service (CSIS), and Public Safety (PS) Canada.
DRDC-RDDC-2015-R027 i
This page intentionally left blank.
ii DRDC-RDDC-2015-R027
Résumé ……..
Avec la révolution des technologies de l'information, la dépendance des Forces armées

canadiennes (FAC) sur leurs systèmes d'information ne cesse de croître. Bien que les systèmes
d’information sont des actifs qui confèrent un net avantage aux FAC, ils les rendent également
vulnérables si des adversaires interfèrent avec ceux-ci. Malheureusement, la technologie
nécessaire pour perturber et endommager un système d'information par l’entremise de logiciels
malveillants est beaucoup moins sophistiquée et onéreuse que le montant de l'investissement
nécessaire à la création du système. Pour comprendre et atténuer cette menace, la rétro-ingénierie
doit être effectuée afin d'analyser ces logiciels malveillants. Cependant, la rétro-ingénierie
logicielle est un processus manuel et fastidieux. La courbe d'apprentissage est assez abrupte et
une fois maîtrisé, le processus est entravé lorsque des techniques pour contrer la rétro-ingénierie
sont utilisées. Cela se traduit par le fait que les très rares rétro-ingénieurs disponibles sont
rapidement saturés. Le présent rapport scientifique décrit de nouvelles approches pour accélérer le
processus de rétro-ingénierie de logiciels malveillants. L'objectif est de réduire les efforts
d'analyse redondants en automatisant l'identification de fragments de code qui réutilisent (i) du
code assembleur analysé précédemment ou (ii) du code source ouvert accessible à tous.
Importance pour la défense et la sécurité
Avec la prolifération actuelle des logiciels malveillants, de nouvelles techniques d'analyse allant
au-delà des meilleures pratiques existantes de rétro-ingénierie sont nécessaires pour accélérer le
processus. L’audience visée par ce rapport sont donc les rétro-ingénieurs logiciels et analystes de
maliciels, ainsi que les gestionnaires et décideurs d’organisations qui ont à faire la rétro-
ingénierie de code tiers. Les techniques présentées dans ce rapport contribueront à fournir aux
FAC une capacité de base pour répondre plus efficacement et rapidement aux menaces
cybernétiques et mènera à un changement dans leurs capacités de défense. Pour se prémunir
contre les cyber-attaques, les FAC comptent actuellement sur des antivirus et des systèmes de
détection d'intrusions. Ces approches consistent à chercher des symptômes, tels qu’une
augmentation du trafic réseau ou des signes anormaux, et essayer de lutter contre les problèmes
qui surgissent. Bien qu'elles puissent être efficaces dans certains cas, elles n'aident pas à
comprendre ce qu’est le véritable problème et sont inefficaces lorsque les attaques se
transforment. Fournir une nouvelle capacité de rétro-ingénierie de logiciels malveillants, tel que
proposée dans le présent rapport, permettra aux FAC de découvrir les vulnérabilités que les
logiciels malveillants exploitent. Ceci leur fournira aussi les connaissances requises afin de
générer des mesures défensives contre de futures attaques malveillantes afin de les contrer, même
lorsqu’elles se transforment. Cette nouvelle capacité introduira un changement de paradigme dans
les pratiques des FAC dans le cyber-environnement. Celles-ci passeront d'une approche
strictement défensive, dont les limites ont été exposées, à une approche proactive. L’approche
proposée a aussi un potentiel pour la sécurité, notamment la Gendarmerie royale du Canada
(GRC), le Centre de la sécurité des télécommunications Canada (CSTC), le Service canadien du
renseignement de sécurité (SCRS) et la Sécurité publique (SP) Canada.
DRDC-RDDC-2015-R027 iii
iv DRDC-RDDC-2015-R027
Table of contents
Abstract …….. ................................................................................................................................. i

Significance to defence and security ................................................................................................ i
Résumé …….. ................................................................................................................................ iii
Importance pour la défense et la sécurité ....................................................................................... iii
Table of contents ............................................................................................................................. v
List of figures ................................................................................................................................. vi
List of tables .................................................................................................................................. vii
1 Introduction ............................................................................................................................... 1
2 Assembly to source code matching ........................................................................................... 3
2.1 RE-Google ....................................................................................................................... 3
2.2 Open source code repositories ......................................................................................... 3
2.2.1 Black Duck Open Hub Code Search ................................................................... 4
2.2.2 GitHub ................................................................................................................. 4
2.3 Exact matching ................................................................................................................ 5
2.4 Close matching .............................................................................................................. 10
2.5 Contextual matching ...................................................................................................... 12
3 Code clone detection ............................................................................................................... 14
3.1 Assembly code clone detection ..................................................................................... 14
3.2 Assembly code clone search .......................................................................................... 15
3.2.1 Type I clones ..................................................................................................... 15
3.2.2 Type II clones .................................................................................................... 16
3.2.3 Type III clones .................................................................................................. 16
3.2.4 Type IV clones .................................................................................................. 17
3.3 Exact and inexact code clones ....................................................................................... 17
3.4 Code clone search .......................................................................................................... 19
4 Conclusions and future work .................................................................................................. 21
References ..... ............................................................................................................................... 23
List of symbols/abbreviations/acronyms/initialisms ..................................................................... 27
DRDC-RDDC-2015-R027 v
List of figures
Figure 1: Black Duck Open Hub Code Search results .................................................................... 4

Figure 2: GitHub results .................................................................................................................. 5
Figure 3: Citadel sub_42514F function in IDA Pro ........................................................................ 6
Figure 4: Sert.cpp file excerpt from GitHub.................................................................................... 7
Figure 5: Sert.cpp file excerpt on GitHub ....................................................................................... 9
Figure 6: Assembly code listing of a malicious executable .......................................................... 10
Figure 7: channels.c file excerpt on the Black Duck Open Hub Code Search .............................. 11
Figure 8: Comments in the channels.c file on the Black Duck Open Hub Code Search ............... 12
Figure 9: Video capture capability discovered in Citadel ............................................................. 13
Figure 10: Type I clone example ................................................................................................... 15
Figure 11: Type II clone example ................................................................................................. 16
Figure 12: Type III clone example ................................................................................................ 16
Figure 13: Type IV clone example ................................................................................................ 17
Figure 14: Exact clone detected in both Citadel (left) and Zeus (right) ........................................ 18
Figure 15: Inexact clone detected in Citadel (left) and Zeus (right).............................................. 19
Figure 16: Zeus code fragment ...................................................................................................... 20
Figure 17: Citadel code fragment .................................................................................................. 20
vi DRDC-RDDC-2015-R027
List of tables
Table 1: Correspondance between assembly and source code ........................................................ 8

Table 2: Source vs. assembly code clones ..................................................................................... 17
DRDC-RDDC-2015-R027 vii
viii DRDC-RDDC-2015-R027
1 Introduction
Canadians – individuals, industry and government – are embracing the many advantages that
cyberspace offers, and the economy and quality of life are the better for it. But their increasing
reliance on cyber technologies makes them more vulnerable to those who attack Canada’s digital
infrastructure to undermine the national security, economic prosperity, and way of life [1]. The
first decade of the 21st century saw a dramatic change in the nature of cybercrime. Once the
province of teenage boys spreading graffiti for kicks and notoriety, hackers today are organized,
financially motivated gangs [2]. In the past, virus writers displayed offensive images and bragged
about the malware they had written; now hackers target companies and governments to steal
intellectual property, build complex networks of compromised computers, and rob individuals of
their identities [2]. It is a common scenario that the only piece of evidence of a targeted attack is
the malicious executable code itself. Analyzing malicious code (malware) requires reverse
engineering, as malware authors seldom do the favor of providing the source code of
their creations.
Software reverse engineering is a manually intensive and time-consuming process, whose

objective is to determine the functionality of a program. It consists in taking a program’s
executable binary, translating it into assembly code, and then manually analyzing the resulting
assembly code. Most of the steps involve translating assembly code into a series of abstractions
that represent the overall flow of a program to determine its functionality. The learning curve to
master reverse engineering is quite steep but once mastered, the process is hindered when a
program is obfuscated by anti-reverse engineering techniques or actively tries to avoid detection,
as most malware do.
During the last few years, the sophistication of malware has considerably evolved and has thus
complicated the reverse engineering process. While malware used to consist of small programs
written mostly in assembly, which spread by infecting other executable files, today’s malware
programs are written using high-level languages, come in many forms (e.g., botnets, rootkits,
malicious document files), and each new variant improves on the previous ones, by adding new
capabilities and fixing bugs. Also, as developing stealthy and persistent malware requires a high
degree of technical complexity, it is quite common for code fragments to be reused between
different malware.
The fact that malware authors share source code among them [3, 4], have adopted a versioning
approach, and use evasion techniques to bypass antivirus detection have resulted in a proliferation
of malware. According to Panda Security, in 2013 alone, there were 30 million new malware
strains in circulation, at an average of 82,000 per day [5]. This results in the very few available
reverse engineers being quickly saturated, as in-depth malware analysis is a tedious and time-
consuming task. Reverse engineers should thus leverage the code reuse in the production of
malware and be able to correlate different malware programs to identify the similarities between
them and thereby, the code fragments they share. This would prevent reverse engineers from
reanalyzing the code fragments of a new malware, which have already been analyzed in a
previous context and therefore, reduce redundant analysis efforts.
Although software reverse engineering came to an age in 1990 [6], it is only in recent years that
its importance and visibility have arisen. However, despite a growing community, it is still
DRDC-RDDC-2015-R027 1
perceived by many as a dark art. This report will thus serve to illustrate our vision of how, by
leveraging the open source code repositories publicly available and assembly code fragments
previously analyzed, the entry barrier new analysts face could be reduced.
The remainder of this Scientific Report is organized as follows: Chapter 2 presents the assembly
to source code matching process and its associated usage scenarios. Chapter 3 provides
background information on code clone detection, the research area on which the identification of
reused code fragments is built, as well as the different types of assembly code clones which
should be detected by the proposed approach. Finally, Chapter 4 provides conclusions and
future work.
2 DRDC-RDDC-2015-R027
2 Assembly to source code matching
Malware authors reuse code from all sources including legitimate ones, such as open source code.
For example, both the Conficker worm and the Waledac bot used an open source implementation
for their cryptographic functions [7, 8]. In the case of the Waledac bot, this represented 25% of its
code [8]. Also, the much-discussed FLAME malware contained publicly available open source
code packages such as SQLite and LUA [9].
Another example of source code reuse for malware creation is the Citadel Trojan. Citadel is based
on the leaked source code of the Zeus crimeware kit [10]. Other malware (e.g., Gameover Zeus,
Ice IX, LICAT, and Murofet) were also created after the source code of Zeus was made
public [10].
2.1 RE-Google
Some people have compared reverse engineering to solving a jigsaw puzzle [11]. You first start
by finding the corner pieces, then the frame, and after that, you work your way forward from
there. Using this analogy, the corner pieces for reverse engineering are strings, constants, and
function names. Strings contain human readable hints about a given functionality. Specific
constants can give additional clues and can sometimes be used to even identify certain types of
algorithms. Function names of imported functions from shared libraries (e.g., DLL) can reveal
information about the performed actions. However, a lot of experience is needed to know, for
example, the significance of a constant in a given context or what the combination of imported
functions might results in.
The meaning of strings, constants, and function names could be obtained by searching them on
publicly available open source code repositories. This was the idea behind the RE-Google IDA
Pro plug-in [11], which has proved to be very valuable to find algorithms and code excerpts
containing such information on Google Code. The ability to efficiently recognize the open source
origin for a given assembly code fragment is desirable, both in order to enhance the productivity
of a reverse engineer, as well as to reduce the odds of common libraries leading to false
correlation between otherwise unrelated code bases.
RE-Google [11], a proof of concept plug-in for the IDA Pro disassembler [12], extracts constants,
library names, and strings contained in a disassembled binary, and uses them to search for code
on Google Code [13]. The links to the top ten source code files found are inserted as comments
into the assembly code listing. Reviewing these files frequently provides enlightening insights
into the functionality of the code fragment in question and saves considerable time. However,
RE-Google uses the Google Code Search Data Application Programming Interface (API), which
is no longer available, making this plug-in non-functional.
2.2 Open source code repositories

In order to be able to automatically match assembly with source code, the following alternatives
to the Google Search service were evaluated: Antelink [14], CodePlex [15], GrepCode [16],
GitHub [17], Krugle [18], the Open Hub Code Search [19], and searchcode [20]. The two selected
options were the Open Hub Code Search and GitHub.
2.2.1 Black Duck Open Hub Code Search
The Open Hub Code Search, by Black Duck Software [21], with its 21,372,664,482 lines of open
source code, claims to be the world's largest and most comprehensive code search engine [19]. It
is the result of the merge in 2012 with Koders, another open source code search engine acquired
by Black Duck Software in 2008.
Through the Open Hub Code Search, Black Duck Software wants to fill the gap left by the
shutdown of Google Code Search in 2012. Figure 1 shows a screenshot of the results for the
search term sort filtered for the C++ programming language.
Figure 1: Black Duck Open Hub Code Search results.
2.2.2 GitHub
GitHub is a web-based hosting service for software development projects using Git [22] at its
heart. Git is a free and open source distributed version control system. GitHub claims to be the
largest code host on the planet with over 18.9 million repositories [23]. One advantage that
GitHub has over the Open Hub Code Search is its robust API, which allows the integration of
third-party tools or applications [23]. Figure 2 displays a snapshot of the returned results by
GitHub, considering only the C++ projects, for the search term sort.
Figure 2: GitHub results.
2.3 Exact matching

The perfect scenario in assembly to source code matching is when the source code of an assembly
code function is found on a public repository. This is illustrated in the following example, where
the corresponding source code of the Citadel Trojan function sub_42514F (Figure 3) has been
found on GitHub (Figure 4).
Figure 3: Citadel sub_42514F function in IDA Pro.
Figure 4: Sert.cpp file excerpt from GitHub.
Figure 4 displays a subset of the file Sert.cpp found on the GitHub Carberp1 repository. All the
calls to the Windows API functions (coloured in pink) in Figure 3 are also present in Figure 4, as
shown in Table 1. The presence of the letter W appended at the end of the function name
CertOpenSystemStoreW in the disassembly and not in the source code listing is due to the fact
that there are two versions of CertOpenSystemStore. One for ASCII strings (ending with an A)
and one for Unicode (ending with a W). In the present case, the CertOpenSystemStore function
was compiled for Unicode.
1
https://github.com/hzeroo/Carberp/blob/6d449afaa5fd0d0935255d2fac7c7f6689e8486b/source%20-
%20absource/pro/all%20source/sert/sert.cpp
Table 1: Correspondence between assembly and source code.
IDA Pro virtual Source code line
Function name
address number
CertOpenSystemStoreW 0042515A 172
CertEnumCertificatesInStore 00425167 176
CertDuplicateCertificateContext 00425173 178
CertDeleteCertificateFromStore 0042517E 180
CertCloseStore 00425192 183
Figure 3 shows that IDA Pro was also able to extract the string “MY” at the address 00425150,
which is passed as a parameter to the function CertOpenSystemStoreW. This string is also
present in the file Sert.cpp. It is initialized at line 51 (Figure 5) and its pointer is passed as a
parameter to the CertOpenSystemStore function at line 172 (Figure 4).
This example illustrates how being able to match assembly with its corresponding source code
greatly accelerates the reverse engineering process. The latter is at a higher level of abstraction
and is thus easier to understand. In this case, it serves to clearly illustrate how the different
Windows Certificate Store functions used for cryptography and present in Citadel are related.
Figure 5: Sert.cpp file excerpt on GitHub.
2.4 Close matching
With close matching, although the exact corresponding source code cannot be found on a public
repository, the fact that certain assembly code features (e.g., strings, constants) are matched can
provide additional information. This can sometimes reveal the performed actions of a function.
This is best illustrated with the following example. Figure 6 displays the assembly code listing of
a malicious Secure Shell (SSH) client. IDA Pro was able to extract the string x11-req at the
address 806721A. This string was also found on the Black Duck Open Hub Code Search, in the
channels.c2 file of the MirOS Project.
Figure 6: Assembly code listing of a malicious executable.

2
http://code.openhub.net/file?fid=h271Oe3rYxXkxgFY3ZUnlSzqaXc&cid=rWOJw-
YTJwg&s=&pp=0&fp=371636&ff=1&filterChecked=true&mp=1&ml=1&me=1&md=1#L0
Figure 7: channels.c file excerpt on the Black Duck Open Hub Code Search.
The MirOS project is a secure operating system from the BSD family for 32-bit i386 and SPARC
systems [24]. Its file channel.c comes from the OpenBSD project [25]. The fact that the string
x11-req was retrieved on the Black Duck Open Hub Code Search in the channel.c file allows
the reverse engineer to know that the disassembled code displayed in Figure 6 is used for
initiating X11connection forwarding. This is mentioned in the comments at the beginning of the
file (Figure 8).
Figure 8: Comments in the channels.c file on the Black Duck Open Hub Code Search.
2.5 Contextual matching

Contextual matching is the process of characterizing the disassembled code under study by
pairing it with some known source code. In this case, although the exact or closely corresponding
source code cannot be found, searching on open source code repositories can still provide
information about the performed actions of a function. This is what happened for Citadel when an
approximate code matching identified a video-related capability. Although the matching process
was not perfect, it was accurate enough to reveal the context of the function. The video capture
capability of Citadel was unleashed through links to source code files on the Black Duck Open
Hub Code Search such as MHRecordContol.h, stopRecord.c, trackerRecorder.h,
signalRecorder.h, and waitRecord.c. The links found for the files were added as
comments in the disassembly of Citadel, as shown in Figure 8. This observation was further
supported by the fact that the API de-obfuscation of Citadel revealed the presence of strings such
as _startRecord16. Also, a video_start command was also found as part of the process.
Figure 9: Video capture capability discovered in Citadel.
Although contextual matching is far from being the ideal scenario, the high-level information it
provides for a function saves the reverse engineer from manually analyzing it. With Citadel
having approximately 800 functions, any function for which the reverse engineer will not have to
deal with assembly code analysis is a time saver.
3 Code clone detection
During the last few years, the sophistication of malware has considerably evolved and has thus
complicated the reverse engineering process. While malware used to consist of small programs
written mostly in assembly, which spread by infecting other executable files, today’s malware
programs are written using high-level languages, come in many forms (e.g., botnets, rootkits,
malicious document files), and each new variant improves on the previous ones, by adding new
capabilities and fixing bugs. Also, as developing stealthy and persistent malware requires a high
degree of technical complexity, it is quite common for code fragments to be reused between
different malware.
The fact that malware authors share source code among them [1, 4], have adopted a versioning
approach, and use evasion techniques to bypass antivirus detection have resulted in a proliferation
of malware. Since retrieving the open source origin of a malware code fragment is not always
possible, reverse engineers should thus leverage the code reuse in the production of malware and
be able to correlate different malware programs to identify the similarities between them and
thereby, the code fragments they share. This would prevent them from reanalyzing the code
fragments of a new malware, which have already been analyzed in a previous context.
The problem of correlating different code fragments is closely related to the research area of
clone detection. Clone detection is a technique to identify duplicate code fragments in a code
base. Traditionally, it has been used to decrease the code size by consolidating it and thus,
facilitate program comprehension and software maintenance. This need stems from the fact that
reusing code fragments by copying and pasting them, with or without modifications, is a common
scenario in software development and can be detrimental to software maintenance and evolution.
For example, if a bug is found in a code fragment, then all similar code fragments must also be
verified for the presence of this bug.
As clone detection is an important problem, it has been studied extensively and numerous clone
detection algorithms exist. Depending on the code level analysis used, they can be classified
within the following categories: text-based [26, 27, 28], token-based [29, 30, 31, 32], tree-based
[33, 34, 35], metrics-based [36, 37], and graph-based [38, 39, 40]. However, most existing clone
detection algorithms operate on source code and these solutions are not directly applicable to
assembly code. One important application of clone detection on binary code is the detection of
copyright infringements. For example, closed source software should not contain open source
code released under the GNU General Public License (GPL).
3.1 Assembly code clone detection

Applying clone detection to the problem of malware analysis is challenging, due to the evasion
techniques used by malware authors to produce syntactically different executable code, but
semantically performing the same malicious functionality. This is further complicated by the fact
that clone detection on assembly code is not yet a mature research area. While the research
community has expended effort in different approach categories, namely text-based [41, 42],
token-based [43, 44, 45], metrics-based [46, 47], structural-based [48, 49, 50, 51, 52, 53],
behavioral-based [54], and hybrid-based approaches [55], there is a lack of experimental data to
validate their precision, recall, and scalability. Also, unlike source code clone detection, no
independent experimental study has been conducted to compare the performance of the assembly
code clone detection methods.
3.2 Assembly code clone search

The objective of clone detection is to identify code fragments of high similarity from a large code
base. The major challenge is that the clone detector usually does not know beforehand which
code fragments may be repeated. Therefore, a naïve clone detection approach might need to
compare every pair of code fragments. Such a comparison is prohibitively expensive in terms of
computation and is infeasible to perform in many real-life scenarios. But given a collection of
previously analyzed assembly files and a target assembly code fragment, such as in the case of
malware analysis, the objective is not to identify all the duplicate code fragments. It is only to
identify all the code fragments in the previously analyzed assembly files that are similar to the
target fragment. This problem is known as assembly code clone search.
A code fragment is any sequence of assembly code instructions, with or without comments, at any
granularity level (e.g., function, basic block). A code fragment is a clone of another code
fragment if they are similar according to a given definition of similarity [56]. In clone detection
(or search), code fragments can be similar based on their program text (textual similarity) or
functionality (functional similarity). In the literature, code clones have been classified into
Type I, II, III (textual similarity), and IV (functional similarity) [57]. The results of clone
detection take the form of clone pairs. A pair of code fragments is called a clone pair if there
exists a clone-relation between them (i.e., a clone pair is a pair of code fragments which are
identical or similar to each other) [57].
3.2.1 Type I clones
A Type I clone is when two or more code fragments are identical except for variations in
whitespace, layout, and comments. In the example of Figure 10, the only difference between the
two code fragments is the presence of the Memory comment indicated in red at line 1.
1 push eax ; Memory push eax
2 call ds:_aligned_free call ds:_aligned_free
3 and dword ptr [esi], 0 and dword ptr [esi], 0
4 pop ecx pop ecx
Figure 10: Type I clone example.
3.2.2 Type II clones
Type II clones are structurally and syntactically identical code fragments except for variations in
identifiers, literals, types, layout, and comments. In Figure 11, the only difference between the
two code fragments is that for some instructions, they use different constants, variable names, and
labels. For example, for the assembly code instruction at line 5, the instruction on the left uses the
constant 24h and the variable name var_C, while its corresponding instruction on the right uses
the constant 20h and the variable name InBuffer. Similar differences also apply for the
instructions at line 15. For the jnz instruction at line 17, the loc_10001A97 label is used on the
left, while the loc_10001493 label is used for the instruction on its right.
1 push edi ; Size push edi ; Size
2 call _malloc call _malloc
3 mov edx, eax mov edx, eax
4 mov ecx, edi mov ecx, edi
5 mov [esp+24h+var_C], edx mov [esp+20h+InBuffer], edx
6 mov edi, edx mov edi, edx
7 mov edx, ecx mov edx, ecx
8 xor eax,eax xor eax, eax
9 shr ecx, 2 shr ecx, 2
10 rep stosd rep stosd
11 mov ecx, edx mov ecx, edx
12 add esp, 4 add esp, 4
13 and ecx, 3 and ecx, 3
14 rep stosb rep stosb
15 mov eax, [esp+20h+var_C] mov eax, [esp+1Ch+InBuffer]
16 test eax, eax test eax, eax
17 jnz loc_10001A97 jnz loc_10001493
18 mov eax, [ebx] mov eax, [ebx]
19 push eax push eax
Figure 11: Type II clone example.
3.2.3 Type III clones
A Type III clone is a Type II clone with further modifications. Statements can be changed, added,
or removed, in addition to variations in identifiers, literals, types, layout and comments. In the
example of Figure 12, the order of the two instructions at lines 3 and 4 was inverted.
1 mov esi, [ebp+arg_0] mov esi, [ebp+arg_0]
2 mov edx, [esi+214h] mov edx, [esi+214h]
3 mov edi, [esi+220h] mov [ebp+var_4], edx
4 mov [ebp+var_4], edx mov edi, [esi+220h]
5 cmp [esi+21Ch], edi cmp [esi+21Ch], edi
6 jl short loc_76641044 jl short loc_76641044
7 lea ebx, [edx+edi*8] lea ebx, [edx+edi*8]
Figure 12: Type III clone example.
3.2.4 Type IV clones
A Type IV clone, also known as a semantic clone, occurs when two or more code fragments
perform the same computation, but using different syntactic variants. In the example of Figure 13,
the two code fragments carry out the same function, i.e., compute the length of a string. However,
as it can be seen, their implementation differs significantly.
1 strlen1 proc near strlen3 proc near
2
3 arg_0 = dword ptr 4 arg_0 = dword ptr 4
4
5 mov eax, [esp+arg_0] push edi
6 mov edi, [esp+4+arg_0]
7 loc_401004: xor ecx, ecx
8 cmp byte ptr [eax], 0 not ecx
9 jz short done xor al, al
10 inc eax cld
11 jmp short loc_401004 repne scasb
12 not ecx
13 done: lea eax, [ecx-1]
14 sub eax, [esp+arg_0] pop edi
15 retn retn
16 strlen1 endp strlen3 endp
Figure 13: Type IV clone example.
3.3 Exact and inexact code clones

The above definitions for the different code clones types are commonly used in the literature for
source code clone detection [57]. However, they are not directly applicable to assembly code. For
example, although possible, Type I clones seldom occur in assembly code and are thus irrelevant.
For this reason and to simplify matters, the notion of exact and inexact code clones is introduced.
As illustrated in Table 2, an exact clone corresponds either to a Type I or Type II clone, while an
inexact clone corresponds to a Type III or Type IV clone.
Table 2: Source vs. assembly code clones.

Source Code Assembly Code
Type I Clone
Exact Clone
Type II Clone
Type III Clone

Inexact Clone
Type IV Clone
The first usage scenario which should be supported is the detection of exact clones. Figure 14
displays an example of an exact clone detected in both Citadel (on the left) and Zeus (on the
right), with their differences highlighted. When compared with their corresponding instructions in
Zeus, some Citadel instructions use different registers (e.g., ebx and edi at address 40ADEE and
40ADEF), labels (e.g., loc_40B135 at address 40ADFA), and function names (e.g., sub_433D74
at address 40AE59).
Figure 14: Exact clone detected in both Citadel (left) and Zeus (right).
The second usage scenario to be supported is illustrated in Figure 15. In this example, an inexact
clone was identified in both Citadel (on the left) and Zeus (on the right). This clone is related to
the RC4 function used for encrypting the command and control (C&C) network traffic between
the bot and the C&C server.
Figure 15: Inexact clone detected in Citadel (left) and Zeus (right).
3.4 Code clone search

Another scenario which should also be supported is the possibility to search for a specific code
fragment. For example, an analyst might be interested to know if the function at the virtual
address 417018 in Zeus (Figure 16) is also present in Citadel (Figure 17).
Figure 16: Zeus code fragment.
Figure 17: Citadel code fragment.
4 Conclusions and future work
As software reverse engineering is a manually intensive and time-consuming process, this report
illustrates scenarios on how to accelerate it by partially automating some of its aspects. The
objective is to leverage existing sources of information available, namely (i) public open source
code repositories and (ii) previously analyzed assembly code fragments. The first approach aims
at saving time by providing the significance of a function’s strings, constants, and imported
functions, without having the reverse engineer analyze the underlying assembly code. The second
attempts to reduce redundant analysis efforts by detecting code clones of a target executable.
Future work will consist of developing prototypes to support the different scenarios explained in
the present report. These prototypes will allow matching assembly with its corresponding source
code, as well as detecting clones of assembly code fragments. Empirical studies will also be
conducted to measure their precision, recall, scalability, and performance using malware samples.
The results will be measured against the most representative approaches to compare assembly code.
References .....
[1] Public Safety Canada, Canada’s Cyber Security Strategy: For a Stronger and More
Prosperous Canada, Government of Canada, 2010.
[2] Sophos, Security Threat Report: 2010, Sophos Group, Abingdon, United Kingdom,
Jan. 2010.
[3] PSTP08-0107eSec Combating Robot Networks and Their Controllers: A Study for the Public
Security and Technical Program (PSTP), CPQ0229.1.36.4, Bell Canada, 2010.
[4] D. Babic, D. Reynaud, and D. Song, “Malware Analysis with Tree Automata Inference,”
Proc. of the 23rd Int’l Conf. on Computer Aided Verification (CAV ‘11), Snowbird, Utah,
Jul. 2011, pp. 116-131.
[5] Panda Security, “Annual Report PandaLabs: 2013 Summary,” 2014;

http://mediacenter.pandasecurity.com/mediacenter/wp-content/uploads/2014/07/Annual-
Report-PandaLabs-2013.pdf.
[6] E. Eilam, Reversing: Secrets of Reverse Engineering, Wiley Publishing, 2005.
[7] P. Porras, H. Saidi, and V. Yegneswaran, “An Analysis of Conficker,” Mar. 2009;
http://mtc.sri.com/Conficker/.
[8] B. Stock, et al., “Walowdac – Analysis of a Peer-to-Peer Botnet,” Proc. of the 2009 European
Conf. on Computer Network Defense (EC2ND '09), Milan, Italy, Nov. 2009, pp. 13-20.
[9] sKyWIper Analysis Team, sKyWIper (a.k.a. Flame a.k.a. Flamer): A Complex Malware for
Targeted Attacks, tech. report, Department of Telecommunications, Budapest Univ. of
Technology and Economics, 2012.
[10] Analysis Team, Malware Analysis: Citadel, tech. report, AhnLab Security Emergency
response Center, 2012.
[11] F. Leder, “RE-Google,” Accessed Jan. 2015; http://regoogle.carnivore.it/.
[12] Hex-Rays, “IDA: About,” Accessed Jan. 2015; https://www.hex-

rays.com/products/ida/index.shtml.
[13] Google, “Google Code,” Accessed Jan. 2015; http://code.google.com/.
[14] Antelink, “Antepedia Open Source Search Engine - Stay aware of updates & security issues
of open source projects,” Accessed Jan. 2015; http://www.antepedia.com/.
[15] Microsoft, “CodePlex - Open Source Project Hosting,” Accessed Jan. 2015;
https://www.codeplex.com/.
[16] GrepCode, “GrepCode.com - Java Source Code Search 2.0,” Accessed Jan. 2015;
http://grepcode.com/.
[17] GitHub, “GitHub · Build software better, together.,” Accessed Jan. 2015;
https://github.com/.
[18] Aragon Consulting Group, “Home | krugle - software development productivity,” Accessed
Jan. 2015; http://www.krugle.com/.
[19] Black Duck Software, “Open Hub Code Search,” Accessed Jan. 2015;
http://code.openhub.net/.
[20] B.E. Boyter, “searchcode | source code search engine,” Accessed Jan. 2015;
https://searchcode.com/.
[21] Black Duck Software, “Black Duck,” Accessed Jan. 2015;

https://www.blackducksoftware.com/.
[22] Git, “Git,” Accessed Jan. 2015; http://git-scm.com/.
[23] GitHub, “Features · GitHub,” Accessed Jan. 2015; https://github.com/features.
[24] Black Duck Software, “The MirOS Project Open Source Project on Open Hub,” Accessed
Jan. 2015; https://www.openhub.net/p/mirbsd.
[25] OpenBSD, “OpenBSD,” Accessed Jan. 2015; http://www.openbsd.org/.
[26] J.H. Johnson, “Identifying Redundancy in Source Code Using Fingerprints,” Proc. of the
1993 Conf. of the Centre for Advanced Studies on Collaborative Research: Software Eng.,
Toronto, Ont., Sept. 1993, pp. 171-183.
[27] A. Marcus and J.I. Maletic, “Identification of High-level Concept Clones in Source Code,”
Proc. of the 16th IEEE Int’l Conf. on Automated Software Eng. (ASE ‘01), Coronado, Calif.,
Nov. 2001, pp. 107-114.
[28] J. Ji, et al., “Source Code Similarity Detection Using Adaptive Local Alignment of
Keywords,” Proc. of the 8th Int’l Conf. on Parallel and Distributed Computing,
Applications and Technologies (PDCAT ‘07), Adelaide, Australia, Dec. 2007, pp. 179-180.
[29] B.S. Baker, “On Finding Duplication and Near-Duplication in Large Software Systems,”
Proc. of the 2nd Working Conf. on Reverse Eng. (WCRE ‘95), Toronto, Ont., Jul. 1995, pp.
86-95.
[30] T. Kamiya, S. Kusumoto, and K. Inoue, “CCFinder: A Multilinguistic Token-Based Code

Clone Detection System for Large Scale Source Code, IEEE Trans. on Software Eng., vol.
28, no. 7, Jul. 2002, pp. 654-670.
[31] H.A. Basit, et al., “Efficient Token Based Clone Detection with Flexible Tokenization,”
Proc. of the 6th Joint Meeting of the European Software Eng. Conf. and the ACM
SIGSOFT Symp. on the Foundations of Software Eng., Dubrovnik, Croatia, Sept. 2007,
pp. 513-516.
[32] B. Hummel, et al., “Index-based Code Clone Detection: Incremental, Distributed, Scalable,”
Proc. of the 2010 IEEE Int’l Conf. on Software Maintenance, Timisoara, Romania, Sept.
2010, pp. 1-9.
[33] I.D. Baxter et al., “Clone Detection Using Abstract Syntax Trees,” Proc. of the Int’l Conf.
on Software Maintenance (ICSM '98), Bethesda, Md., Nov. 1998, pp. 368-377.
[34] V. Wahler, et al., “Clone Detection in Source Code by Frequent Itemset Techniques,” Proc.
of the 4th IEEE Int’l Workshop on Source Code Analysis and Manipulation (SCAM ‘04),
Chicago, Ill., Sept. 2004, pp. 128-135.
[35] W.S. Evans, C.W. Fraser, and F. Ma, “Clone Detection via Structural Abstraction,” Proc. of
the 14th Working Conf. on Reverse Eng. (WCRE ’07), Vancouver, B.C., Oct. 2007,
pp. 150-159.
[36] K.A. Kontogiannis, et al., “Pattern Matching for Clone and Concept Detection,” J. of
Automated Software Engineering, vol. 3, no. 1-2, Jun. 1996, pp. 77-108.
[37] J. Mayrand, C. Leblanc, and E. Merlo, “Experiment on the Automatic Detection of Function
Clones in a Software System Using Metrics,” Proc. of the 1996 Int’l Conf. on Software
Maintenance (ICSM ’96), Monterey, Calif., Nov. 1996, pp. 244-253.
[38] J. Krinke, “Identifying Similar Code with Program Dependence Graphs,” Proc. of the 8th
Working Conf. on Reverse Eng. (WCRE ’01), Stuttgart, Germany, Oct. 2001, pp. 301-309.
[39] R. Komondoor and S. Horwitz, “Using Slicing to Identify Duplication in Source Code,”
Proc. of the 8th Int’l Symp. on Static Analysis (SAS 2001), Paris, France, Jul. 2001,
pp. 40-56.
[40] C. Liu, et al., “GPLAG: Detection of Software Plagiarism by Program Dependence Graph
Analysis,” Proc. of the 12th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data
Mining (KDD ’06), Philadelphia, Pa., Aug. 2006, pp. 872-881.
[41] J. Jang and D. Brumley, BitShred: Fast, Scalable Code Reuse Detection in Binary Code,
tech. report CMU-CyLab-10-006, CyLab, Carnegie Mellon Univ., 2009.
[42] J. Jang, D. Brumley, and S. Venkataraman, BitShred: Fast, Scalable Malware Triage, tech.
report CMU-CyLab-10-022, CyLab, Carnegie Mellon Univ., 2010.
[43] M.E. Karim, et al., “Malware Phylogeny Generation using Permutations of Code,” J. in
Computer Virology, vol. 1, no. 1, 2005, pp. 13-23.
[44] A. Schulman, “Finding Binary Clones with Opstrings and Function Digests, Doctor Dobb's
J., vol. 30, no. 9, Sept. 2005, pp. 64-70.
[45] A. Walenstein, et al., “Exploiting Similarity Between Variants to Defeat Malware: “Vilo”
Method for Comparing and Searching Binary Programs,” Proc. of BlackHat DC 2007.
[46] D. Bruschi, L. Martignoni, and M. Monga, “Detecting Self-mutating Malware Using

Control-flow Graph Matching,” Proc. of the 3rd Int’l. Conf. on Detection of Intrusions and
Malware & Vulnerability Assessment (DIMVA '06), Berlin, Germany, Jul. 2006,
pp. 129-143.
[47] A. Saebjornsen, et al., “Detecting Code Clones in Binary Executables,” Proc. of the 18th
Int’l Symp. on Software Testing and Analysis (ISSTA '09), Chicago, Ill., Jul. 2009,
pp. 117-128.
[48] E. Carrera and G. Erdélyi, “Digital Genome Mapping - Advanced Binary Malware
Analysis,” Proc. of the Virus Bulletin Conf., Chicago, Ill., Sept. 2004, pp. 187-197.
[49] Halvar Flake, “More Fun with Graphs,” Black Hat Federal, 2003.
[50] M.E. Karim, et al., “Malware Phylogeny Generation using Permutations of Code,” J. in
Computer Virology, vol. 1, no. 1, 2005, pp. 13-23.
[51] I. Briones and A. Gomez, “Graphs, Entropy and Grid Computing: Automatic Comparison
of Malware,” Proc. of the Virus Bulletin Conf., Ottawa, Ont., Oct. 2008, pp. 187-197.
[52] S.S. Anju, et al., “Malware Detection Using Assembly Code and Control Flow Graph
Optimization,” Proc. of the 1st Amrita ACM-W Celebration on Women in Computing in
India (A2CWiC '10), Tamilnadu, India, Sept. 2010, pp. 65:1-65:4.
[53] T. Dullien, et al., “Automated Attacker Correlation for Malicious Code,” Proc. of the NATO
RTO Information Assurance and Cyber Defence, Tallinn, Estonia, Dec. 2010.
[54] P.M. Comparetti, et al., “Identifying Dormant Functionality in Malware Programs,” Proc.
of the 2010 IEEE Symp. on Security and Privacy (SP ’10), Oakland, Calif., May 2010,
pp. 61-76.
[55] Z. Wang, K. Pierce, and S. McFarling, “BMAT - A Binary Matching Tool for Stale Profile
Propagation,” J. of Instruction-Level Parallelism, vol. 2, 2000.
[56] C.K. Roy, J.R. Cordy, and R. Koschke, “Comparison and Evaluation of code Clone
Detection Techniques and Tools: A Qualitative Approach,” Science of Computer
Programming, vol. 74, no. 7, May 2009, pp. 470-495.
[57] C.K. Roy and J.R. Cordy, A Survey on Software Clone Detection Research, tech. report
2007-541, School of Computing, Queen's Univ., Kingston, Ont., 2007.
List of symbols/abbreviations/acronyms/initialisms
API Application Programming Interface

BSD Berkeley Software Distribution
CAF Canadian Armed Forces
C&C Command and Control
CSEC Communications Security Establishment Canada
CSIS Canadian Security Intelligence Service
CSTC Centre de la sécurité des télécommunications Canada
DLL Dynamic Link Library
DRDC Defence Research and Development Canada
FAC Forces armées canadiennes
GNU GNU's Not Unix!
GPL General Public License
GRC Gendarmerie royale du Canada
IDS Intrusion Detection System
PS Public Safety
RCMP Royal Canadian Mounted Police
RDDC Recherche et développement pour la défense Canada
SPARC Scalable Processor Architecture
SCRS Service canadien du renseignement de sécurité
SP Sécurité publique
SSH Secure Shell
DOCUMENT CONTROL DATA
(Security markings for the title, abstract and indexing annotation must be entered when the document is Classified or Designated)
1. ORIGINATOR (The name and address of the organization preparing the document. 2a. SECURITY MARKING
Organizations for whom the document was prepared, e.g. Centre sponsoring a (Overall security marking of the document including
contractor's report, or tasking agency, are entered in section 8.) special supplemental markings if applicable.)
Defence Research and Development UNCLASSIFIED

Canada – Valcartier Research Centre
2459, Route de la Bravoure
Quebec (Quebec) 2b. CONTROLLED GOODS
G3J 1X5 Canada

(NON-CONTROLLED GOODS)
DMC A
REVIEW: GCEC DEC 2012
3. TITLE (The complete document title as indicated on the title page. Its classification should be indicated by the appropriate abbreviation (S, C or U) in
parentheses after the title.)
Software fingerprinting for automated assembly code analysis: Project overview (U)
4. AUTHORS (last name, followed by initials – ranks, titles, etc. not to be used)
Charland, P
5. DATE OF PUBLICATION 6a. NO. OF PAGES 6b. NO. OF REFS
(Month and year of publication of document.) (Total containing information, (Total cited in document.)
including Annexes, Appendices,
etc.)
March 2015
40 57
7. DESCRIPTIVE NOTES (The category of the document, e.g. technical report, technical note or memorandum. If appropriate, enter the type of report,
e.g. interim, progress, summary, annual or final. Give the inclusive dates when a specific reporting period is covered.)
Scientific Report
8. SPONSORING ACTIVITY (The name of the department project office or laboratory sponsoring the research and development – include address.)
Defence Research and Development Canada – Valcartier Research Centre

2459, de la Bravoure Road
Quebec (Quebec)
G3J 1X5 Canada
9a. PROJECT OR GRANT NO. (If appropriate, the applicable research 9b. CONTRACT NO. (If appropriate, the applicable number under
and development project or grant number under which the document which the document was written.)
was written. Please specify whether project or grant.)
15bz18
10a. ORIGINATOR'S DOCUMENT NUMBER (The official document 10b. OTHER DOCUMENT NO(s). (Any other numbers which may be
number by which the document is identified by the originating assigned this document either by the originator or by the sponsor.)
activity. This number must be unique to this document.)
DRDC-RDDC-2015-R027
11. DOCUMENT AVAILABILITY (Any limitations on further dissemination of the document, other than those imposed by security classification.)
Unlimited
12. DOCUMENT ANNOUNCEMENT (Any limitation to the bibliographic announcement of this document. This will normally correspond to the
Document Availability (11). However, where further distribution (beyond the audience specified in (11) is possible, a wider announcement
audience may be selected.))
Unlimited
13. ABSTRACT (A brief and factual summary of the document. It may also appear elsewhere in the body of the document itself. It is highly desirable that
the abstract of classified documents be unclassified. Each paragraph of the abstract shall begin with an indication of the security classification of the
information in the paragraph (unless the document itself is unclassified) represented as (S), (C), (R), or (U). It is not necessary to include here abstracts in
both official languages unless the text is bilingual.)
With the revolution in information technology, the dependence of the Canadian Armed Forces
(CAF) on their information systems continues to grow. While information systems-based assets
confer a distinct advantage, they also make the CAF vulnerable if adversaries interfere with
those. Unfortunately, the technology required to disrupt and damage an information system
through malicious software (malware) is far less sophisticated and expensive than the amount of
investment required to create the system. To understand and mitigate this threat, reverse
engineering has to be performed to analyze malware. However, software reverse engineering is
a manually intensive and time-consuming process. The learning curve to master it is quite steep
and once mastered, the process is hindered when anti-reverse engineering techniques are used.
This results in the very few available reverse engineers being quickly saturated. This Scientific
Report describes new approaches to accelerate the reverse engineering process of malware. The
goal is to reduce redundant analysis efforts by automating the identification of code fragments
which reuse (i) previously analyzed assembly code or (ii) open source code publicly available.
---------------------------------------------------------------------------------------------------------------
Avec la révolution des technologies de l'information, la dépendance des Forces armées

canadiennes (FAC) sur leurs systèmes d'information ne cesse de croître. Bien que les systèmes
d’information sont des actifs qui confèrent un net avantage aux FAC, ils les rendent également
vulnérables si des adversaires interfèrent avec ceux-ci. Malheureusement, la technologie
nécessaire pour perturber et endommager un système d'information par l’entremise de logiciels
malveillants est beaucoup moins sophistiquée et onéreuse que le montant de l'investissement
nécessaire à la création du système. Pour comprendre et atténuer cette menace, la rétro-
ingénierie doit être effectuée afin d'analyser ces logiciels malveillants. Cependant, la rétro-
ingénierie logicielle est un processus manuel et fastidieux. La courbe d'apprentissage est assez
abrupte et une fois maîtrisé, le processus est entravé lorsque des techniques pour contrer la
rétro-ingénierie sont utilisées. Cela se traduit par le fait que les très rares rétro-ingénieurs
disponibles sont rapidement saturés. Le présent rapport scientifique décrit de nouvelles
approches pour accélérer le processus de rétro-ingénierie de logiciels malveillants. L'objectif est
de réduire les efforts d'analyse redondants en automatisant l'identification de fragments de code
qui réutilisent (i) du code assembleur analysé précédemment ou (ii) du code source ouvert
accessible à tous.
14. KEYWORDS, DESCRIPTORS or IDENTIFIERS (Technically meaningful terms or short phrases that characterize a document and could be helpful
in cataloguing the document. They should be selected so that no security classification is required. Identifiers, such as equipment model designation,
trade name, military project code name, geographic location may also be included. If possible keywords should be selected from a published thesaurus,
e.g. Thesaurus of Engineering and Scientific Terms (TEST) and that thesaurus identified. If it is not possible to select indexing terms which are
Unclassified, the classification of each should be indicated as with the title.)
Software reverse engineering; malware analysis; assembly to source code matching; clone
detection

Automated Asm Analysis PDF

Загружено:

Сведения о документе

Оригинальное название

Авторское право

Доступные форматы

Поделиться этим документом

Поделиться или встроить документ

Параметры публикации

Этот документ был вам полезен?

Это неприемлемый материал?

Авторское право:

Доступные форматы

Automated Asm Analysis PDF

Загружено:

Авторское право:

Доступные форматы

Software fingerprinting for automated

assembly code analysis

Defence Research and Development Canada

Significance to defence and security

Avec la révolution des technologies de l'information, la dépendance des Forces armées

Importance pour la défense et la sécurité

Abstract …….. ................................................................................................................................. i

Figure 1: Black Duck Open Hub Code Search results .................................................................... 4

Table 1: Correspondance between assembly and source code ........................................................ 8

Software reverse engineering is a manually intensive and time-consuming process, whose

2.2 Open source code repositories

2.2.1 Black Duck Open Hub Code Search

Figure 1: Black Duck Open Hub Code Search results.

Figure 2: GitHub results.

2.3 Exact matching

CertOpenSystemStoreW 0042515A 172

CertEnumCertificatesInStore 00425167 176

CertDuplicateCertificateContext 00425173 178

CertDeleteCertificateFromStore 0042517E 180

CertCloseStore 00425192 183

Figure 6: Assembly code listing of a malicious executable.

2.5 Contextual matching

3.1 Assembly code clone detection

3.2 Assembly code clone search

3.2.1 Type I clones

3.2.3 Type III clones

3.3 Exact and inexact code clones

Table 2: Source vs. assembly code clones.

Type III Clone

3.4 Code clone search

Figure 17: Citadel code fragment.

[5] Panda Security, “Annual Report PandaLabs: 2013 Summary,” 2014;

[6] E. Eilam, Reversing: Secrets of Reverse Engineering, Wiley Publishing, 2005.

[11] F. Leder, “RE-Google,” Accessed Jan. 2015; http://regoogle.carnivore.it/.

[12] Hex-Rays, “IDA: About,” Accessed Jan. 2015; https://www.hex-

[13] Google, “Google Code,” Accessed Jan. 2015; http://code.google.com/.

[21] Black Duck Software, “Black Duck,” Accessed Jan. 2015;

[22] Git, “Git,” Accessed Jan. 2015; http://git-scm.com/.

[23] GitHub, “Features · GitHub,” Accessed Jan. 2015; https://github.com/features.

[25] OpenBSD, “OpenBSD,” Accessed Jan. 2015; http://www.openbsd.org/.

[30] T. Kamiya, S. Kusumoto, and K. Inoue, “CCFinder: A Multilinguistic Token-Based Code

[46] D. Bruschi, L. Martignoni, and M. Monga, “Detecting Self-mutating Malware Using

API Application Programming Interface

Defence Research and Development UNCLASSIFIED

G3J 1X5 Canada

Defence Research and Development Canada – Valcartier Research Centre

Avec la révolution des technologies de l'information, la dépendance des Forces armées

Вам также может понравиться