
Basics of Reverse Engineering for x86

Table of Contents

Basics of Reverse Engineering for x86....................................................................................... 1
Numbers and numbering systems........................................................................................... 4
Integers (whole numbers) versus floating-point (fractional numbers) ........................... 4
Numbering systems............................................................................................................. 4
Number ranges.................................................................................................................... 4
Negative values................................................................................................................... 5
Basic Computer Architecture................................................................................................... 5
Registers.............................................................................................................................. 5
Memory................................................................................................................................ 6
Status flags.......................................................................................................................... 6
Floating point stack.............................................................................................................. 7
Program code.......................................................................................................................... 7
Concepts and tools............................................................................................................... 7
Assembler code.................................................................................................................... 8
Mathematical operations on integer values......................................................................... 8
Mathematical operations on floating-point values............................................................... 9
Logical operations................................................................................................................ 9
Bitwise operations......................................................................................................... 9
Bit rotation..................................................................................................................... 9
Data movement instructions................................................................................................ 9
Stack operations............................................................................................................... 9
Moving a byte, word, dword............................................................................................ 10
Widening data (bytes into words, words into dwords) ..................................................... 10
Moving larger amounts of data....................................................................................... 10
Direction flag............................................................................................................... 11
Optimization................................................................................................................ 11
Working with pointers..................................................................................................... 11
LEA math..................................................................................................................... 12
Comparison instructions.................................................................................................... 12
Conditions for signed values........................................................................................... 12
Conditions for unsigned values....................................................................................... 13
Code flow instructions........................................................................................................ 13
Jumps.............................................................................................................................. 13

Function calls.................................................................................................................. 13
Passing arguments to a function.................................................................................. 14
Expendable and non-expendable registers.................................................................. 14
Function result............................................................................................................. 14
Function return address............................................................................................... 15
Function environment.................................................................................................. 15
Indirect function calls................................................................................................... 15
Examples..................................................................................................................... 16
Higher-level constructs expressed in assembler................................................................... 17
C constructs....................................................................................................................... 17
Arrays............................................................................................................................. 17
Strings......................................................................................................................... 17
Allocation..................................................................................................................... 17
Constants........................................................................................................................ 17
Pointer math................................................................................................................... 17
Structs............................................................................................................................ 18
Conditional statements................................................................................................... 18
Labels.......................................................................................................................... 18
If.................................................................................................................................. 18
If-else........................................................................................................................... 18
If-elseif-else.................................................................................................................. 19
Loops.............................................................................................................................. 19
While loop.................................................................................................................... 19
Do-while loop............................................................................................................... 20
for loop........................................................................................................................ 20
Switch statements....................................................................................................... 21
Tightly grouped cases with multiple different actions ............................................... 21
Tightly grouped cases with few different actions ...................................................... 22
Ungrouped cases...................................................................................................... 23
Varargs............................................................................................................................ 24
Additional C++ constructs................................................................................................. 25
Constructors and destructors.......................................................................................... 25
Heap memory.............................................................................................................. 25
Independently allocated memory / placement new..................................................... 25
Local variable allocation.............................................................................................. 25
Enumerators................................................................................................................... 26
Methods.......................................................................................................................... 26
Inheritance...................................................................................................................... 26
Virtual functions.............................................................................................................. 26

COM................................................................................................................................ 28
Templates........................................................................................................................ 28
Appendix A: Syringe.............................................................................................................. 28
Appendix B: Security considerations..................................................................................... 29
String overflows................................................................................................................. 30
Integer overflows............................................................................................................... 30
Use of uninitialized values.................................................................................................. 31
Appendix C: analysing gamemd.exe..................................................................................... 32
Reading the program code in IDA Pro ................................................................................. 32
How-tos........................................................................................................................... 32
Appendix D: reading crash dumps from the game................................................................ 33

Numbers and numbering systems


Integers (whole numbers) versus floating-point (fractional numbers)
We don't see a big difference between 3 and 3.1 when we do math; both are simple to
understand and use. Computers, however, make a significant distinction between numbers
with no fractional part (called integers) and numbers that contain a fractional part (called
floating-point numbers, in reference to the decimal point). These two groups differ
significantly in how they are stored and operated on.
Numbering systems
A numbering system is a way of expressing numbers using specific symbols. Numbering
systems are divided into symbolic (a good example is Roman numerals, where I is always 1
and X is always 10) and positional, where the value of a symbol differs depending on where in
the number it is placed, e.g. the 1 in 123 carries more value than the 1 in 213. This is the system we use
most commonly in our everyday lives, and we use it without even thinking about the actual
meanings of the symbols. For most people, the number 10 is equivalent to the number that
comes after 9. This is known as the Decimal numbering system, or base-10, called so
because there are 10 distinct digits (0-9), and therefore we can represent 10 distinct numbers
with a single digit. Math has no problem with defining and operating in different bases, and
when dealing with computers, binary (base-2) and hexadecimal (base-16) numbering systems
are a lot more convenient.
A base-n numbering system has n distinct symbols denoting numbers from 0 to (n-1). Larger
numbers require more than one symbol to be expressed. In general, each digit in the number
has n times more value than the digit to the right of it, which basically means: in a base-n
number, a digit d in the p-th position (counting from the right, starting with 1) stands for
d × n^(p-1). The sum of all the digits' values calculated this way is the real value of the
number. For example, in base-10 the number 152 is equal to 2 × 10^0 + 5 × 10^1 + 1 × 10^2.
Notice that the number 10 in any base-n system equals n.
A number written in base-n is usually denoted with a subscripted n suffix: base-8 numbers are
followed by an 8, base-16 numbers by 16, and so on. Lack of such a suffix usually indicates
base-10. The computing world has additional special expressions for the base-2 (so-called
"binary"), base-8 ("octal") and base-16 ("hexadecimal") systems. Base-2 numbers can be
suffixed with simply a "b", base-8 numbers can be prefixed with a "0" or suffixed with an "o",
and base-16 numbers can carry the prefix "0x" or a suffix of "h".
Base-10 uses the well-known symbols 0 through 9 to express numbers from 0 to 9.
Base-n, when n is less than 10, also starts with 0 and uses the same symbols to express
numbers from 0 to (n-1). For example, base-8 uses 0 through 7. A number 64 in base-8
equals 4 80 + 6 81 = 52.
Base-n, when n is more than 10, obviously needs more symbols to denote numbers from 10 to
(n-1). A common choice is to use A, B, C and so on until the required number of
symbols is reached. (Let's not ask what comes after Z.) For example, base-16 uses 0
through 9 followed by A through F to express numbers from 0 to 15. The number 9A in
base-16 equals A × 16^0 + 9 × 16^1 = 10 + 144 = 154. Do notice that 9h + 1h = Ah = 10, but
it's not 10h.
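The positional rule and the examples above can be checked with a short Python sketch (Python is used here purely as a calculator; the function name is made up for illustration):

```python
# Value of digit d at position p (counting from the right, starting at 1)
# in a base-n number is d * n**(p - 1); the value of the number is the sum.
def positional_value(digits, base):
    """Sum each digit times base**(position - 1), rightmost position = 1."""
    total = 0
    for p, d in enumerate(reversed(digits), start=1):
        total += d * base ** (p - 1)
    return total

print(positional_value([1, 5, 2], 10))   # 152
print(positional_value([6, 4], 8))       # 52
print(positional_value([9, 10], 16))     # 154  (A = 10)

# Python's built-in int() performs the same conversion given an explicit base:
print(int("64", 8), int("9A", 16))       # 52 154
```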
While the examples above covered only integers, math allows you to extend the same rules to
the fractional parts, but in this case the leftmost position after the decimal point is position -1,
the one to the right is -2, and so on. But in the computing world floating point numbers are

usually written down in base-2 and stored in special binary formats which are irrelevant for
now.
Number ranges
When performing math, we can equally easily calculate 10 × 1 and 1000000000000 × 1.
Computers, on the other hand, have no such luxury and are designed to operate effectively on
fixed number ranges. Operating outside those ranges is possible with special libraries,
but it tends to be slower. Certain higher-level programming languages abstract these
complications away from the programmer and always use the slower way, since it's more
correct and less surprising, but those are not the languages we're interested in.
X86, like most other computers today, operates on binary numbers. A single binary digit (bit)
can only have one of two values, 0 or 1. Bits are usually grouped into groups of 8. 8 bits can
represent unsigned numbers between 0 and 2^8 - 1 = 255, or signed values between -128 and
127. A value with 8 bits is called a byte. Two bytes (16 bits) make up a word. (Technically, the
term "word" used to refer to "machine word", which is a value that a machine can operate on
in one step without breaking it down into smaller pieces.) A word can represent unsigned values
between 0 and 2^16 - 1 = 65535, or signed values between -32768 and 32767. Four bytes (32
bits) are called a dword ("double word") and can represent values between 0 and 2^32 - 1 =
4294967295, or between -2147483648 and 2147483647. Eight bytes (64 bits) are called a
qword ("quad word") and can represent values between 0 and 2^64 - 1.
One hexadecimal digit can represent 4 binary digits, so a byte needs only 2 hexadecimal
digits instead of 8 binary ones. For this reason, hexadecimal representation is much more
popular. A group of 4 bits (half a byte, exactly one hexadecimal digit) is also called a nibble.
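These ranges follow directly from the bit widths, as a small Python sketch (with illustrative function names) shows:

```python
# Value ranges for unsigned and signed (two's complement) integers of a given width.
def unsigned_range(bits):
    return 0, 2 ** bits - 1

def signed_range(bits):
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

print(unsigned_range(8))    # (0, 255)              byte
print(signed_range(8))      # (-128, 127)
print(unsigned_range(16))   # (0, 65535)            word
print(unsigned_range(32))   # (0, 4294967295)       dword

# One hex digit covers exactly 4 bits, so a byte fits in 2 hex digits:
print(f"{255:x}")           # ff
```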
Negative values
Representation of negative values is more complex than simply summing up all digits' values,
because the computer doesn't have a separate place to store the signedness of the value.
Instead, it is up to the code to know whether to interpret a given value as a signed or an
unsigned number. If the value is signed, the highest (leftmost) bit is used as a sign indicator
and not as part of the value. This is known as two's complement representation: inverting all
of a value's bits and adding 1 to the result yields the negation of that value, so the same
procedure turns a negative value into its absolute value. The representation of non-negative
values is the same in signed and unsigned modes. In unsigned mode, bit patterns from
000...0 to 111...1 represent values in correct increasing order. In signed mode, bit patterns
from 000...0 to 011...1 represent 0 and positive values in increasing order, 100...0 represents
the smallest representable negative value and 111...1 represents -1.
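The rules above can be sketched in Python (the helper names are made up for illustration; Python integers are unbounded, so the bit width must be passed explicitly):

```python
# Two's-complement encoding and decoding for a given bit width.
def to_signed(value, bits):
    """Interpret an unsigned bit pattern as a signed (two's complement) value."""
    if value & (1 << (bits - 1)):        # sign bit set -> negative
        return value - (1 << bits)
    return value

def negate(value, bits):
    """Invert all bits and add 1, staying within the bit width."""
    mask = (1 << bits) - 1
    return (~value + 1) & mask

print(to_signed(0b11111111, 8))      # -1
print(to_signed(0b10000000, 8))      # -128, the smallest negative value
print(hex(negate(44, 8)))            # 0xd4, the 8-bit pattern for -44
print(to_signed(negate(44, 8), 8))   # -44
```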

Basic Computer Architecture


Put simply, a computer consists of a CPU which performs operations, registers that are very
fast to access but hold very small amounts of data, a certain amount of RAM that can be
read and written somewhat slower than the registers but has a lot more space, extra registers
for floating point values, advanced registers used by modern processor extensions such as
MMX and SSE, and external specialized hardware such as hard drives, video cards and sound
cards. We'll be focusing on an Intel x86 system in 32-bit mode.
Registers
A Register is basically a very small storage space that can be accessed very quickly.
In 32-bit mode, the system has several registers:
- General purpose registers, called EAX, EBX, ECX and EDX.
- Additional registers called ESI, EDI and EBP, that have a specific purpose but can easily be
used as general registers as well.
- A Stack Pointer register called ESP (the stack will be explained later).
- An Instruction Pointer register called EIP. This differs from the other registers in that it
cannot be modified explicitly.
- A register containing status flags describing the current processor state, called EFLAGS.
All registers can store 32-bit values. The general purpose registers have convenient aliases to
access the lower 16 bits of the register, as well as (only for EAX-EDX) aliases for the 8 lower and 8
higher bits of those 16.
The diagram below displays the register EAX, its aliases and their sizes. The same applies to registers
EBX through EDX. Registers ESI, EDI and EBP have 16-bit aliases called SI, DI and BP
respectively, which behave the same as AX does.
  31                16 15               8 7               0
 +-------------------------------------------------------+
 |                          EAX                          |
 +-------------------------+-----------------------------+
                           |             AX              |
                           +--------------+--------------+
                           |      AH      |      AL      |
                           +--------------+--------------+

Memory
In 32-bit mode, a process (program) can use 4 gigabytes of memory (since it can only address
that much with 32 bits). In certain operating systems, additional limitations and workarounds
apply. Note that the operating system doesn't let the program access raw memory, but
instead lets it operate on Virtual Memory. So even if the actual computer has less than 4
gigabytes of physical memory, the program will not have any problems. In fact, the OS lets
each program have its own 4 gigabytes without interfering with other programs.
This memory space contains everything the program needs to run. That includes the actual
program code, the data it operates on, and the operating system code necessary to run the
program. Free memory that the program can use is called the Heap, and dynamically
allocated memory usually comes from there. A specific area of the memory is reserved for a Stack,
which contains small amounts of data that are modified very often, including the program's
current call stack (what function called the current one, what function called that, etc.).
Assuming you are familiar with C++, you should know that the Stack is where functions'
automatic local variables are allocated, whereas global variables, static variables and
manually allocated variables (managed through new/delete) are placed on the Heap.
The Stack operates in a pretty simple manner: normally the last value you put on it is the
first one you will get out of it. What's not so obvious is that it starts at a fairly high memory
address and grows towards lower ones.
The ESP register always contains the address of the topmost value on the stack. Therefore
adding a new value to the stack decrements ESP by the size of that value (which is
typically 32 bits, i.e. 4 bytes), and removing one increments ESP by the same amount.
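A toy Python model of this behaviour (the starting address is an arbitrary illustrative value; real initial stack addresses vary per process and OS):

```python
# Toy model of the x86 stack: it grows toward lower addresses,
# and ESP always holds the address of the topmost value.
class Stack:
    def __init__(self, base=0x0019FF00):   # illustrative starting address
        self.esp = base
        self.memory = {}

    def push(self, value):
        self.esp -= 4                      # 32-bit values: PUSH decrements ESP by 4
        self.memory[self.esp] = value

    def pop(self):
        value = self.memory[self.esp]
        self.esp += 4                      # POP increments ESP by 4
        return value

s = Stack()
s.push(1); s.push(2); s.push(3)
print(hex(s.esp))   # 0x19fef4 -- three pushes, 12 bytes *lower* than the start
print(s.pop())      # 3 -- last in, first out
```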
Status flags
The status flags pseudo-register contains several logical indicators, such as "overflow
happened", "carry value", "signed value" and "zero value", that are updated by most
mathematical operations. These indicators are often used to make logical decisions (if-else
logic). It also contains some indicators, such as "direction", which is used by a repeated
instruction to decide whether to operate on increasing or decreasing addresses.
The precise layout of this pseudo-register is outlined in Intel's reference manuals (Volume I,
page 3-20) and any other reference, and because it isn't really relevant for us, it will not be
repeated here. A list of the most commonly used flags will be sufficient:
- Carry Flag (CF): if it's set, the last operation resulted in an unsigned integer overflow.
- Overflow Flag (OF): if it's set, the last operation resulted in a signed integer overflow.
- Zero Flag (ZF): set if the result is zero.
- Sign Flag (SF): set to the value of the sign bit of the result when treated as a signed integer;
if the result is negative, this will be 1, otherwise it will be 0.
- Direction Flag (DF): controls the direction of certain looping constructs; it will be explained later.
Note that most mathematical operations such as addition or subtraction evaluate their result
in both signed and unsigned modes and set all the related flags appropriately.
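As a sketch, here is how those four flags would come out for an 8-bit addition, modelled in Python (an illustrative model of the flag rules, not how you would actually read flags on a CPU):

```python
# Compute the result and the four main flags for an 8-bit addition.
def add8_flags(a, b):
    full = a + b                       # sum before truncation to 8 bits
    result = full & 0xFF
    cf = int(full > 0xFF)              # CF: unsigned overflow
    zf = int(result == 0)              # ZF: result is zero
    sf = (result >> 7) & 1             # SF: copy of the result's sign bit
    # OF: operands share a sign but the result has the opposite sign
    of = int((a >> 7) == (b >> 7) and (result >> 7) != (a >> 7))
    return result, {"CF": cf, "ZF": zf, "SF": sf, "OF": of}

print(add8_flags(200, 100))   # unsigned overflow only: CF=1, OF=0
print(add8_flags(100, 100))   # signed overflow only: OF=1 (100+100 > 127)
print(add8_flags(128, 128))   # result is 0: CF=1, OF=1, ZF=1
```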
Floating point stack
The system contains 8 80-bit registers that are used to store floating-point values for a very
short time. This collection of registers is treated as a pseudo-stack, which operates and is
addressed in a slightly different manner from the stack referenced by ESP. The FPU status
flags register tracks which of the 8 registers currently contains the top of the stack.
Adding or removing a value changes this register's value. The registers do not
have any fixed names; when the code needs to refer to them, the topmost register is called st
or st(0), the next one is called st(1), and so on. Notice that after adding a new value to the
stack, the register that was st(0) becomes st(1), and so on. Removing a value operates in the
reverse direction.
A register similar to EFLAGS exists to indicate similar attributes, but it cannot be accessed
directly; special instructions must be used to copy its value to the AX register before using
it. The precise layout of this pseudo-register is outlined in Intel's reference manuals (Volume I,
page 8-10). There are three useful flags in this register:
- the C0 flag: equivalent to the Carry Flag, indicated by AH & 0x1.
- the C2 flag: equivalent to the Parity Flag, AH & 0x4.
- the C3 flag: equivalent to the Zero Flag, AH & 0x40.

Program code
Concepts and tools
On the lowest level, a program's code, like any other data, is a sequence of bytes that the
CPU interprets as instructions. These numeric instructions have textual mnemonics that are
easier to read and understand. An instruction performs a very small task, such as moving a
value from one register to another, or multiplying it with a different value.
An assembler takes the mnemonics, turns them into the sequence of bytes and places them
in an executable. Examples of these tools include MASM (Microsoft Assembler), NASM and
GAS (GNU Assembler).

A compiler takes code written in a high-level language, such as C++, and transforms it into
assembler code, which is just a fancy name for these same mnemonics. The mnemonics are
then typically fed to the assembler. The most popular C++ compilers for x86 are Microsoft's Visual
C++ (known as MSVC) and GNU's G++.
A tool called a disassembler takes the sequence of bytes and displays the corresponding
mnemonic sequence. However, the information that was contained in the original high-level
code, such as actual operations, variable names, function names, data types, etc. is not saved
in the executable and therefore cannot be recovered. What the disassembler gives you is a
very low-level view of what the program does, like "the value from this address is read, multiplied
by 3 and placed into that address". This naturally takes more lines to express things that take
just a few lines in high-level code, which obviously makes the code more difficult to understand.
Popular disassemblers are IDA Pro, OllyDbg and Linux's objdump -d. More powerful
disassemblers contain helpful functionality like automatically adding empty lines around
logical sections (e.g., before/after loop iteration code).
A debugger lets you inspect a running process step by step: you can make the program stop
after each operation, inspect its variables and observe the code flow. This is very helpful when
trying to figure out a bug in the program, or in a myriad of other scenarios. Any decent
development environment for a higher-level language includes a debugger for that language,
and typically that allows you to see the corresponding assembler code as well. Virtually every
disassembler includes a debugger, as both tools are very useful and complement each
other really well. Debuggers can operate either at user level (as a normal program, where
they can debug ordinary programs) or at kernel level, where they can debug even the operating
system and the drivers. For example, OllyDbg and IDA Pro both contain powerful user-level
debuggers. SoftICE (now known as NuMega Driver Studio) is an incredibly powerful kernel-level
debugger. Microsoft's Debugging Tools for Windows package contains several debuggers
for both user- and kernel-level debugging. Unix developers typically use GDB.
If a program was built in debug mode, it typically contains the symbols (data structures,
types, variables) to simplify debugging. Most debuggers and disassemblers can typically
make good use of these to make the disassembled code more readable.
When a program crashes, the operating system or the program's own crash handling code
can make a memory snapshot (process dump) of the whole program and save it. Certain
debuggers and disassemblers can read these snapshots, so the developers can try and figure
out what caused the crash.
A decompiler attempts to go one step further than the disassembler and turn the sequence of
bytes or equivalent assembler code back into high-level code, usually C. Due to previously
mentioned limitations, this process heavily depends on the ability to identify the original
compiler that was used and the decompiler's familiarity with that compiler's code generation.
Even then, the generated code is not going to look much like the original code, but it is still
immensely helpful when reverse engineering. The most famous decompiler is IDA Pro's
companion Hex-Rays.
Assembler code
Assembler (mnemonics for CPU instructions) is what a reverse engineer spends most of
his/her time looking at, so being familiar with them is very important.
There are two main syntaxes for x86 assembler, one created by Intel and widely used in the
Windows environment, and another created by AT&T and preferred in Unix environments.
Microsoft Visual C++ only supports the Intel syntax, whereas GCC defaults to AT&T syntax but

has a flag to use Intel syntax instead. In my opinion, Intel's syntax is more readable, but that's
a matter of opinion, of course.
In both syntaxes, an instruction begins with optional prefixes, followed by the instruction
code, followed by optional arguments. In Intel syntax, the first argument is destination and
any following ones are sources, while AT&T's syntax puts the destination
argument at the end. AT&T's syntax also uses very different expressions for arguments, which
we will not go into deeply.
X86 has a large number of instructions, not all of which are equally useful. The best reference
for these is the Intel IA-32 Programmer Reference multi-volume book set, available as free
PDFs from Intel's website. It also used to be available in free dead-tree versions, but they
don't do that any more. It's hard to overstate the importance of having these at hand for easy
reference.
Manually writing a complete program in assembler is not often done these days. Manual
assembly used to be reserved for small performance-critical code blocks, but modern
compilers can often generate better code than humans. Instruction set extensions such as
MMX or SSE contain instructions that can be used to parallelize operations and perform other
tasks faster, but C/C++ compilers do not always have enough information in the source
code to determine which extensions would be safe to apply, so they play it safe; this is an area
where manual code generation can still yield better performance.
For similar reasons, this document will show small snippets of assembly code, but not entire
programs.
Mathematical operations on integer values
Mathematical operations usually take two operands, apply the operation and put the result into
the destination operand. Abstractly, it looks like this: dst = dst ⊕ src, where ⊕ represents the
mathematical operation.
Adding two numbers is done simply with an ADD instruction: ADD EAX, EBX. Same for
subtraction: SUB EBX, EAX.
Multiplication is a little trickier, since it can result in much larger values than either operand. It
also yields very different results depending on the signedness of the operands. Therefore
there are two instructions to perform multiplication, MUL (unsigned) and IMUL (signed), and
their result is 64 bits wide: its top dword is put into EDX and the bottom one into EAX. This is
called a register pair and written as EDX:EAX.
Division is more complex still. Again we have two instructions, DIV and IDIV. Note that these
perform integer division (5 / 2 = 2), to get floating point results you need the floating point
instructions.
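The EDX:EAX splitting of an unsigned MUL result can be sketched in Python (illustrative names; IMUL's signed result would populate the upper bits differently):

```python
# Sketch of unsigned 32-bit MUL: the 64-bit product is split into EDX:EAX.
def mul32(eax, src):
    product = eax * src
    edx = (product >> 32) & 0xFFFFFFFF   # top dword of the product
    eax = product & 0xFFFFFFFF           # bottom dword of the product
    return edx, eax

edx, eax = mul32(0x80000000, 4)          # 2**31 * 4 = 2**33, too big for one register
print(hex(edx), hex(eax))                # 0x2 0x0

# DIV performs integer division and keeps the remainder separately
# (quotient goes to EAX, remainder to EDX):
print(5 // 2, 5 % 2)                     # 2 1
```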
Mathematical operations on floating-point values
Operations on floating-point values are usually named similarly to the integer ones: addition
is performed by FADD, subtraction by FSUB, and so on. They follow the same dst = dst ⊕ src
pattern, but their operands are usually the floating point registers.

Logical operations
Bitwise operations
Bitwise operations (OR, XOR, AND...) follow the same operand/result rules as mathematical
ones.
OR EAX, 1h sets the lowest bit of EAX.
AND EAX, 1111b clears all bits of EAX except the bottom four.
NOT EAX flips each bit from 0 to 1 and vice versa, e.g. 0101b becomes 1010b.
NEG EAX changes the sign of the value, that is, if EAX contained 44, it will now contain -44.
This is done according to the rules described in the Negative values section.
Bit rotation
Sometimes it is useful to shift or rotate the bits that represent a value. For example, if
we move all of the value's bits one position to the left, we get a value that is exactly 2
times larger than the original. If we move them one position to the right instead, we get a
value that is 2 times smaller. Other methods exist, but they are useful under more
specific circumstances.
The SHL EAX, 2 instruction shifts all bits two positions to the left and sets the bottom two bits
to zero. This yields a value that is 2^2 = 4 times larger than the original one (this loses the values of
the top two bits, so we actually get four times the value of the bottom 30 bits of
EAX).
The SHR EAX, 2 instruction shifts all bits two positions to the right and fills the top two bits
with zero. This basically removes the sign, divides the value by 2^2 and drops the remainder.
The SAR EAX, 2 instruction shifts all bits two positions to the right and fills the top two bits
with copies of the original top bit. This works like SHR, but it preserves the sign bit, making it
suitable for signed values.
The ROL EAX, 3 instruction rotates the bits in EAX left by three positions; the three bottom bits are
filled with the values of the three formerly topmost bits. This is useful in cryptographic
routines, but it has no simple mathematical interpretation.
The ROR EAX, 3 instruction does the same as ROL, but to the right instead of the left.
SHL/SHR used to be a fast way to perform multiplication/division by powers of two.
Nowadays, processors are fast enough that the dedicated multiplication/division
instructions are just as adequate.
Data movement instructions
One of the key tasks a CPU has to do is move data from point A to point B, for example,
calculating X+Y and putting the result somewhere other code can find it.
Stack operations
The stack uses some specific terminology: "pushing" a value refers to adding a new value at
the top of the stack, and "popping" a value means the reverse, removal of the topmost value.
This terminology is reflected in the instruction names:
PUSH arg pushes the argument onto the stack (decrementing the stack pointer). In x86-32,
this usually means 32-bit operands.

POP reg moves the topmost value off the stack into the specified register and increments the
stack pointer.
Moving a byte, word, dword
Moving a small amount of data that fits in one register can be done by a single MOV
instruction.
For example, assembler code to move the value 42 into the EAX register would look like MOV
EAX, 42. Moving the value of ECX into EAX would be MOV EAX, ECX. Moving the DWORD from
memory address 0x102030 into EAX is MOV EAX, dword ptr ds:[102030h].
Widening data (bytes into words, words into dwords)
It is sometimes necessary to take a byte/word value and turn it into a word/dword.
The MOVZX (zero-extend) instruction is used to extend values by filling the unused bits with
zero; this is known as unsigned extension.
In contrast, the MOVSX (sign-extend) instruction fills the unused bits with the value of the
source operand's sign bit. This is signed extension.
For example: assume the memory address 0x203040 contains 0x71 0x81 0x91 0x41. If we
consider these to be signed values, that is 113 -127 -111 65. If we consider them unsigned
though, they become 113 129 145 65.
MOVSX AX, byte ptr ds:[0x203042] puts 0xFF91 into AX (the highest bit of 0x91 is 1, so the 8 top
bits of AX are filled with 1). When viewed as a signed value, AX also contains -111. If you view
it as unsigned, however, it becomes the nonsensical 65425.
MOVSX AX, byte ptr ds:[0x203043], however, puts 0x0041 into AX, since the highest bit of
0x41 is 0.
MOVZX EAX, byte ptr ds:[0x203041] puts 0x00000081 into EAX. EAX now contains 129,
whether you view it as signed or unsigned.
MOVZX AX, byte ptr ds:[0x203043] puts 0x0041 into AX, just like MOVSX did.
The CDQ instruction can be used to extend a signed DWORD in EAX into the EDX:EAX register
pair. This is commonly used before multiplication/division.
Moving larger amounts of data
Moving larger amounts of data requires more instructions. Two key instructions for this are
MOVSD and MOVSB ("move string" as DWORDs/bytes). They both read the value of ESI,
interpret it as a memory address, read a DWORD or a BYTE respectively from that location
and write it to the address contained in EDI. Afterwards, they adjust both ESI and EDI by the
number of bytes copied. So, you can write five MOVSD in a row to copy 20 bytes.
However, there are helper prefix bytes to simplify loops. The REP prefix makes the instruction
repeat multiple times. Before running the instruction, the value in ECX is checked. If it's not
zero, it's decremented, the instruction is executed, and ECX is tested again. This process
repeats until ECX becomes zero. For example, MOV ECX, 40h; MOV ESI, 102030h; MOV
EDI, 202030h; REP MOVSD will copy 40h * 4 = 100h bytes from the memory block starting at
102030h to the memory block starting at 202030h. After this operation, ECX will contain zero,
ESI will contain 102130h, and EDI will contain 202130h.

Direction flag
The previous section glossed over a slight detail. As was mentioned in the EFLAGS section,
EFLAGS has a so-called Direction Flag bit that indicates whether MOVS* will increment or
decrement the ESI/EDI registers. These days it is usually left set to "increment", but you can
modify it if you want. Why would you? A fairly common scenario is copying data between two
overlapping memory ranges: to copy data from range [0x1000...0x2000] to [0x1800...0x2800],
you should begin at the higher address and copy towards the lower one.
Optimization
What would be the fastest way to copy 66h bytes? MOVSD cannot copy that exact amount,
and a long run of MOVSB instructions would be slow. The simple answer, and one that is
used by most compilers today, is to divide the work:
MOV EDX, 66h
MOV ECX, EDX
SHR ECX, 2 ; 66h / 4, dropping the remainder
REP MOVSD
MOV ECX, EDX
AND ECX, 3 ; 66h % 4 *
REP MOVSB
*This is a common trick when you need to divide by a power of two: shifting the dividend right
by log2 of the divisor gives you the quotient, and ANDing the dividend with the divisor minus
one yields the remainder.
It should be noted that modern CPUs have instruction set extensions such as MMX and SSE,
which provide more interesting instructions. Some of them can be used to build an even faster
data copying function, but these extensions are outside the scope of this document.
Working with pointers
Pointers are very commonly used in assembler, much like in C/C++. Consider the example
from earlier:
MOVZX AX, byte ptr ds:[0x203043].
This instruction takes one byte from the memory location 0x203043 and operates on it.
Unless your function operates on the same data all the time, the address holding the
information you need might be different (suppose you have to read from a buffer that was
allocated dynamically to store user input; the memory location of the buffer would be different
each time). In this case, you cannot actually hardcode the 0x203043 into the code, can you?
Something has to tell you where the information you need is, and you need to operate on that
address. Let's say you know that the buffer's starting address is currently in the EDX register,
and you need to sign-extend the first and fourth bytes from that buffer into AX and EBX
respectively. One possible way to do that is:
MOVSX AX, byte ptr ds:[EDX]
ADD EDX, 3 ; move three bytes ahead, from #1 to #4
MOVSX EBX, byte ptr ds:[EDX]
SUB EDX, 3
You have to restore the EDX value so that other code can use it correctly. A different approach
would be to use an instruction called LEA (nothing to do with Star Wars, sorry). LEA stands for
Load Effective Address, and it calculates the value of its source operand (the address it should
be pointing to) and writes that value into the destination operand. It doesn't read any data

from the pointer; it just calculates the address. The useful part comes when you see how complex the
source operands can be:
LEA dst, [reg1 + x * reg2 + y]
Here, x can be 1, 2, 4 or 8, and y can be any constant displacement. LEA is optimized to calculate the
result very quickly, so it is used often, including for unintended purposes. With knowledge of
this instruction, you can perform our task differently:
MOVSX AX, byte ptr ds:[EDX]
LEA EBX, [EDX + 3]
MOVSX EBX, byte ptr ds:[EBX]
This saves us the trouble of restoring a register value and is arguably easier to read.
LEA math
As noted earlier, thanks to its speed, LEA was often used for purposes other than
pointer math. This is largely irrelevant nowadays, since processors are so much faster than
they were when this trick was invented, but you will need to know about it since it's widely
used in existing programs. You can probably see how a compiler can use it to turn a complex
mathematical operation into a sequence of simpler ones, for example:
EAX = EDX * 210h can be implemented as a single multiply instruction, or as this sequence:
LEA ECX, [EDX*8]     ; ECX = 8 * EDX
LEA ECX, [ECX*8]     ; ECX = 8 * (8 * EDX) = 40h * EDX
LEA ECX, [ECX*8]     ; ECX = 8 * (40h * EDX) = 200h * EDX
LEA EAX, [EDX*8]     ; EAX = 8 * EDX
LEA EAX, [ECX+EAX*2] ; EAX = 200h * EDX + 2 * (8 * EDX) = 210h * EDX

Comparison instructions
The x86 has two ways to compare values: mathematically (less, equal, greater) and bitwise (is
bit N set). Both of them put results into the EFLAGS register. Certain other instructions behave
differently based on certain flags' values. The most common example of a conditional
instruction would be the family of conditional jumps (Jxx). These instructions take one
operand, a memory address, and each instruction corresponds to a more-or-less commonly
used condition. Such an instruction checks whether EFLAGS meets its condition, and if it does,
tells the CPU "after this instruction, start executing code at <given memory address>".
Otherwise, nothing happens. Note that this instruction will not save any information about
where it was before switching to <given memory address>, so you can't later tell the CPU
"OK, now go back to what you were doing earlier".
The problem is that almost every instruction that calculates a mathematical result also writes the
result's attributes to EFLAGS, so the code should be carefully designed to avoid using those
instructions between making the comparison and using the result flags. Documenting each
instruction's result attributes here would be a lot of work, and that is much better covered in
Intel's reference.
The TEST arg1, arg2 instruction performs the bitwise comparison. It basically ANDs the two
operands and sets the ZF, SF and PF flags depending on the result (CF and OF are cleared).
The CMP arg1, arg2 instruction performs the mathematical comparison. It basically subtracts
arg2 from arg1 and sets the ZF, SF, CF, OF, PF flags according to the result, which is NOT
saved anywhere. The flags that should be checked depend on whether the compared values
are signed or not, and what condition we are testing for.

Code flow instructions


It is fairly reasonable to expect a program to contain some conditional logic, e.g. "is the user's
entered number larger than 250?", and to make two different things happen depending on the
answer. It is also reasonable to organize code into smaller functions that perform small, well-defined
tasks. x86 has two methods to navigate from one point in the code to another: Jump
and Call.
Jumps
Jumps can be unconditional (JMP) or conditional (the Jxx family). No information about the
current address is saved before the CPU jumps to the requested location, so you can't return
later. Most often, an unconditional jump is used at the end of the if-branch in an if-else
block.
Several conditional jump instructions exist, and most of them have several mnemonics to
more simply express the condition being tested.
Mnemonics         Condition to meet

Equality
JZ/JE             Jump if Zero / if Equal
JNZ/JNE           Jump if Not Zero / if Not Equal

Signed value comparison
JL/JNGE           Jump if Less / if Not Greater or Equal
JLE/JNG           Jump if Less or Equal / if Not Greater
JGE/JNL           Jump if Greater or Equal / if Not Less
JG/JNLE           Jump if Greater / if Not Less or Equal

Unsigned value comparison
JB/JNAE           Jump if Below / if Not Above or Equal
JBE/JNA           Jump if Below or Equal / if Not Above
JAE/JNB           Jump if Above or Equal / if Not Below
JA/JNBE           Jump if Above / if Not Below or Equal

Other conditions
JS                Jump if Sign
JP                Jump if Parity

Function calls
Logical division of code into functions is really important for code quality, and any complex
program is bound to contain a large number of functions. This makes the function call
mechanism a very important part of x86 operations. The instruction to call a function is
unsurprisingly named CALL. The function returns to its caller when it encounters a RET/RETN
instruction. RET takes an optional argument but, surprisingly, it's not the function's return value!
We'll see what it is soon enough.
Passing arguments to a function
When you call a function, you first need to pass any required arguments to it. There are
several widely used pseudo-standard ways to do that, called calling conventions.
The earliest designed way to do that is to use the stack and let the caller clean it up:
calculate how many bytes are needed for all the arguments, PUSH them all from right to left,
CALL the function, and once it returns, ADD ESP, <needed size> to restore the stack. This allows
functions to take a varying number of parameters, like C's printf does. These functions end
with RETN and no argument. This is known as the cdecl calling convention and is used by
default for C functions.
Another common convention, stdcall, also pushes arguments from right to left, but this time
the callee is responsible for cleaning them up. This means the callee has to know how many
arguments it receives and to remove them from the stack. This is done by the RET instruction:
its argument tells the CPU to remove that many bytes off the stack before returning to the
caller. The RET instruction can only take a constant integer argument, meaning stdcall
functions cannot have varying numbers of arguments the way printf can. However, default
arguments are okay. This convention is used in the Windows API.
A variation of stdcall is called fastcall. It puts the first two DWORD-or-smaller arguments
into ECX and EDX respectively, and only the remaining ones onto the stack. As a result, the
size of the RET argument should not take those register arguments into account. This convention is
arguably faster than stdcall since it uses less stack, hence the name.
C++ non-static class functions (methods) use yet other conventions, and these vary from
compiler to compiler. They need a different convention to pass around the invisible "this"
pointer, which points to the object whose method is called. MSVC puts the this pointer into ECX and all
the other arguments onto the stack; G++ pushes the this pointer onto the stack after all the other
arguments. This convention is somewhat predictably labelled thiscall, but unlike other
conventions, it cannot be explicitly assigned to a function, and methods cannot be assigned a
different convention.
Note that while these are the most popular conventions, the compiler can use a different
convention (e.g. putting arguments into ESI, EDI, EAX and EDX instead of the stack) if the
function doesn't explicitly specify a convention and the compiler can prove it has seen all the
calls to this function (meaning the function is not exposed to other compilation units and not
exported from a DLL). This is inconvenient when reversing, because it requires extra work
to determine where the arguments are. However, this is not overly difficult; to find which
registers are used to pass parameters, you just need to find registers that are read without
first being written to. Don't forget instructions that read/write data through hardcoded registers
without actually referencing them, e.g. MOVSD reads from ESI and writes to EDI, so barring
earlier writes to those registers you can assume they were used to pass arguments.

Expendable and non-expendable registers


Suppose a function is reading bytes from [ESI], modifying each byte in a separate function,
and writing the modified bytes to [EDI]. Suppose also that the modifier function uses ESI/EDI
as well in some way. What will happen once the modifier returns the first time? It shouldn't be
hard to see that the write will put the modified byte into a completely incorrect location,
causing all sorts of problems. To avoid this, functions are typically expected to leave all the
registers except EAX, ECX and EDX in the same state as they were before the function call.
This is normally done by the called function PUSHing the register before using it and
POPping it back before returning. This also means that you can't expect EAX, ECX and EDX to
retain their values after calling a function.
Function result
If a function returns a primitive (not a struct/object) value that fits in a DWORD, it's stored in
EAX before the function returns. If a function returns a QWORD (e.g. a double that the
compiler chose to return in normal registers instead of the FPU stack), that value is stored
in EDX:EAX (meaning the higher DWORD is in EDX and the lower one is in EAX).
If a function returns a complex type, such as a struct, the compiler does additional magic to
manage memory for the allocated object. This technique will be covered later, along with the
other C/C++ constructs.
Function return address
Unlike the jumps, a function knows where it was called from, and after the RET/RETN is
executed, the code after the function call continues to execute. This is achieved by the CPU
silently PUSHing the address of the instruction following the CALL as part of the CALL
instruction; RET/RETN then POPs that address off the stack, adjusts ESP by as many
bytes as its argument tells it to, and uses the POPped value as the address to return to. This
means that messing up the stack can change where your function returns to, leading to nasty
bugs and security problems.
Function environment
A typical function stack looks like this (lower addresses at the top; remember that the stack grows downwards):

    Local variables (including saved non-expendable registers)
    (Optional) Saved EBP
    Saved return address        <- top of the stack right after CALL / right before RET
    Leftmost stack argument     <- top of the stack right before CALL
    ...
    Rightmost stack argument
    Caller's local variables    <- top of the stack after RET (once RET has removed the arguments)

Right after CALL, you can access the arguments at [ESP+04h], [ESP+08h]... But as soon as
you do something that changes ESP, the offsets change and become more difficult to
track. To avoid this, a so-called "stack frame pointer" is introduced. The current value of EBP
is PUSHed onto the stack, and EBP is assigned the current value of ESP. EBP is then left
unmodified for the duration of the function and POPped off the stack as the function ends. This way,
arguments can be accessed at [EBP+08h], [EBP+0Ch] and so on. Note that the offsets off EBP
are 4 bytes larger than the offsets off ESP, because PUSHing the saved EBP value
decreased ESP by 4 bytes.

If a function needs local variables, the space for them is typically allocated right after the
stack frame is set up by SUBtracting the wanted space size from ESP, and restored by
ADDing the same value before restoring EBP. As a bonus, local variables can also be accessed
through fixed offsets off EBP like stack arguments can, except this time the offsets are
negative.
If the function's local variables include double-precision numbers, the compiler will
additionally perform an AND ESP, -8 before allocating local variables. -8 has the same bit
pattern as (NOT 7), that is, the three lowest bits are zero. This basically reduces ESP to the
nearest smaller value that leaves no remainder after being divided by 8, which (after any
necessary reordering of the local variables) ensures the doubles will be placed at an address
aligned to 8 bytes. This is done because loading values from an unaligned memory address is
slow (and in other architectures even disallowed!). This alignment procedure is not done for
complex type variables (structs and objects).
Indirect function calls
CALL supports both direct function calls ("call the function that starts at <address relative to
the current location>") and indirect function calls ("<absolute address> points to a
memory location that holds the address of the function to call"). The former method is used
when the compiler can determine at compile time what exact function we want to call and
knows where that function is. The latter is used when calling functions imported from
a separate DLL, when using function pointers, or when the function to call cannot be determined at
compile time (for example, C++ virtual functions, where the actual called function
depends on the object's type at runtime).
Examples
Given a function definition:
int __cdecl delta(int a, int b)
Calls to it will compile to this (or similar) code:
PUSH b
PUSH a
CALL delta; EAX now contains the result
ADD ESP, 8
Given a function definition:
void __fastcall remove(void * structure, void * item)
Assuming structure and item are both declared as objects (not pointers), calls to it will
compile to:
LEA ECX, [structure]
LEA EDX, [item]
CALL remove; no stack modification
Assuming structure and item are declared as pointers, calls to it will compile to:
MOV ECX, structure
MOV EDX, item
CALL remove; no stack modification
Here's a complex example using function pointers:

typedef int (__stdcall *fptr)(int x1, int y1, int x2, int y2); // function pointer type definition
int __stdcall Distance(int x1, int y1, int x2, int y2) { /* ... */ } // function definition; let's assume this
function's code starts at 0x701050
int __fastcall callback(fptr function, int x1, int y1, int x2, int y2) {
int dist = (*function)(x1, y1, x2, y2);
return dist * 2;
}
int __fastcall callerFunction() {
return callback(&Distance, 3, 0, 0, 4);
}
; -- callerFunction --
PUSH 4
PUSH 0
PUSH 0
MOV EDX, 3
MOV ECX, 0x701050
CALL callback
RET
; -- callback --
PUSH EBP
MOV EBP, ESP
PUSH dword ptr [EBP+10h]
PUSH dword ptr [EBP+0Ch]
PUSH dword ptr [EBP+08h]
PUSH EDX
CALL ECX ; the function's address is the value in ECX itself, so no memory indirection is needed
SHL EAX, 1
MOV ESP, EBP
POP EBP
RET 0Ch

Higher-level constructs expressed in assembler


C constructs
Arrays
An array is a collection of values sharing the same type. The C implementation of an array is
geared towards maximum performance. As a result, this collection is stored in memory simply
as a sequence of values with no gaps; each value takes up the same number of bytes.
Therefore, (assuming that each element is X bytes wide) the first element of the array (index
0) is placed directly at the first byte of the collection, the second element is placed X bytes
after the first one, and so on. In general, the Nth element starts X*(N-1) bytes after the start
of the collection.
Note that this implementation requires you to know in advance how many
elements there are in the collection, and you can't resize the collection at runtime. If you
require a variable-size array, you have to implement one yourself (or preferably use the C++
STL).
Previously, all array declarations were required to specify their size as a compile-time
constant value. The C99 standard relaxed this limitation.

Additionally, observe that the array size is not stored anywhere and must be managed
separately.
Strings
Strings are stored as character arrays (plain char[] for a single-byte encoding, more complex
types such as wchar_t[] for multi-byte encodings).
Notice again that the length is not saved as part of the array. By convention, a string has to
contain a null byte (0x00) after its last symbol, and this byte is considered the string end
marker; most C string management functions will keep reading symbols until they encounter
it. So this byte has to be accounted for when calculating the buffer size.
Allocation
Normally, arrays follow the same allocation rules as other variables: function local variables
are allocated on the stack, and other variables (global and static) are placed in the data
segment. Strings are handled slightly differently: string literals, which are known at compile time, are
placed in the read-only data segment. The compiler may or may not choose to
compact multiple identical literals into one allocation. As a result, string literals cannot be
modified at runtime (but for historical reasons they can be pointed to by a plain char *, which
would suggest otherwise).
Constants
Integer constants are usually substituted by their values when compiling, which makes it very
difficult to determine which uses of a specific value were actually constants in the source
code. By extension, it makes it difficult to replace said constant without source code.
Floating-point constants are either substituted by their values or preserved in the read-only
data segment.
Pointer math
Pointer math works differently between C and assembler:
int x[2] = {1, 2};
int *p = &x[0]; // p now points to the first element of x
++p; // p now points to the second element of x, that is, p was actually incremented by
sizeof(*p).
LEA EAX, [x] ; EAX now points to the first byte of x
ADD EAX, 1 ; EAX now points to the second byte of x, which is certainly not the second
element of x.
As you can see, in assembler, pointer math is no different from normal math, and operates on
bytes. In C, pointer math operates on elements that the pointer is declared as pointing to.
Structs
As a C struct is simply a collection of values, there's not much special handling needed. The
only tricky part is that structs can have packing: small values (bytes, bools) followed
by larger ones (dwords, doubles) can have unused bytes (called padding) after them to make
those larger values align to their preferred byte boundary. Neither the C nor the C++ standard defines
any specific packing indicators, so each compiler has its own ways to change struct packing
as necessary. However, no padding is ever inserted between the elements of an array.

Conditional statements
Conditional statements are usually expressed slightly differently in ASM, because ASM is built
around "if condition, then skip the following code" logic rather than the "if condition, then execute the
following code" logic that C/C++ uses.
Labels
Like in most other languages, a simple string followed by a ":" is called a label. It doesn't
produce any actual code but serves as an anchor/goalpost jump instructions can aim for.
If
if(condition) {
x += 5;
}
Becomes something like this in assembler:
MOV ECX, condition ; assuming condition is a Boolean value
MOV EAX, x
TEST ECX, ECX
JZ after_if
ADD EAX, 5
after_if:
If-else
if(condition) {
x += 5;
} else {
x -= 3;
}
Becomes:
MOV ECX, condition ; assuming condition is a Boolean value
MOV EAX, x
TEST ECX, ECX
JZ else_branch ; if ECX is zero, go to label
ADD EAX, 5
JMP after_if
else_branch:
SUB EAX, 3
after_if:
If-elseif-else
This is just a variation on the simple if-else logic.
if(condition > 5) {
x += 4;
} else if(condition < 1) {
x += 7;
} else {
x -= 10;
}

MOV ECX, condition ; assuming condition is a signed integer value


MOV EAX, x
CMP ECX, 5
JNG elseif_branch; if ecx <= 5, go to next branch
ADD EAX, 4
JMP after_if; performed the branch, skip the other two
elseif_branch:
CMP ECX, 1
JNL else_branch ; if ecx >= 1, go to next branch
ADD EAX, 7
JMP after_if ; performed the branch, skip the next one
else_branch:
SUB EAX, 10
after_if:
Powerful disassemblers usually add visual spacing around jumps, as well as auto-generated
labels next to jump targets, to make it more readable.
Loops
While loop
while(x > 1) {
a += 2;
--x;
}
Turns into:
MOV ECX, x
MOV EAX, a
while_start:
CMP ECX, 1
JNA after_while ; assuming x is an unsigned int; if it's signed, this will be JNG instead
while_body:
ADD EAX, 2
DEC ECX
CMP ECX, 1
JA while_body ; again, assuming x is unsigned; change to JG if signed
after_while:
Notice that there are two separate CMP instructions, before and after the body. The first one is
necessary to avoid entering the loop at all if x is <= 1 before the loop starts. The second one could
have been avoided by using a JMP while_start, but that would result in a lot more jumps,
which is bad for the CPU's performance.
Do-while loop
This is a simple variation of the while loop, and as such, it's translated to a really similar code
sequence:
do {
a += 2;
--x;
} while(x > 1);

Turns into:
MOV ECX, x
MOV EAX, a
while_body:
ADD EAX, 2
DEC ECX
CMP ECX, 1
JA while_body ; assuming x is unsigned; change to JG if signed
after_while:
All that is gone is the pre-loop condition check.
for loop
This loop is somewhat more terse, so its translation is a bit more complex:
for(int a = 0, x = 1; x < 10; ++x) {
a += 5 * x;
}
Becomes:
XOR EAX, EAX ; a = 0
MOV ECX, 1 ; x = 1
CMP ECX, 10 ; this check will not be made if the compiler is able to deduce its uselessness
JG after_loop ; x is signed, therefore JG
for_body:
LEA EDX, [ECX + 4 * ECX] ; tmp = 5 * x
ADD EAX, EDX
for_post_iteration: ; the loop body is finished, do the post-iteration action and comparison
INC ECX
CMP ECX, 10
JL for_body ; x is signed, therefore JL
after_loop:
As you can see, the structure of the transformation looks like this:
int a = 0, x = 1;
if(x < 10) {
do {
a += x * 5;
++x;
} while (x < 10);
}
It looks awkward, so get used to it.
Switch statements
Switch statements are also tricky, and can be represented in several patterns, depending on
specific case values:

Tightly grouped cases with multiple different actions


If the switch represents multiple possible cases with values next to each other (e.g., a state
machine that has states 0 through 6, the switch controlling it would have seven cases with
these values, and presumably seven different chunks of code to perform in each of them), the
switch is represented as a simple jump table:
// this is not how I would represent a state machine and its states, but that's not relevant to
the example:
#define STATE_IDLE 0
#define STATE_STARTUP 1
#define STATE_LOOKING_FOR_WORK 2
#define STATE_WORKING 3
#define STATE_REPORTING_WORK 4
#define STATE_CLEANING_UP 5
#define STATE_SHUTDOWN 6
switch(State) {
case STATE_IDLE: /* code */ break;
case STATE_STARTUP: /* code*/ break;
case STATE_LOOKING_FOR_WORK: /* code*/ break;
case STATE_WORKING: /* code*/ break;
case STATE_REPORTING_WORK: /* code*/ break;
case STATE_CLEANING_UP: /* code*/ break;
case STATE_SHUTDOWN: /* code*/ break;
}
Becomes:
XOR EAX, EAX
MOV AL, State
CMP EAX, 6
JA after_switch; unsigned comparison will catch all the negative State values as well
MOV EAX, dword ptr[0x102030 + EAX*4] ; magic!
JMP EAX
switch_cases: ; let's pretend this is address 0x102030
dd 0x102050 ; this indicates a DWORD with value 0x102050
dd 0x102060
dd 0x102080
dd 0x1020A0
dd 0x1020C0
dd 0x102100
dd 0x102120
dd 0
; this array of DWORDs contains the addresses where each switch branch starts; a
decompiler will usually turn these into readable labels
; this is address 0x102050
switch_case_0:
; code
JMP after_switch
; this is address 0x102060
switch_case_1:
; code
JMP after_switch
; more identical code skipped for readability

; this is address 0x102120


switch_case_6:
; code
after_switch:
As you can see, in this case the addresses to jump to are packed into an array indexed by the
case value. If the case values are packed in some other interval (e.g. 100-107) instead, the
code will first subtract 100 from the state before reading from the jump table, so that the first
jumpable case has value 0.
Mind that the placement of the actual table is compiler-dependent: GCC usually places it
right in the middle of the function, whereas MSVC prefers to place it after the function body.
Tightly grouped cases with few different actions
In some other cases, a switch is constructed so that a large number of cases share the same
code, for example:
int a = 0, b = 0, c = 0; // convoluted example, just play along
switch(x) { /* pretend that this is preceded by code that disallows numbers outside the
[0..24] range */
case 0:
case 3:
case 5:
case 7:
case 11:
case 17:
case 19:
case 23:
a = 1;
break;
case 1:
case 2:
case 4:
case 16:
case 18:
b = 3;
break;
default:
c = 8;
break;
}
This kind of switch employs a secondary indexing table:
XOR EBX, EBX ; a = 0
XOR ECX, ECX ; b = 0
XOR EDX, EDX ; c = 0
XOR EAX, EAX
MOV AL, x
CMP EAX, 24
JA after_switch ; even though we're claiming we've filtered out the other cases, optimizing this
check away needs a smart compiler
MOV AL, byte ptr[0x102030 + EAX] ; magic part 1
MOV EAX, dword ptr[0x10204C + EAX * 4] ; magic part 2
JMP EAX

; assume this is address 0x102030


db 0, 1, 1, 0, 1, 0, 2, 0, 2, 2, 2, 0, 2, 2, 2, 2, 1, 0, 1, 0, 2, 2, 2, 0, 2, 0, 0, 0
; this is address 0x10204C
dd 0x102058
dd 0x10205C
dd 0x102060
; this is 0x102058
switch_case_0:
MOV BL, 1
JMP after_switch
; 0x10205C
switch_case_1:
MOV CL, 3
JMP after_switch
; 0x102060
switch_case_2:
MOV DL, 8
after_switch:
As you can see, the compiler built an additional indexing table mapping case values to code
branches. This saved some space (28 * 1 byte + 3 * 4 bytes < 25 * 4 bytes). Notice that the
indexing table is 28 bytes even though we only have 25 cases. The compiler added some
padding bytes to make the next table align nicely to a 4-byte address; since we're reading 4
bytes at a time from that table, having it aligned on a 4-byte boundary is a performance gain.
Note: the code in the switch cases is indeed that short: MOV in this case is 2 bytes, and a JMP
to a destination less than 127 bytes away is only 2 bytes as well.
Ungrouped cases
If a switch's cases have values that are not close to each other (e.g. if you're switching on a
crc32 hash of a string, which is not entirely unreasonable given the lack of actual switching on
strings), the compiler can't do much to build these jump tables, as there's no indexing pattern
it can see. Instead, it will either build a binary search tree of possible cases, or simply resort
to an if-else chain.
Varargs
In C, certain functions like printf can accept a variable number of arguments (this is
mentioned in function calling conventions under cdecl).
On the C side, this is a really unsafe mechanism: the compiler cannot do type checking for
these arguments or even ensure that you've passed as many as you'll need. (Certain
compilers provide non-standard ways to flag a function as using the printf format string
model, and then arguments can be type-checked.)
int Sum(int FirstValue, ...) { // this is a stupid example: sum all the arguments passed in until
                               // you encounter -1.
    int sum = FirstValue;
    va_list vargs;
    va_start(vargs, FirstValue);
    for(;;) {
        int curValue = va_arg(vargs, int);
        if(curValue == -1) {
            break;
        }
        sum += curValue;
    }
    va_end(vargs);
    return sum;
}
This functionality is implemented through macros, and normally C programmers don't need to
understand how it works.
On the assembler side, this is not really different from accessing ordinary arguments.
Remember that in cdecl functions, arguments are all passed on the stack and take at least
four bytes.
PUSH EBP
MOV EBP, ESP
PUSH EBX
MOV ECX, dword ptr [EBP + 08h] ; take first argument
MOV EAX, ECX
LEA ECX, [EBP + 0Ch] ; address of second argument
OR EDX, 0FFFFFFFFh
loop_body:
MOV EBX, [ECX] ; read addressed argument
CMP EBX, EDX
JZ after_loop
ADD EAX, EBX
ADD ECX, 4 ; increment pointer to point to next argument
JMP loop_body
after_loop:
POP EBX ; this is not a scratch register, so it needs to be restored to its original state
MOV ESP, EBP
POP EBP
RET ; in cdecl, the caller is responsible for cleaning up the stack, so no argument cleanup is done here
Notice that we calculated the sum directly in EAX, so we don't need to explicitly move it there
as a return value.
Additional C++ constructs
Constructors and destructors
C++ object construction is a bit complex: it covers allocation of memory for the object(s) and
invocation of the appropriate initializer lists and constructor functions. Memory allocation is
complicated by:
1) the number of objects - allocating a single object differs from allocating an array;
2) the programmer's choice of where to store the object:
a) if it's a local variable (on the stack), the memory space for it has already been allocated
and nothing needs to be done;
b) whereas if it is in heap memory (operator new), the appropriate memory allocation
routine has to be called;
c) it is possible for the programmer to allocate a memory block instead of asking operator
new to do so; this is known as placement new.

Similar issues arise with destructors: if the object has a destructor, it has to be called, and if
the array holds such objects, the destructor has to be called for each of them.
Heap memory
Objects allocated on the heap (using operator new) begin by asking the memory allocator
routine to allocate a memory block of the appropriate size. If that fails, the reaction varies
(C++ lets the programmer choose how to handle this problem): an exception can be thrown, or
NULL can be returned without calling the constructor.
If the allocation succeeds, the initializer list is invoked, followed by the appropriate
constructor. The constructor receives the pointer to the memory block as its this pointer, and
the compiler silently returns it as the function result.
When allocating an array, and if the object being allocated has a destructor, things become
more complicated. The array can be of variable length, so something has to know how many
objects there are in the array and call the destructor for each of them. MSVC handles this by
requesting 4 more bytes than the array actually needs (which is array length * object size),
hiding the array length in the first 4 bytes, and returning the fifth byte of the allocated block
as the array start address. The compiler then runs the constructor in a loop, invoking it for
each object in order. After that is done, the array is ready for use. The destructor works in a
similar fashion: it expects to find the array length in the four bytes immediately preceding
the array, runs the destructor that number of times (in reverse order! The last object in the
array is destroyed first, because it was constructed last), and returns the entire block to the
memory manager. This means that mixing up scalar and vector allocators/deallocators will
result in nasty crashing.
Independently allocated memory / placement new
A programmer can choose to manage memory independently of the
constructors/destructors. In this case, placement new can be used to tell the compiler to just
construct the object at the given memory location. The object is constructed without any
memory management involved. The destruction is a bit trickier, since there's no built-in
placement delete: you have to call the destructor manually, and then pass the pointer to the
memory manager for deallocation.
There is no standard way to manage arrays through this mechanism.
Local variable allocation
Local variables are typically allocated on the stack, unless the programmer uses the register
keyword to try and get them placed in a register, or the optimizer decides that would be a
good idea even without the programmer's hints. When they are allocated on the stack, the
memory necessary to store the object is allocated at function start automatically, and the
constructor doesn't have to do it. If the variable is an array, it is necessarily of constant size, so
the compiler knows how much memory it needs to allocate, and how many times to call the
initializer list/constructor/destructor; no extra memory is necessary to store that
information, and no tricks are needed in the destruction.
Note that if the function requires a large amount of stack space, the compiler inserts
additional code that walks the allocated area and pokes bytes every so often (in MSVC parlance,
this is __alloca_probe). This mechanism is necessary because of virtual memory: memory
given to the program by the memory manager is initialized in pages on first use, and after
each initialized page a guard page is placed. Accessing this guard page results in an
exception which the memory manager silently swallows before initializing the page. If your
stack space takes more than 2 pages, it is possible you'll be accessing the area beyond the guard
page, which will result in a different exception that the memory manager doesn't handle. The
poking process initializes the allocated area to avoid this. If you manually modify the stack
pointer to get scratch space this way, you should also perform a similar poking routine.
Enumerators
Enums are handled identically to simple numeric constants: their occurrences in the source
are replaced by their literal values.
Methods
A method is simply a member function of a class. That is, it has a this pointer pointing to
the object itself. The way the this pointer is passed to the function depends on the compiler (see
function declarations on thiscall). Unless the function is virtual, it looks quite like an ordinary
function whose argument happens to be a class pointer.
Inheritance
Single inheritance is straightforward: the data members of the parent class are placed in the
beginning of the class, and the data members of the child class are placed after them. The actual
layout of the parent data members is exactly the same as it was in the parent class alone.
(Otherwise you couldn't pass a derived class pointer to a function expecting a base class one.)
Multiple inheritance is somewhat more complex. Each parent class is laid out in the same way,
but the actual ordering of the parent classes is undefined: the compiler is free to choose the
more optimal ordering.
Casting a multiply-inheriting class to one of its parents is a simple enough task for the
compiler: it just has to adjust the this pointer by the appropriate number of bytes. Suppose
we have a class Base1 that takes 20 bytes, Base2 that takes 40, Base3 that takes 12, and
Derived that inherits from all three of them. If the compiler has chosen to keep the class
ordering as Base1, Base2, Base3, casting Derived* to Base3* requires an addition of 60 bytes.
If, on the other hand, the compiler had chosen the ordering Base2, Base3, Base1, the same
cast would require only 40 bytes.
Multiple inheritance complicates method calls as well: the compiler has to quietly cast the
this pointer to the class that the method expects.
Virtual inheritance
Virtual functions
Virtual functions are an important part of inheritance. Compilers typically combine all of the
class's virtual functions in declaration order, and put all of their addresses into an array. This
array is called the virtual function table, or vftable/vtable for short. The address of this array is
then (normally) inserted as the first invisible data member of the class. Child classes replace
those pointers with their own during construction and restore the base pointers during
destruction.
Calling a virtual function usually looks like this (in MSVC):
MOV ECX, ESI ; assuming ESI contains a pointer to the object whose function we're calling
MOV EDX, [ECX] ; EDX contains the pointer to the vftable
CALL [EDX+24h] ; call the function stored at index 9 in the table (offset 24h = 9 * 4)
Remember that in MSVC, this pointer is passed in ECX.

A class inheriting from multiple classes with their own virtual functions naturally has a
separate pointer for each class's vtable. But the functions of the second and later virtual tables
have the same problem as simple methods of a multiply-inheriting class. Suppose we have a
base class Base1 that has virtual functions and no data members (this is called an interface),
a base class Base2 that has both virtual functions and data members, and a derived class
Derived that inherits from both Base1 and Base2. Then calling a virtual function of Base2
looks like this:
MOV ECX, ESI
LEA ECX, [ECX + 4] ; load this pointer of the Base2 portion
MOV EDX, [ECX] ; address of vtable
CALL [EDX + 14h] ; call the function at index 5 in the vtable
Notice that the this pointer points to the Base2 portion. The actual implementations of the
functions in Derived's Base2 vtable can reference Derived's own data members, but the
member offsets will be different from those you'd see when working with Derived's non-virtual
functions.
In addition, Derived's implementations of Base2 virtual functions can call Base1's functions,
in which case the this pointer has to be adjusted again:
MOV ECX, ESI ; assume that we're in a Base2 virtual function
LEA ECX, [ECX - 4] ; reduce the pointer by four bytes (sizeof Base1)
MOV EDX, [ECX] ; load vtable
CALL [EDX + 10h] ; call the function at index 4 in Base1's vtable
This construct is relatively simple, but like all C++ additions it is usually beyond the
disassembler's range of comprehension. So it resorts to showing you a confusing pointer math
expression similar to this:
v48 = (DWORD *)(((char *)v3) - 4);
v49 = *v48;
v49[4](v48);
You can assign a type to v48 to make the last two lines nicer and more readable, but the
pointer math is quite impossible to get rid of.
Notice that even if you define just one implementation for a virtual function, any derived class
that inherits from this base class and some additional classes potentially requires adjustments
of the this pointer. This is impossible to place into call sites, so adjustments are needed in
the functions themselves. Such an adjustment is known as a thunk. At least in MSVC, this is
implemented as follows: the derived class's vftable contains a pointer to this small function:
ADD ECX, 4 ; adjust the this pointer to the appropriate base class; in this case the
appropriate base class starts at 0x4 bytes
JMP BaseFunctionImplementation ; the linker places the starting address of the base class's
implementation of this particular function here
The JMP means the RET in the parent's implementation will return to the place where this
thunk was called from. This is easier than replicating all the argument layout, CALLing it, and
RETing in each thunk. Since nothing is done in the thunk itself, it usually doesn't create its
own stack frame.

COM
A COM interface is really nothing more than a simple class that has virtual functions (declared
abstract: the inheriting class has to implement them manually) and no normal methods or
data members. Implementing an interface only adds one vftable pointer to the class.
Templates
Unlike actual C++ templates, their assembler representations are usually straightforward:
each template specialization that is actually used in the code is created as a separate
standalone class/function with specific types and whatnot. The only hint of the function/class
being a template is usually the associated RTTI data.

Appendix A: Syringe
Syringe is a DLL injection tool developed by us (mostly pd) to insert custom-written DLLs into
the target process and hook their code at specific locations. This enables us to write custom
code for the game in C++ instead of raw ASM. It's open source, and for the curious, here's the
breakdown of how it works.
- When Syringe is launched, the target executable filename has to be given to it on the
  command line. Any additional command line parameters will be passed to the target
  executable unchanged.
- Once the executable is found, Syringe attempts to parse its import table looking for
  LoadLibrary and GetProcAddress. If they are not found, the target will not be launched.
- Syringe scans the target executable's directory for pairs of .dll and .dll.inj files. An .inj
  file contains information about the addresses where which hooks should be placed.
- Syringe launches the target executable as a suspended debuggee process.
- When the process is launched, Syringe injects a small piece of code that tells the process
  to LoadLibrary each injectable DLL and GetProcAddress each hook function.
- When Syringe knows all hook function addresses, it allocates a small memory area for
  each hook placement and rewrites the code at the hook location with a jump to this
  memory (each hook can specify how many bytes of code should be backed up; these
  bytes are copied to the end of the buffer). In this buffer is a call to the hook function and
  analysis of its return.
  - Hook placement considerations:
    - It is unwise to back up bytes containing jumps, because jump offsets are relative
      and as such will be pointing to very incorrect locations when evaluated in the
      memory buffer.
    - Because an absolute jump takes five bytes, each hook will overwrite at least that
      many bytes of original code. Make sure you are not overwriting any bytes of a
      different code branch (there should be no way to jump into the middle of the
      bytes being overwritten). That includes if/else, switch cases, and function
      epilogues: if you are hooking near the end of the function, be sure that your hook
      won't hit the next function.
    - Hooks should not be placed in the space between a comparison and the
      conditional jump that uses its result. Even though Syringe saves and restores the
      EFLAGS, the epilogue of a hook performs a conditional jump to return to the
      original code, thus ruining the original comparison's results.
- Before calling the hook, Syringe executes PUSHAD; PUSHFD; PUSH (hook address);
  PUSH ESP;
  - Hook functions are normally defined as DWORD __cdecl func(REGISTERS* R), where
    REGISTERS is a structure containing all the values that have been pushed onto the
    stack. Passing this as a pointer allows us to both read and write values to the
    registers.
- After calling the hook, Syringe inspects the return value.
  - If it is zero, the execution will continue at the original bytes that have been backed
    up and will jump to the first byte that hasn't been backed up afterwards, unless the
    original bytes contain a control flow instruction.
  - If it's not zero, the execution will treat the return value as an absolute address to
    jump to.
  - In both cases, all registers and flags will be restored from the stack before
    continuing.
- After all hook functions are hooked up, Syringe lets the target process run normally but
  keeps acting like a debugger, trapping exceptions and terminating the process if
  necessary.
- After the process exits on its own, Syringe closes as well.

Limitations of the current version include:
- No support for packed executables. This was not necessary for the game, and as such is
  not considered a priority.
- No way to have two executables with different hooks in the same directory: Syringe will
  hook up all .inj files to the launched process.
- No way to debug a process yourself, since Syringe acts like a debugger. This will
  hopefully be solved in the future by making Syringe detach from the process after all
  hooks are set up.
- By default, Windows makes debuggee processes use a special, checked, heap. Memory
  allocations on this heap are filled with specific markers when allocated and freed, which
  helps during debugging because you can see the markers and determine that you're
  using uninitialized/already freed memory, but it carries a performance penalty. In the
  future this will be changed: a new version of the debugging API lets the debugger decide
  if this heap should be used.
- No built-in disassembler: you have to manually calculate how many bytes should be
  backed up in each hook. This is not considered a big problem because you need to be
  looking at the disassembled code to determine where to place the hook anyway, and
  counting up to five bytes should not be difficult.

Appendix B: Security considerations


C/C++ are designed to give maximum performance in exchange for sanity checks: if you
attempt to access the 15th element of a 3-element array, C and C++ will happily let you do
that. This attitude is acceptable for high-performance systems, but it makes programmer
errors/oversights more likely. As a result, there are huge amounts of code written in C/C++
containing subtle problems which are likely to be exploited by malicious individuals. The game
we're focusing on is no exception.
For more information on security issues, check out "Writing Secure Code" by Michael Howard
and David LeBlanc, "Find the Bug: A Book of Incorrect Programs" by Adam Barr, "The
Shellcoder's Handbook: Discovering and Exploiting Security Holes" by Jack Koziol, and any
books by Kris Kaspersky.
String overflows
C strings are terminated by the NULL byte (0x00). Most string functions will keep working until
they find this byte, regardless of the size of the buffer where the string is stored. If the code
copies a 20-character string into a buffer that only has room for 16 characters, plain strcpy
will happily copy all 20 characters plus a terminating NULL, overwriting other data in the
process. If the code is careful not to overflow the buffer but forgets the terminating NULL, the
next attempt to read that buffer will read beyond its limits until it finds a NULL. sprintf can
also overflow its destination buffer (and do more nasty stuff we'll see later).
If said buffer is allocated on the stack, not length-checking the data written to it makes it
possible to strcpy enough bytes to overwrite the return address, which means that once the
current function is done and returns, code execution will continue at an arbitrary address
instead of where you expect it to. In other words, the attacker can cause your program to
execute arbitrary code.
For example, this code attempts to determine whether the given file exists:
bool Movie::Exists(const char * basename) {
    char fullname[16]; // movie basenames have to be 8.3 to be loaded by our file API
    strcpy(fullname, basename);
    strcat(fullname, ".VQA"); // add extension
    return Filesystem::FileExists(fullname);
}
Observe that the comment states the expected format of the filename, which should fit in the
16-character buffer no problem. But this expectation is not validated by the code. In the
game, the movie filenames are read from an .ini file, which means a malicious user can enter
a filename that's longer than 8 characters...
You need a separate way to add your own code, but it's pretty much guaranteed that a
random 16-character-long filename will cause the program to crash as soon as it attempts to
return after checking for the existence of that file.
Extra problems with sprintf
One of the format tokens sprintf accepts is %n. All other tokens represent values that will be
read from the function arguments and written into the resulting string, while %n will treat the
corresponding function argument as an int* and force a value (the number of characters
generated in the output string so far) to be written into said argument. Given the right
circumstances, this can be abused to overwrite small pieces of memory in arbitrary locations.

Integer overflows
Mathematical operations on numbers are subject to the limits of values that can be represented
by the variables' types. As was mentioned very early, signed and unsigned integers typically
take up 32 bits in a 32-bit environment. This means that a signed integer can represent
values between -2^31 and 2^31-1 inclusive, while an unsigned integer can represent values
between 0 and 2^32-1 inclusive. If the programmer is not careful when performing math,
unexpected results might happen (which is generally true for doing anything carelessly, come
to think of it...).
For example, consider processing file formats: a file that contains an image typically starts with a
header that specifies the image's width and height. If the file format is not compressed (e.g.
BMP or TGA), the code can try to allocate a buffer to store the pixels like this:
byte * allocateBuffer(int width, int height, int bitDepth) {
    int size = width * height;
    size *= (bitDepth >> 3);
    return new byte[size];
}
multiplying width by height by bytes-per-pixel. However, if the file header says the image is very
large (e.g. a 24-bit BMP with width=0x10001, height=0x10001), the needed buffer size should be
0x10`001 * 0x10`001 * 3 = 0x100`020`001 * 3 = 0x300`060`003, but that number exceeds
32 bits... so the signed integer representing size will be left with a value of 0x60`003 bytes. If
the code then tries to copy the file contents into this buffer, it will start overwriting other data
outside this buffer after a while.
Of course, using signed integers for dimensions is fairly stupid because dimensions cannot be
negative, but using unsigned integers doesn't solve the problem of integer overflow. In fact, C
doesn't offer any way to detect that such an overflow has happened. The most reasonable
thing to do, in my opinion, would be to use unsigned integers for dimensions, and a longer
unsigned integer for the size. If the dimensions cannot exceed 32 bits, then their product
cannot exceed 64 bits, so using an unsigned 64-bit integer to store the size would be sufficient
to contain the width*height, which should then be sanity-checked to not exceed 64 bits when
multiplied by the bytes-per-pixel value (this can be tested by using a constant equal to
0xFFFFFFFF`FFFFFFFF divided by the bytes-per-pixel, and testing that the w*h is not larger
than this value). Of course, other sanity checks for image dimensions should exist to refuse to
load images exceeding reasonable dimensions long before these bit size limitations come into
play.
Promoting values is also a possible source of problems. If you store the length of a string in a
short (16 bits), you might be tempted to just pass this value to a memory allocator to
get a new buffer of the same length. But even though operator new[] takes an unsigned
(size_t) argument, the short is extended to int beforehand. Depending on the compiler, short
is likely to be signed by default, which means for any string longer than 0x7FFF chars (that
is, 32 KB or more), it will be sign-extended and the memory allocator will receive an argument
that exceeds 0x7FFFFFFF (2 GB).
Another likely problem is comparing signed and unsigned values. While modern compilers
can catch this, it's still a problem, especially when coupled with the previously mentioned
integer promotions. The coder might try to limit the maximum value of this buffer, but the
naive
if(len > 4096) {
    len = 4096;
}
approach will not catch the case where len exceeds 0x7FFF, because that would make len
negative and thus less than 4096...
Use of uninitialized values
If you create a variable without an initial value, there is no guarantee as to what value it will
have. This sounds fairly obvious when put that way, but it's surprising how often it is overlooked.
Fortunately, modern compilers and static analysis tools can spot this problem in most cases.
If the variable is allocated on the stack, it will hold the previous value that was held in the
same stack slot. If the variable is allocated on the heap, certain heap allocators (especially
when the program is compiled in debug mode) will fill it with known magic numbers to help
you detect uninitialized values, and/or provide padding around the requested buffer to soften
buffer overflows.
Most often (but not always), usage of this leftover value indicates programmer error.
Worth mentioning is the case of the Debian/Ubuntu SSH key generator bug: the code was relying
on uninitialized values as an additional source of entropy for seeding the random number
generator (a poor source of entropy, but a fairly original use of uninitialized values, in
my opinion). When the code was run under Valgrind, this was detected as an error, and the
maintainers commented out the uninitialized value... too eagerly, along with the other source
of entropy, leaving the random number generator very poorly seeded.
Going back to the topic of file formats: just because the header says the image is 0x1000 pixels
by 0x10000 pixels, it doesn't mean the image really is. Each time something is read from the
file, a check should be made whether the number of bytes actually read matches the expected
value. If it doesn't, you will find uninitialized/incorrect values in the destination buffer. (See the
code example in the previous section; it's not a bad idea to memset the buffer to all zeroes
before returning it.)

Appendix C: analysing gamemd.exe


Reading the program code in IDA Pro
IDA Pro can either open an executable, or its own disassembly database. This database is
usually created for every disassembled file, and can be opened without the actual executable.
In our case, you'll be opening the database instead of the executable: we have already
added huge amounts of information to it beyond what auto analysis can tell you.
After opening the database, you can do several things:
- Hit Ctrl-P to get a list of all the defined functions;
- Hit Ctrl-F12 to get a list of string literals;
- Hit Ctrl-F1 to see a list of all the custom data structures that have been defined (note
  that these are C equivalents of the game's original C++ structures, massaged into a
  format that IDA can understand; do not blindly copy these to YR++ or assume these
  precisely represent Westwood's original data structures);
- Hit Alt-I to search for an integer constant;
- Hit Ctrl-M to get a list of all the locations that we've marked as important for some
  reason: maybe we need to refer to them often, maybe they represent bugs that we will
  want to fix someday, maybe they are just exceptionally interesting or dumb.

Reading the string literals can tell you a lot of interesting things, including all the flags that
are read from the INI files (which does not always indicate the flag value is actually used).
Inspecting the functions can explain how the game performs certain functionality, though this
is going to be a tricky undertaking given the size of the game. You can try using Hex-Rays (F5)
to create a C representation of the function, but this is going to be quite different from what
Westwood originally wrote; do not cite Hex-Rays-restored code as an example of Westwood
stupidity.
There is no real recipe for how to understand the game's behaviour, so you'll just have to read
the code and follow any function calls it performs to see what actually is done where and
when. This is worsened by the fact that the original code wasn't exceptionally well thought
out.
How-tos
If you want to find out when the flag Bunkerable= is used by the game, you can find
Bunkerable in the string literals list and check the references to it; this will show you the
function where all the flags are read from the INI and stored in the respective objects. In the
general case, you'll see a call to INIClass_GetBool(pINI, pSection, "Bunkerable"), and the result
of this function will be placed into the respective object at a specific offset. Later on, when
that flag is checked, the game has to read from that specific offset. Once you know what
offset that flag is stored at, you can use Alt-I (Search for Immediate) to search for all uses of
that numeric value (be sure to check Find all occurrences). This will produce a large list
including a lot of unrelated lines, because the search simply finds where that value is used as
an instruction operand, regardless of whether it's mov eax, [esp+18h], call [ebx+18h] or add
ecx, 18h. But since you know how data structure members are likely to be accessed (mov
edx, [esi+DFEh]), you can look over all the results of that form. If other data structures do not
have a member at the same offset, or you can determine when the instruction is referring to
the right data structure, you'll be able to see where that flag is read and what is done as a
result.
When your cursor is on a named item (function name, static object, etc.) you can hit X to get
a list of all known references to that item in the executable. You can hit Ctrl-X to get a list of
all known references to the current address. Mind that if the named item's name can be
interpreted as an integer value, IDA will assume it is an address rather than a name; hitting X
while the cursor is over labels like dd or db will cause this confusion.
When your cursor is on a stack variable, you can use K to toggle between a named variable
and a simple numeric stack offset. You can also hit Enter when it's a named variable to get a
new window containing the function's stack frame; here you can modify local variables. If the
function uses any structures on the stack, you can use Alt-Q to place a structure at the right
offset.
When your cursor is on an expression like [ebx+48h], you can use Ctrl-T to select a structure
and display its member access instead of 48h, e.g. you can see expressions like
[ebx+ObjectClass.IsFalling], which makes it a lot more readable. If you select several lines
before hitting Ctrl-T, all accesses to the register your cursor was pointing at will be turned into
that structure's offsets. This is convenient when you're analysing a large function; in that
case, usually the ESI register is dedicated to storing the this pointer, and not modified
throughout the function. You can drag-select the code from the function start to the last
access to ESI, making sure the cursor stays on ESI, and hit Ctrl-T, then select the appropriate
structure. All the references to ESI will display that structure's members at the given offsets.
We have defined class vftables as structures as well, so this works even for virtual function
calls. Note that right-clicking the expression will offer a very similar Structure Offset submenu
with the structures and members listed.

Appendix D: reading crash dumps from the game


In case of a crash, the original game simply dumped a lot of stack and register contents and
aborted. Ares changed this to optionally create a full memory dump before terminating. This
dump, while comparatively large (100-250 MB), makes it a lot easier to troubleshoot problems
than the old stack dump did.
The memory dump can be loaded into a debugger like WinDBG. The symbols for the Ares DLL can be taken from SVN along with the DLL itself and the sources; just make sure they're for the right revision. Typing .ecxr into WinDBG's command line will display the exception that caused the crash. Based on the EIP, this can be divided into three areas: gamemd, Ares, and external code.
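A typical opening sequence for such a session might look like the following; .ecxr, k, and lm are standard WinDBG commands, while the symbol path is a placeholder you would point at your own checkout:

```
.sympath+ C:\ares\symbols   ; placeholder path to the matching Ares PDB
.reload /f                  ; force-reload symbols for loaded modules
.ecxr                       ; switch to the context record of the exception
k                           ; attempt an automatic call stack walk
lm                          ; list loaded modules to see where EIP falls
```

If the automatic k walk produces a sensible stack, the manual procedure below is unnecessary.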
If the crash happened in gamemd, it's usually a trivial matter of something not being loaded or not set to a valid value: the game makes a lot of assumptions about certain flag combinations and must-exist files without actually checking them. In this case, it is usually fairly easy to determine what data it was trying to access and why.
If the crash happened in Ares and you have the symbols, it should again be simple to figure out: WinDBG will even display the source code of the offending area. Chances are, the code is either using an expired pointer or doing something illogical.
If the crash happened in external code, it can be trickier to figure out.
In all three cases, one of the most useful pieces of information is the call stack: who called the function that called the function that crashed? Who called that function? While WinDBG attempts to resolve this automatically, it is not always successful, in which case you'll need to do it manually. This should be fairly easy once you open a Memory View (Alt-5) and display the contents at ESP in pointer and symbol mode. The values that can be interpreted as pointers should be inspected to see which of them point to code (as opposed to data). If the EIP is in gamemd or Ares, you can use IDA to determine the local stack pointer value (how much space after ESP is used by the local variables at that point); after this local space you will find the return address, where the function will jump to after completing its job. In other words, it is the address right after the instruction that called this function. Once there, you can once again use IDA to determine the local stack size to find the caller... and again and again, until you get a good idea of who called what.
Note that there might be other values in the stack that look like pointers into the code, but if their position in the stack doesn't fit the position of the return address, it can mean one of several things:

- the code is passing function pointers around as callbacks; this usually means the pointer points to the beginning of a function instead of the middle,

- IDA has miscalculated the local stack pointer; this happens fairly often with virtual functions (in which case you need to manually fix it in the IDB for future reference and recheck the stack),

- those addresses are leftovers from previous stack uses (uninitialized values, as far as this call stack is concerned), in which case they can be ignored. One common case of seeing these uninitialized values is fixed-size arrays that track their used size separately: if the array is ten elements wide and only the first three elements are used at the point it crashed, the remaining seven elements can legitimately be uninitialized and contain leftover pointers.

WinDBG's Memory View can also be used to inspect objects at byte level, and with symbols for Ares loaded it can even display the objects as actual collections of properties, making analysis even easier. That means that if the crash happened because an object is misconfigured (often the cause of crashes in gamemd code), there will likely be a pointer to this object in the stack near the crash site, or in one of the registers. It can be used to determine the faulty object's ID, which in turn can be used to analyse the object's INI code for problems.
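For instance, if a register or stack slot holds such an object pointer, WinDBG's standard dt command formats the raw bytes as named fields; the type name and address below are purely illustrative, not actual Ares symbols:

```
dt Ares!SomeExtTypeData 0x0089f3c0   ; dump the object at this address as typed fields
```

With the right symbols, dt prints each member with its offset and current value, which is usually enough to spot the misconfigured property.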