Вы находитесь на странице: 1из 7

10/15/2018 Manual reverse engineering of WebAssembly: static code analysis | Forcepoint

Manual reverse engineering of WebAssembly: static code


analysis
forcepoint.com/blog/security-labs/manual-reverse-engineering-webassembly-static-code-analysis

October 11, 2018

To be able to manually analyze the Wasm text format, we’ll need to learn some more theory
first. Our previous blog post described how memory and data are handled. Building upon that
foundation, we will introduce some additional concepts useful for reverse engineering Wasm,
and then apply the newly gained knowledge to analyze a Wasm sample.

Note: This post is part of a series. The last post in the series introduced Wasm memory
handling, so if you missed that one you may want to read it before proceeding further.

The Wasm instruction set


As we discussed earlier in the year, Wasm can’t by itself contact the outside world. All
communication with the outside environment needs to go through JavaScript API calls. With this
in mind, let’s for now concentrate on instructions that are useful for actually calculating
something within Wasm, not calling JavaScript.

In comparison to the instruction set of x86 or x64, the instruction set of Wasm is very small. We
have a few different groups of functions:

Arithmetic instructions
Control flow instructions
Memory access instructions
Comparison instructions
Conversion instructions

Below are a few examples of common Wasm instructions. For a more comprehensive list of
instructions, see the reference manual.

get_local Gets the value of a variable in local storage and makes it available to
<variable> subsequent instructions by pushing it to the stack.

set_local Sets the value of a variable in local storage by popping the value from
<variable> the stack and assigning the popped value to the local variable in
question.

get_global Gets the value of a global variable.


<variable>

set_global Sets the value of a global variable.


<variable>

i32.add Pops two numbers from the stack, adds them and pushes the result to
the stack.

https://www.forcepoint.com/blog/security-labs/manual-reverse-engineering-webassembly-static-code-analysis 1/7
10/15/2018 Manual reverse engineering of WebAssembly: static code analysis | Forcepoint

call <func Direct call to the specified function number.


nbr>    

call_indirect Call function whose number is resolved at runtime.


<var>

if/else/end Conditional branch.

br Unconditional branch.

br_if Conditional branch.

loop Define a block of code to loop over.

block Define a block of code.

return Return from function.

Understanding the text format


The examples given above are of instructions in the WebAssembly text format. As Wasm is a
binary format, those textual instructions will be represented as bytecode in the compiled file.

Let’s analyze a simple function:

(func $max (; 0 ;) (param $0 i32) (param $1 i32) (result i32)


(select
(get_local $0)
(get_local $1)
(i32.gt_s
(get_local $0)
(get_local $1)
)
)
)

The first row means that we have a function named ‘max’ that takes two integers as parameters,
$0 and $1, and returns one integer, result i32.

The ‘select’ instruction takes three arguments: first operand (get_local $0), second operand
(get_local $1) and a condition argument (in this case the i32.gt_s instruction and its associated
operands). ‘Select’ returns the first operand if the condition operand is non-zero, otherwise it will
return the second.

Within the select, we have the instruction ‘i32.gt_s’, which checks if the first argument (get_local
$0) is greater than the second argument (get_local $1). The result of this check will be the
condition operand of the ‘select’ operator.

So, if the first argument is greater than the second argument, the first argument is returned, else
the second argument is returned.
https://www.forcepoint.com/blog/security-labs/manual-reverse-engineering-webassembly-static-code-analysis 2/7
10/15/2018 Manual reverse engineering of WebAssembly: static code analysis | Forcepoint

Note that different tools may represent the textual Wasm format slightly differently (just like
different disassemblers will). For example, the above may also be represented like this:

(func (;0;) (type 0) (param i32 i32) (result i32)


get_local 0
get_local 1
get_local 0
get_local 1
i32.gt_s
select)

Whatever textual representation we use, it corresponds to the following high-level code:

int max(int a, int b) {


return a > b ? a : b;
}

More information about the Wasm text format can be found here.

Static code analysis with wasm2wat


Now that we have an understanding of the concepts of stack machine, local variables, global
memory, data storage (as covered in our previous post), instruction set and the Wasm text
format, let’s use the acquired knowledge for a deeper analysis of the same Wasm sample as in
our previous blog post.

Picking up where we left off last time: both the initial shallow analysis and behavioral analysis
indicated that we’re dealing with a sorting algorithm. For the sample in question, you may well
be done at this point, depending on how much time you can sacrifice on one sample.

What if we were dealing with a sample whose function isn’t so obvious though? You may often
need to look at the source code, so let’s show how we could go about doing that.

To approach the code, we’ll once more turn to the wasm2wat tool that we used in our previous
blog post. We already found out that the sort function is function number 1. Below is the first part
of this function in Wasm’s textual representation:

$ ./wasm2wat quicksort.wasm
[snip]
(func (;1;) (type 1) (param i32 i32) (result i32)
(local i32)

It starts with the function definition, showing that the function takes two integers and returns one
integer. Then we have a local variable definition, local i32. Expressed in high-level pseudo code,
we have:

func sort(int param1, int param2) {


int var1;

The Wasm code continues:

https://www.forcepoint.com/blog/security-labs/manual-reverse-engineering-webassembly-static-code-analysis 3/7
10/15/2018 Manual reverse engineering of WebAssembly: static code analysis | Forcepoint

get_local 0
get_local 1
i32.ge_s
if ;; label = @1
get_local 1
return
end

The first two instructions, get_local 0/1, will get the values of the first and second function
parameter respectively, and push them to the stack. Then the third instruction, i32.ge_s, will
operate on these two values on the stack, by implicitly popping them and then testing whether
the first value is greater or equal to the second value. The result of the comparison will be
pushed to the stack. The subsequent if-statement will be true if the value at the top of the stack
is a non-zero value. In other words, the branching at the if-statement will depend on the three
previous instructions.

As mentioned earlier, the same code can be represented in different ways. If the above if-
statement feels hard to digest, here is an alternative representation:

(if
(i32.ge_s
(get_local $var$0)
(get_local $var$1)
)

What we have reversed so far can be expressed in high-level pseudo code like this:

func sort(int param1, int param2) {


int var1;
if (param1 >= param2)
return param2;

Continuing looking at the Wasm text format, using wasm2wat, for the sort function we have:

get_local 0
get_local 1
i32.add
i32.const 4
i32.div_s
i32.const 2
i32.div_s
i32.const 4
i32.mul

This snippet of code does math calculations. Once again, the get_local 0/1 instructions get the
two parameter values that were passed to the function, and the subsequent ‘i32.add’ instruction
will operate on those two values, adding them together and putting the resulting number on the
stack.

After that we have the instruction ‘i32.const 4’, which will push the value 4 to the stack. The
subsequent instruction, i32.div_s divides the value next to the top of the stack with the value at
the very top. In other words, the previously added values, param1 + param2, will be divided by
4. Immediately after this we have the same pattern again, but this time a division by 2. Similarly,
the following two instructions involve the instruction i32.mul operating on the constant value 4.

The net result is that the value obtained so far is multiplied with 4. Expressed more concisely,
the code performs the following calculation: (param1 + param2) / 4 / 2 * 4
https://www.forcepoint.com/blog/security-labs/manual-reverse-engineering-webassembly-static-code-analysis 4/7
10/15/2018 Manual reverse engineering of WebAssembly: static code analysis | Forcepoint

Let’s look at the subsequent Wasm code, with our comments added on the right:

Instruction Description

set_local Local variable var1 = whatever is on top of the stack, which we know is:
2 (param1 + param2) / 4 / 2 * 4

get_local Prepare for function call by putting param1 to the stack.


0

get_local Prepare for function call by putting param2 to the stack.


1

get_local Prepare for function call by putting var1 to the stack.


2

i32.load Pop the value at the top of the stack and use it as a pointer into global
memory, then fetch the data it points to and push that data to the top of
the stack.

call 0 Call function 0 (named ‘partition’), using the parameters that were just
set.

set_local Local variable var1 = return value of the function call.


2

Once again, let’s express this in high-level pseudo code:

var1 = (param1 + param2) / 4 / 2 * 4;


var1 = call partition(*var1,param2,param1);

If this feels hard to grasp, try running the example in Chrome, single-stepping and watching in
the debugger (i.e. DevTools) how the stack and local variables change values, and what the
global memory looks like. An earlier blog post on Wasm analysis describes how to do that. By
the time we are at the ‘call 0’ instruction, before it has been executed, the local variables and
stack would look like this:

https://www.forcepoint.com/blog/security-labs/manual-reverse-engineering-webassembly-static-code-analysis 5/7
10/15/2018 Manual reverse engineering of WebAssembly: static code analysis | Forcepoint

Figure 1: Debugging in Chrome Developer Tools

The Wasm code continues:

Instruction Description

get_local 0 Put param1 on the stack.

get_local 2 Put var1 on the stack.

i32.const 4 Put 4 on the stack.

i32.sub Subtract 4 from var1 (on the stack only, not in local variable memory).

call 1 Call sort.

drop Discard return value from the call to sort.

get_local 2 Put var1 on the stack.

i32.const 4 Put 4 on the stack.

i32.add Add 4 to var1 (on the stack only, not in local variable memory).

get_local 1 Put param2 to the stack.

call 1 Call sort.

drop Discard return value from the call to sort.

get_local Return var1 (the last remaining item on the stack will be the return
2) value).

https://www.forcepoint.com/blog/security-labs/manual-reverse-engineering-webassembly-static-code-analysis 6/7
10/15/2018 Manual reverse engineering of WebAssembly: static code analysis | Forcepoint

In high-level pseudo code this would correspond to:

call sort(var1 - 4, param1);


call sort(param2, var1 + 4)
return var1;

Altogether, our final result is:

func sort(int param1, int param2) {


int var1;
if (param1 >= param2)
return param2;
var1 = (param1 + param2) / 4 / 2 * 4;
var1 = call partition(*var1,param2,param1);
call sort(var1 - 4, param1);
call sort(param2, var1 + 4)
return var1;
}

We find that this is indeed a QuickSort implementation. If you wish, you can further verify this by
comparing the above pseudo code to some existing implementation you find on the Internet.
Reversing of the partition function is left as an exercise to the reader.

Conclusion
We have now successfully reverse engineered a complete Wasm function. First, we turned the
Wasm binary format into its textual format using the wasm2wat utility, and by analyzing the text
format representation we were able to create high-level pseudo code of the algorithm.

Tools exist for doing automatic decompilation, which is a more efficient way to do reverse
engineering than how we did it today. While automatic decompilation can save time, it’s often
imperfect and having an understanding of manual analysis allows us to work around these
imperfections.

References
https://developer.mozilla.org/en-US/docs/WebAssembly/Understanding_the_text_format
https://github.com/sunfishcode/wasm-reference-
manual/blob/master/WebAssembly.md#instructions
https://www.pnfsoftware.com/reversing-wasm.pdf
https://sophos.files.wordpress.com/2018/08/sophos-understanding-web-assembly.pdf

https://www.forcepoint.com/blog/security-labs/manual-reverse-engineering-webassembly-static-code-analysis 7/7

Вам также может понравиться