To be able to manually analyze the Wasm text format, we’ll need to learn some more theory
first. Our previous blog post described how memory and data are handled. Building upon that
foundation, we will introduce some additional concepts useful for reverse engineering Wasm,
and then apply the newly gained knowledge to analyze a Wasm sample.
Note: This post is part of a series. The last post in the series introduced Wasm memory
handling, so if you missed that one you may want to read it before proceeding further.
Compared with the instruction sets of x86 or x64, the Wasm instruction set is very small. The
instructions fall into a few groups:
Arithmetic instructions
Control flow instructions
Memory access instructions
Comparison instructions
Conversion instructions
Below are a few examples of common Wasm instructions. For a more comprehensive list of
instructions, see the reference manual.
Instruction            Description
get_local <variable>   Gets the value of a variable in local storage and makes it available to
                       subsequent instructions by pushing it to the stack.
set_local <variable>   Sets the value of a variable in local storage by popping the value from
                       the stack and assigning the popped value to the local variable in
                       question.
i32.add                Pops two numbers from the stack, adds them and pushes the result to
                       the stack.
https://www.forcepoint.com/blog/security-labs/manual-reverse-engineering-webassembly-static-code-analysis 1/7
10/15/2018 Manual reverse engineering of WebAssembly: static code analysis | Forcepoint
br                     Unconditional branch.
The first row means that we have a function named ‘max’ that takes two integers as parameters,
$0 and $1, and returns one integer, result i32.
The ‘select’ instruction takes three arguments: first operand (get_local $0), second operand
(get_local $1) and a condition argument (in this case the i32.gt_s instruction and its associated
operands). ‘Select’ returns the first operand if the condition operand is non-zero, otherwise it will
return the second.
Within the select, we have the instruction ‘i32.gt_s’, which checks if the first argument (get_local
$0) is greater than the second argument (get_local $1). The result of this check will be the
condition operand of the ‘select’ operator.
So, if the first argument is greater than the second argument, the first argument is returned, else
the second argument is returned.
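The behaviour described above can be modelled in a few lines of Python. This is only a sketch of the semantics, not output from any tool: the helper names i32_gt_s, select and wasm_max are made up for illustration, and 32-bit wraparound is ignored.

```python
def i32_gt_s(first: int, second: int) -> int:
    """Signed greater-than. Wasm comparisons produce the integer 1 or 0."""
    return 1 if first > second else 0


def select(first: int, second: int, condition: int) -> int:
    """Return `first` when `condition` is non-zero, otherwise `second`."""
    return first if condition != 0 else second


def wasm_max(param0: int, param1: int) -> int:
    """Model of the 'max' function: select($0, $1, $0 > $1)."""
    return select(param0, param1, i32_gt_s(param0, param1))


print(wasm_max(3, 7))   # 7
print(wasm_max(9, -2))  # 9
```

Note that both operands of select are always evaluated; unlike an if-statement, nothing is skipped.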
Note that different tools may represent the textual Wasm format slightly differently (just like
different disassemblers will). For example, the above may also be represented like this:
More information about the Wasm text format can be found in the MDN guide listed under References.
Picking up where we left off last time: both the initial shallow analysis and behavioral analysis
indicated that we’re dealing with a sorting algorithm. For the sample in question, you may well
be done at this point, depending on how much time you can sacrifice on one sample.
What if we were dealing with a sample whose function isn’t so obvious though? You may often
need to look at the source code, so let’s show how we could go about doing that.
To approach the code, we’ll once more turn to the wasm2wat tool that we used in our previous
blog post. We already found out that the sort function is function number 1. Below is the first part
of this function in Wasm’s textual representation:
$ ./wasm2wat quicksort.wasm
[snip]
(func (;1;) (type 1) (param i32 i32) (result i32)
(local i32)
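It starts with the function definition, showing that the function takes two integers and returns one
integer. Then we have a local variable definition, local i32. Expressed in high-level terms, this is a
function with two integer parameters, one integer local variable and an integer return value. Its
body begins as follows: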
get_local 0
get_local 1
i32.ge_s
if  ;; label = @1
  get_local 1
  return
end
The first two instructions, get_local 0 and get_local 1, fetch the values of the first and second
function parameters respectively and push them onto the stack. The third instruction, i32.ge_s,
implicitly pops both values and tests whether the first is greater than or equal to the second
(compared as signed integers), then pushes the result of the comparison, 1 or 0, back onto the
stack. The subsequent if-statement pops that result and executes its body if the value is
non-zero. In other words, the branching at the if-statement depends on the three previous
instructions.
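To make the stack traffic concrete, the sketch below replays those instructions one at a time in Python. It is an illustration only: the function name run_prologue is invented, the locals list stands in for Wasm's local storage, and returning None simply marks the case where execution falls through to code not shown yet.

```python
def run_prologue(param0: int, param1: int):
    """Replay the first instructions of function 1 on an explicit stack."""
    local_vars = [param0, param1, 0]  # two parameters plus one zero-initialised local
    stack = []

    stack.append(local_vars[0])       # get_local 0
    stack.append(local_vars[1])       # get_local 1

    second = stack.pop()              # i32.ge_s pops the top value first,
    first = stack.pop()               # then the value underneath it
    stack.append(1 if first >= second else 0)

    if stack.pop() != 0:              # 'if' pops the condition
        stack.append(local_vars[1])   # get_local 1
        return stack.pop()            # return

    return None                       # fall through to the rest of the function


print(run_prologue(5, 3))  # 3
print(run_prologue(1, 5))  # None
```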
As mentioned earlier, the same code can be represented in different ways. If the above if-
statement feels hard to digest, here is an alternative representation:
(if
  (i32.ge_s
    (get_local $var$0)
    (get_local $var$1)
  )
  (then (return (get_local $var$1)))
)
What we have reversed so far can be expressed in high-level pseudo code like this:
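A minimal Python rendering of that pseudo code might look as follows. The names param1, param2 and var1 follow the naming used later in this post; the function name sort_func is an assumption, and the NotImplementedError simply marks the part of the function that has not been analysed at this point.

```python
def sort_func(param1: int, param2: int) -> int:
    var1 = 0  # the single i32 local, zero-initialised by Wasm

    if param1 >= param2:
        return param2

    # The remaining instructions are analysed in the following sections.
    raise NotImplementedError("rest of function 1 not reversed yet")


print(sort_func(8, 8))  # 8
```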
Continuing through the wasm2wat output for the sort function, we have:
get_local 0
get_local 1
i32.add
i32.const 4
i32.div_s
i32.const 2
i32.div_s
i32.const 4
i32.mul
This snippet performs arithmetic. Once again, the get_local 0 and get_local 1 instructions push the
two parameter values that were passed to the function, and the subsequent ‘i32.add’ instruction
pops those two values, adds them together and pushes the resulting number onto the
stack.
After that we have the instruction ‘i32.const 4’, which pushes the value 4 onto the stack. The
subsequent instruction, i32.div_s, performs a signed integer division of the value next to the top
of the stack by the value at the very top. In other words, the previously added values, param1 +
param2, are divided by 4. Immediately after this we have the same pattern again, but this time a
division by 2. Similarly, the following two instructions push the constant 4 and apply i32.mul, so
the value obtained so far is multiplied by 4. Expressed more concisely, the code performs the
following calculation, where every division is an integer division: (param1 + param2) / 4 / 2 * 4
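The calculation can be checked by replaying the nine instructions on an explicit stack. The Python sketch below does this under a few assumptions: the names i32_div_s and evaluate are invented for illustration, 32-bit overflow is ignored, and the sample inputs assume the parameters are byte offsets of 4-byte array elements, in which case the result is the byte offset of the middle element.

```python
def i32_div_s(dividend: int, divisor: int) -> int:
    """Signed integer division truncating toward zero, like i32.div_s.

    Python's // operator floors instead, which differs for negative operands.
    """
    quotient = abs(dividend) // abs(divisor)
    return quotient if (dividend < 0) == (divisor < 0) else -quotient


def evaluate(param1: int, param2: int) -> int:
    stack = []
    stack.append(param1)                     # get_local 0
    stack.append(param2)                     # get_local 1
    top, below = stack.pop(), stack.pop()
    stack.append(below + top)                # i32.add

    for operation, constant in (("div_s", 4), ("div_s", 2), ("mul", 4)):
        stack.append(constant)               # i32.const <constant>
        top, below = stack.pop(), stack.pop()
        if operation == "div_s":
            stack.append(i32_div_s(below, top))  # i32.div_s
        else:
            stack.append(below * top)            # i32.mul

    return stack.pop()


print(evaluate(0, 36))  # 16
print(evaluate(8, 20))  # 12
```

Because the divisions truncate, the expression cannot be simplified to (param1 + param2) / 2: with 8 and 20 the code yields 12, not 14.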
Let’s look at the subsequent Wasm code, with our comments added on the right:
Instruction    Description
set_local 2    Local variable var1 = whatever is on top of the stack, which we know is:
               (param1 + param2) / 4 / 2 * 4
i32.load       Pop the value at the top of the stack and use it as a pointer into global
               memory, then fetch the data it points to and push that data to the top of
               the stack.
call 0         Call function 0 (named ‘partition’), using the parameters that were just
               set.
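The i32.load step is the least intuitive of these, so here is a small Python model of it. It assumes memory is a flat byte array holding little-endian 32-bit integers, which is how Wasm linear memory stores i32 values; the helper name i32_load and the sample array contents are made up, and the instruction's optional offset immediate is ignored.

```python
import struct


def i32_load(memory: bytearray, address: int) -> int:
    """Read a little-endian 32-bit signed integer from `memory` at `address`."""
    return struct.unpack_from("<i", memory, address)[0]


# Hypothetical memory contents: the array [30, 10, 40, 20] stored at address 0.
memory = bytearray(64)
struct.pack_into("<4i", memory, 0, 30, 10, 40, 20)

stack = [8]                                  # address of the third element (2 * 4 bytes)
stack.append(i32_load(memory, stack.pop()))  # i32.load: pop address, push value
print(stack)  # [40]
```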
If this feels hard to grasp, try running the example in Chrome, single-stepping and watching in
the debugger (i.e. DevTools) how the stack and local variables change values, and what the
global memory looks like. An earlier blog post on Wasm analysis describes how to do that. By
the time we are at the ‘call 0’ instruction, before it has been executed, the local variables and
stack would look like this:
Instruction    Description
i32.sub        Subtract 4 from var1 (on the stack only, not in local variable memory).
i32.add        Add 4 to var1 (on the stack only, not in local variable memory).
get_local 2)   Return var1 (the last remaining item on the stack will be the return
               value).
We find that this is indeed a QuickSort implementation. If you wish, you can further verify this by
comparing the above pseudo code to some existing implementation you find on the Internet.
Reversing of the partition function is left as an exercise to the reader.
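For readers who want something to compare against, below is a generic textbook QuickSort in Python that picks the middle element as the pivot. It is not the code recovered from the sample: it works on list indices rather than byte offsets into memory, and the function names and partition scheme are assumptions chosen for illustration.

```python
def partition(values: list, left: int, right: int) -> int:
    """Rearrange values[left..right] around the middle element and return the split index."""
    pivot = values[(left + right) // 2]
    i, j = left, right
    while i <= j:
        while values[i] < pivot:
            i += 1
        while values[j] > pivot:
            j -= 1
        if i <= j:
            values[i], values[j] = values[j], values[i]
            i += 1
            j -= 1
    return i


def quicksort(values: list, left: int, right: int) -> None:
    """Sort values[left..right] in place."""
    if left >= right:
        return
    split = partition(values, left, right)
    quicksort(values, left, split - 1)
    quicksort(values, split, right)


data = [30, 10, 40, 20, 50, 5]
quicksort(data, 0, len(data) - 1)
print(data)  # [5, 10, 20, 30, 40, 50]
```

The early return when left >= right corresponds to the i32.ge_s check at the top of function 1, and the midpoint pivot corresponds to the (param1 + param2) / 4 / 2 * 4 calculation once byte offsets are converted to element indices.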
Conclusion
We have now successfully reverse engineered a complete Wasm function. First, we turned the
Wasm binary format into its textual format using the wasm2wat utility, and by analyzing the text
format representation we were able to create high-level pseudo code of the algorithm.
Tools exist for doing automatic decompilation, which is a more efficient way to do reverse
engineering than how we did it today. While automatic decompilation can save time, it’s often
imperfect and having an understanding of manual analysis allows us to work around these
imperfections.
References
https://developer.mozilla.org/en-US/docs/WebAssembly/Understanding_the_text_format
https://github.com/sunfishcode/wasm-reference-manual/blob/master/WebAssembly.md#instructions
https://www.pnfsoftware.com/reversing-wasm.pdf
https://sophos.files.wordpress.com/2018/08/sophos-understanding-web-assembly.pdf