Automatic binary deobfuscation, Hacking and IT E-Book Dump Release, pobierz pdf

[ Pobierz całość w formacie PDF ]
J Comput Virol (2010) 6:261–276
DOI 10.1007/s11416-009-0126-4
ORIGINAL PAPER
Automatic binary deobfuscation
Yoann Guillot
·
Alexandre Gazet
Received: 4 January 2009 / Accepted: 17 July 2009 / Published online: 13 August 2009
© Springer-Verlag France 2009
Abstract
This paper gives an overview of our research in
the automation of the process of software protection analy-
sis. We will focus more particularly on the problem of obfus-
cation. Our current approach is based on a local semantic
analysis, which aims to rewrite the binary code in a simpler
(easier to understand) way. This approach has the advantage
of not relying on a manual search for “patterns” of obfusca-
tion. This way of manipulating the code is, at the end, quite
similar to the optimising stage of most of compilers. We will
exhibit concrete results based on the development of a pro-
totype and its application to a test target. Current limitations
and future prospects will be discussed in as well.
We will now present the development of our newmethods,
relying on a semantic analysis of the binary code to extract
a simpler representation. The objective is no longer to seek
and destroy known patterns, but to proceed to a complete,
on-the-ﬂy, optimised code rewriting.
Wewill exhibit concrete results obtained by applying these
methods to a test target. Then, current limitations and future
prospects will be discussed.
1 Metasm
Metasm
1
[
2
] is a free binary manipulation framework. Last
year, we used it to solve two important reverse-engineering
challenges. Based on these works, a few methods have been
integrated into the mainstream code. They allowmany recur-
rent analysis tasks to be simpliﬁed.
As a continuation of our work from last year [
1
], we
focus on the automation of the software protection analysis
process. We will focus more particularly on the problem of
obfuscation.
This problem is crucial as most malicious binaries (like
viruses or trojans) use this kind of protection to slow down
their analysis and to make their detection harder. Automa-
tion is a key step in order to face the constant growth of the
amount of malware, year after year.
Our previous paper was mainly focused on the attack
and suppression of protection mechanisms using the Me-
tasm framework. It provides many useful primitives to deal
with protected code: control ﬂowgraphmanipulation, recom-
pilation, ﬁltering processor,… Nevertheless most of these
approaches rely on a tedious work of manual identiﬁcation
of the “patterns” used by the protection.
1.1 Disassembler
Metasm can disassemble a binary ﬁle from one or many
entry point(s). It then follows the execution ﬂow and uses its
built-in emulation engine to solve indirect jumps and mem-
ory
cross-references
(i.e. which instruction reads or writes
at which address(es)). This technique is referred to as
back-
tracking
inside the framework. This concept is similar to the
concept of
slicing
[
3
–
5
].
Covering the program’s code allow the construction of its
execution graph (aka control ﬂow graph). The nods of this
graph are basic blocks (atomic sequences of instructions—if
we do not consider exception that may be raised).
These nods are organised in two interlaced graphs
B
Y. Guillot
)
Sogeti, ESEC, 6-8 rue Duret, 75116 Paris, France
e-mail: alexandre.gazet@sogeti.com; alex@security-labs.org
Y. Guillot
e-mail: yoann.guillot@sogeti.com
·
A. Gazet (
1
123
262
Y. Guillot, A. Gazet
– function’s block graph,
– functions and sub-functions graph.
2 Case study: a protection analysis
Fromnowon, we use the generic term
packer
to refer to a soft-
ware protection applicable to a program (at binary or source
level), in order to obfuscate its original form and to slow
down a possible attacker/reverse-engineer. “Classic” pack-
ers, like
AsProtect
,
PECompact
, are usually well han-
dled by security products, anti-virus software and automatic
classiﬁcation tools for example. Many unpacking techniques
have been developed over the last few years:
This graph can be visualised using an interactive graphical
interface.
1.2 New functionalities
Two main improvements have been made to the framework
since our last paper.
The ﬁrst one is a method allowing the graph to be mod-
iﬁed by replacing some of its components. This function,
replace_instrs
, requires three parameters:
– Static/dynamic unpackers, most of time based on a deep
analysis of how the protection works. The unpacking pro-
cess can later be automated using a scriptable debugger
for example.
– Generic unpackers (using code instrumentation [
6
], or
emulation like
Pandora’s Bochs
[
7
]or
Renovo
[
8
].)
1.
the address of the ﬁrst instruction of the ﬁrst block to
replace,
2.
the address of the last instruction of the last block to
replace,
Such protections mainly rely on concepts like compres-
sion and encryption built right up against anti-debugging
functions, licence management, etc. The main weakness of
this class of protection is that, at some point, the code has
to be decompressed/decrypted in memory (at least partially)
before being executed by the processor. It is then possible to
dump and analyse the code.
Close to classic packers stands another class of protec-
tion that takes advantage of the virtualization concept. This
class of protection is not vulnerable to the previously men-
tioned attacks, and fewgeneric analysis techniques have been
proposed. A part of our research is dedicated to virology
and it happens that we encounter many instances of the
same virtualization-based protection. Consequently, we have
decided to carry out the analysis of this protection.
By quickly comparing the different instances of the pro-
tection, we have discovered that we will manipulate to main
concepts:
3.
the list of new instructions that will be inserted as a
replacement (it may be an empty list).
A new block is then built from the new list of instructions,
and inserted into the graph, instead of the previously selected
blocks.
The second improvement is a method,
code_binding
,
that allows to obtain the semantics of a section of code.
The method takes advantage of the backtracking engine
which is at the heart of Metasm’s disassembler. The engine
is called many times to determine the semantics of the code
section. It regroups the effects of:
– registers modiﬁcations,
– and memory writings.
For the moment, this analysis is limited to code sections
with a simple linear structure (without loop or conditional
jump). As we will see later, it is nevertheless at the heart of
most of our attacks. As an example, getting the semantics
allows us to overcome the level of abstraction provided by a
software virtual machine based protection.
Finally, the instrumentation of the disassembly engine has
been facilitated by the implementation and export of many
callbacks
, allowing us to take control at different moments
and to intercept manipulated data. Here are some of those
callbacks:
–
Polymorphism
. This concept came to the forefront in the
early 90’s, with viral codes as the main application ﬁeld.
The challenge was then to try to defeat the signature
based detection algorithms used by anti-virus software.
By mutating the code’s form, it was possible to circum-
vent a signature, and make the malware go undetected.
In order to do so, one could express the same original
code semantics using a different sequence of instructions.
As the battle raged on between viral code authors and
anti-virus editors, the editors tried to react, using more
advanced algorithms to defeat obfuscation techniques.
Many more formal works have been published on this
subject. As an example, in 2003, Frédéric Perriot pro-
posed an approach based on the use of compiler optimi-
sations to improve polymorphic code detection [
9
]; other
– when a new instruction is disassembled,
– when a jump is detected (conditional or not),
– when some self-modifying code is detected,
– at the end of the disassembler work.
123
Automatic binary deobfuscation
263
Fig. 1
A virtual machine call
works were presented in 2008 by Matt Webster and Grant
Malcolm [
10
]. In the same spirit, one should be aware of
Mihai Christodorescu’s paper [
11
]. In these two cases,
the main idea is to automate the deobfuscation process in
order to improve viral code detection rates.
–
Virtualization
. For recall, in the ﬁeld of software pro-
tection, the term
virtual machine
refers to a software
component emulating the behaviour of a processor. This
virtual processor
has its own set of instructions and exe-
cutes programs specifically compiled into the appropriate
bytecode. It amounts to adding a new level of abstraction
between the machine code that is seen during the analysis
(using a debugger or a disassembler) and its real seman-
tics. For more details on the internal workings of a virtual
machine based software protection, the interested reader
can refer to our previous paper [
1
].
Fig. 2
Virtual machine context
The approach we will present here is thought to be didac-
tic. At each step of the analysis, we will point out the difﬁ-
culties encountered and we solved them.
2.1 Discovering the protection’s architecture
It takes only a few minutes to discover that a virtual machine
is used by the protection. When loaded into a disassem-
bler, many big undeﬁned memory areas appear. Furthermore,
many distinctive function calls are used (Fig.
1
): the original
code has been replaced by an initialisation stub invoking the
virtual machine; the address pushed onto the stack is actu-
ally the address of the bytecode implementing the protected
algorithm.
Two classical distinctive patterns (Fig.
2
), quite speciﬁc to
virtual machine-based software protection: a context (actu-
ally the virtual machine’s registers), and a table of handlers
(for recall we consider that a handler refers to the implemen-
tation of a virtual opcode/instruction).
So far, so good; no real difﬁculties. The code is quite
standard.
Fig. 3
Standard structure of a handler
2.2 Optimise to tame the code
many basic blocks, linked by unconditional jumps.
This kind of code is sometimes referred to “
spaghetti code
”.
This technique is actually not very effective: in our previous
paper we already developed methods to automatically merge
basic blocks when needed and to rebuild the control ﬂow
graph.
We have seen that the protection refers to a table of han-
dlers; the next natural step is to analyse them. This will allow
us to identify the instructions set of the virtual processor.
One can see an example of a handler in Fig.
3
. The ﬁrst char-
acteristic that one should notice is that the code is splitted into
123
264
Y. Guillot, A. Gazet
Fig. 4
Propagation of value
12h through
al
Fig. 5
cl
Register assignment
simpliﬁcation
Fig. 6
Reduction of the
add
computation
One can also notice that most of the basic blocks imply
many basic arithmetic operations, and make excessive use
of stack operations. This behaviour clearly stems from an
obfuscation process. We need to clean the code in order to
be able to analyze it effectively.
The difﬁculties can nowbe expressed like this: howcanwe
get rid of the obfuscation with the minimal amount of manual
analysis? Our answer was to use compiler optimisation tech-
niques. An optimisation is a code transformation for which
many contradictory objectives may be sought-after: speed of
execution, ﬁnal size of code, etc. Our optimisation process
has for only objective (our optimisation criteria) to reduce
the code to a minimal, concise form. We are not at all pre-
occupied with performance or size, even if as a side effect
of our optimisation process they will also be dramatically
improved.
One of the most surprising point about the optimisation
techniques we used is their simplicity. From an algorithmic
point of view, these techniques are quite affordable and quite
effective. For this step, we draw our inspiration from works
proposed for an equivalent target [
12
]. Nevertheless, we did
not adopt the same angle of attack, namely working on a tex-
tual representation of the code (using a lexer and a parser).
Indeed, we already have a representation of all the disas-
sembled instructions: we can directly manipulate Metasm’s
instruction objects on the ﬂy. Our methods will be performed
at the assembly level, inside the basic blocks of the control
ﬂow graph.
Here are some of the techniques we have implemented in
our optimisation engine:
patterns, it may allow us to avoid using too complex
techniques.
2.
Constant propagation.
The basic idea is to propagate the
known value of a variable in the expressions using it
(Fig.
4
).
The propagated value is
12h
. It can be found at line 1,
with register
al
being assigned. On line 3,
al
, which has
not beenmodiﬁed since, is then replaced by its numerical
value.
3.
Constant folding.
The initial value of a variable is sim-
pliﬁed by statically solving some superﬂuous arithmetic
operations (Fig.
5
).
At line 2,
cl
register is assigned with the value
46h
;
then at line 3, a
xor
operation with a constant will mod-
ify the value contained in
cl
register. We simplify this
basic piece of code using a direct assignment of the
cl
register by the result of the operation
46h xor 12h
= 54h
. Finally, line 3 is removed from the control ﬂow.
4.
Operation folding.
Once again, a computation is simpli-
ﬁed statically, but we do not compute a ﬁnal result to
assign to a variable.
In this example (Fig.
6
), two
add
instructions
add
al, -7fh
and
add al, -70h
are joined into a sin-
gle one. The resulting instruction can be expressed as
add al, (-7fh + -70h)
, that is
add al, 11h
.
Furthermore, our optimisation engine handles the com-
mutativity of operators, which allow us here to freely
reorder a sequence of e.g.
add
instructions, in the most
useful way.
5.
Stack optimisation
We did a very trivial implementation
of this technique. There are two use cases:
– A useless
push
-
pop
couple. For example a register is
pushed on the stack and popped without being read
or written.
– An element
a
(for example register
eax
) is pushed
onto the stack and then popped into an element
b
for (for example register
ebx
). If possible this
1.
Peephole optimisation.
It amounts to replacing a known
pattern (for example a sequence of instructions) by a
simpler form. This technique is, from our point of view,
the least interesting because it relies on a manual discov-
ery of those patterns. Nevertheless, for certain precise
123
Automatic binary deobfuscation
265
Fig. 7
Useless
push-pop
couple
Fig. 9
Decryption of the next handler’s index
Fig. 10
Optimised handler’s code
– locally: it increases the complexity of each of the han-
dlers and raises the amount of work needed to defeat the
protection.
– globally: each generated instance of the virtual machine
is different from the next (the mutated code of each han-
dler will differ). Thus, an attacker with ineffective tools
has to reanalyse each new instance from scratch.
Fig. 8
Optimised handler
push-pop
couple is transformed into the clearer
mov b, a
instruction.
According to Fig.
7
, it is possible to clean the couple of
instruction
push ebx
-
pop ebx
as
ebx
register is
not modiﬁed (no write access) between the two consid-
ered instructions.
From an “offensive” point of view, the obfuscation engine
is by far too weak. Rebuilding instruction through “spaghetti
code” is not a difﬁculty. Even if we work at a very low level
of abstraction (inside basic blocks), results are quite self-
explanatory. The optimisation engine produces a very clean
code and manual analysis has been reduced to a minimum.
The module progressively rewrites the code.
Each of the optimisationmethods is a rewriting rule, possi-
bly associated with one or more condition. Each of the trans-
formations has to be semantically correct: the optimised code
should compute the same function as the original, obfuscated
code. Finally, one has to ensure that this engine, or rewriting
system, actually halts.
These different methods are integrated into an iterative
process: while at least one of the methods manages to
optimise a piece of code, the process is called once again.
Despite their simplicity, the ﬁrst results were beyond our
expectations.
The result is quite satisfying: the code of the handler has
been drastically reduced. Most of the handlers were initially
composed of 100 to 200 instructions and split into a great
number of basic blocks. The semantically equivalent opti-
mized code is reduced to at most 10 instructions, all of them
inside a single basic block. Actually, a few handlers (less
than ﬁve), due to their function, are more complex and are
still composed of a small number of basic blocks.
We see now that all the handlers share a small ﬁnal seq-
uence of instructions. This actually is a kind of control stub
(Fig.
9
), which computes the address of the next handler to
execute. This computation is the decryption of the virtual
execution ﬂow pointer using a key stored in the ebx register,
while the bytecode instruction pointer is located in the
esi
register.
As a conclusion, we can say that the semantics of a handler
only rely on a small number of instructions (Fig.
10
).
From a “defensive” point of view, using obfuscation, and
by extension, polymorphism, is quite interesting. It can be
considered on two different levels:
2.3 Handler analysis
The previous step allows us to get a clean, optimised code
for each of the handlers. Even if the result is very positive,
we still are far from our objective. As noted earlier, a method
has been added
Metasm
, which allow the semantics of a
section of code to be computed. Thus, we’ll apply it on every
handler.
123
[ Pobierz całość w formacie PDF ]

Podobne