[ Pobierz całość w formacie PDF ]
J Comput Virol (2010) 6:261–276 DOI 10.1007/s11416-009-0126-4 ORIGINAL PAPER Automatic binary deobfuscation Yoann Guillot · Alexandre Gazet Received: 4 January 2009 / Accepted: 17 July 2009 / Published online: 13 August 2009 © Springer-Verlag France 2009 Abstract This paper gives an overview of our research in the automation of the process of software protection analy- sis. We will focus more particularly on the problem of obfus- cation. Our current approach is based on a local semantic analysis, which aims to rewrite the binary code in a simpler (easier to understand) way. This approach has the advantage of not relying on a manual search for “patterns” of obfusca- tion. This way of manipulating the code is, at the end, quite similar to the optimising stage of most of compilers. We will exhibit concrete results based on the development of a pro- totype and its application to a test target. Current limitations and future prospects will be discussed in as well. We will now present the development of our newmethods, relying on a semantic analysis of the binary code to extract a simpler representation. The objective is no longer to seek and destroy known patterns, but to proceed to a complete, on-the-fly, optimised code rewriting. Wewill exhibit concrete results obtained by applying these methods to a test target. Then, current limitations and future prospects will be discussed. 1 Metasm Metasm 1 [ 2 ] is a free binary manipulation framework. Last year, we used it to solve two important reverse-engineering challenges. Based on these works, a few methods have been integrated into the mainstream code. They allowmany recur- rent analysis tasks to be simplified. As a continuation of our work from last year [ 1 ], we focus on the automation of the software protection analysis process. We will focus more particularly on the problem of obfuscation. This problem is crucial as most malicious binaries (like viruses or trojans) use this kind of protection to slow down their analysis and to make their detection harder. Automa- tion is a key step in order to face the constant growth of the amount of malware, year after year. Our previous paper was mainly focused on the attack and suppression of protection mechanisms using the Me- tasm framework. It provides many useful primitives to deal with protected code: control flowgraphmanipulation, recom- pilation, filtering processor,… Nevertheless most of these approaches rely on a tedious work of manual identification of the “patterns” used by the protection. 1.1 Disassembler Metasm can disassemble a binary file from one or many entry point(s). It then follows the execution flow and uses its built-in emulation engine to solve indirect jumps and mem- ory cross-references (i.e. which instruction reads or writes at which address(es)). This technique is referred to as back- tracking inside the framework. This concept is similar to the concept of slicing [ 3 – 5 ]. Covering the program’s code allow the construction of its execution graph (aka control flow graph). The nods of this graph are basic blocks (atomic sequences of instructions—if we do not consider exception that may be raised). These nods are organised in two interlaced graphs B Y. Guillot ) Sogeti, ESEC, 6-8 rue Duret, 75116 Paris, France e-mail: alexandre.gazet@sogeti.com; alex@security-labs.org Y. Guillot e-mail: yoann.guillot@sogeti.com · A. Gazet ( 1 123 262 Y. Guillot, A. Gazet – function’s block graph, – functions and sub-functions graph. 2 Case study: a protection analysis Fromnowon, we use the generic term packer to refer to a soft- ware protection applicable to a program (at binary or source level), in order to obfuscate its original form and to slow down a possible attacker/reverse-engineer. “Classic” pack- ers, like AsProtect , PECompact , are usually well han- dled by security products, anti-virus software and automatic classification tools for example. Many unpacking techniques have been developed over the last few years: This graph can be visualised using an interactive graphical interface. 1.2 New functionalities Two main improvements have been made to the framework since our last paper. The first one is a method allowing the graph to be mod- ified by replacing some of its components. This function, replace_instrs , requires three parameters: – Static/dynamic unpackers, most of time based on a deep analysis of how the protection works. The unpacking pro- cess can later be automated using a scriptable debugger for example. – Generic unpackers (using code instrumentation [ 6 ], or emulation like Pandora’s Bochs [ 7 ]or Renovo [ 8 ].) 1. the address of the first instruction of the first block to replace, 2. the address of the last instruction of the last block to replace, Such protections mainly rely on concepts like compres- sion and encryption built right up against anti-debugging functions, licence management, etc. The main weakness of this class of protection is that, at some point, the code has to be decompressed/decrypted in memory (at least partially) before being executed by the processor. It is then possible to dump and analyse the code. Close to classic packers stands another class of protec- tion that takes advantage of the virtualization concept. This class of protection is not vulnerable to the previously men- tioned attacks, and fewgeneric analysis techniques have been proposed. A part of our research is dedicated to virology and it happens that we encounter many instances of the same virtualization-based protection. Consequently, we have decided to carry out the analysis of this protection. By quickly comparing the different instances of the pro- tection, we have discovered that we will manipulate to main concepts: 3. the list of new instructions that will be inserted as a replacement (it may be an empty list). A new block is then built from the new list of instructions, and inserted into the graph, instead of the previously selected blocks. The second improvement is a method, code_binding , that allows to obtain the semantics of a section of code. The method takes advantage of the backtracking engine which is at the heart of Metasm’s disassembler. The engine is called many times to determine the semantics of the code section. It regroups the effects of: – registers modifications, – and memory writings. For the moment, this analysis is limited to code sections with a simple linear structure (without loop or conditional jump). As we will see later, it is nevertheless at the heart of most of our attacks. As an example, getting the semantics allows us to overcome the level of abstraction provided by a software virtual machine based protection. Finally, the instrumentation of the disassembly engine has been facilitated by the implementation and export of many callbacks , allowing us to take control at different moments and to intercept manipulated data. Here are some of those callbacks: – Polymorphism . This concept came to the forefront in the early 90’s, with viral codes as the main application field. The challenge was then to try to defeat the signature based detection algorithms used by anti-virus software. By mutating the code’s form, it was possible to circum- vent a signature, and make the malware go undetected. In order to do so, one could express the same original code semantics using a different sequence of instructions. As the battle raged on between viral code authors and anti-virus editors, the editors tried to react, using more advanced algorithms to defeat obfuscation techniques. Many more formal works have been published on this subject. As an example, in 2003, Frédéric Perriot pro- posed an approach based on the use of compiler optimi- sations to improve polymorphic code detection [ 9 ]; other – when a new instruction is disassembled, – when a jump is detected (conditional or not), – when some self-modifying code is detected, – at the end of the disassembler work. 123 Automatic binary deobfuscation 263 Fig. 1 A virtual machine call works were presented in 2008 by Matt Webster and Grant Malcolm [ 10 ]. In the same spirit, one should be aware of Mihai Christodorescu’s paper [ 11 ]. In these two cases, the main idea is to automate the deobfuscation process in order to improve viral code detection rates. – Virtualization . For recall, in the field of software pro- tection, the term virtual machine refers to a software component emulating the behaviour of a processor. This virtual processor has its own set of instructions and exe- cutes programs specifically compiled into the appropriate bytecode. It amounts to adding a new level of abstraction between the machine code that is seen during the analysis (using a debugger or a disassembler) and its real seman- tics. For more details on the internal workings of a virtual machine based software protection, the interested reader can refer to our previous paper [ 1 ]. Fig. 2 Virtual machine context The approach we will present here is thought to be didac- tic. At each step of the analysis, we will point out the diffi- culties encountered and we solved them. 2.1 Discovering the protection’s architecture It takes only a few minutes to discover that a virtual machine is used by the protection. When loaded into a disassem- bler, many big undefined memory areas appear. Furthermore, many distinctive function calls are used (Fig. 1 ): the original code has been replaced by an initialisation stub invoking the virtual machine; the address pushed onto the stack is actu- ally the address of the bytecode implementing the protected algorithm. Two classical distinctive patterns (Fig. 2 ), quite specific to virtual machine-based software protection: a context (actu- ally the virtual machine’s registers), and a table of handlers (for recall we consider that a handler refers to the implemen- tation of a virtual opcode/instruction). So far, so good; no real difficulties. The code is quite standard. Fig. 3 Standard structure of a handler 2.2 Optimise to tame the code many basic blocks, linked by unconditional jumps. This kind of code is sometimes referred to “ spaghetti code ”. This technique is actually not very effective: in our previous paper we already developed methods to automatically merge basic blocks when needed and to rebuild the control flow graph. We have seen that the protection refers to a table of han- dlers; the next natural step is to analyse them. This will allow us to identify the instructions set of the virtual processor. One can see an example of a handler in Fig. 3 . The first char- acteristic that one should notice is that the code is splitted into 123 264 Y. Guillot, A. Gazet Fig. 4 Propagation of value 12h through al Fig. 5 cl Register assignment simplification Fig. 6 Reduction of the add computation One can also notice that most of the basic blocks imply many basic arithmetic operations, and make excessive use of stack operations. This behaviour clearly stems from an obfuscation process. We need to clean the code in order to be able to analyze it effectively. The difficulties can nowbe expressed like this: howcanwe get rid of the obfuscation with the minimal amount of manual analysis? Our answer was to use compiler optimisation tech- niques. An optimisation is a code transformation for which many contradictory objectives may be sought-after: speed of execution, final size of code, etc. Our optimisation process has for only objective (our optimisation criteria) to reduce the code to a minimal, concise form. We are not at all pre- occupied with performance or size, even if as a side effect of our optimisation process they will also be dramatically improved. One of the most surprising point about the optimisation techniques we used is their simplicity. From an algorithmic point of view, these techniques are quite affordable and quite effective. For this step, we draw our inspiration from works proposed for an equivalent target [ 12 ]. Nevertheless, we did not adopt the same angle of attack, namely working on a tex- tual representation of the code (using a lexer and a parser). Indeed, we already have a representation of all the disas- sembled instructions: we can directly manipulate Metasm’s instruction objects on the fly. Our methods will be performed at the assembly level, inside the basic blocks of the control flow graph. Here are some of the techniques we have implemented in our optimisation engine: patterns, it may allow us to avoid using too complex techniques. 2. Constant propagation. The basic idea is to propagate the known value of a variable in the expressions using it (Fig. 4 ). The propagated value is 12h . It can be found at line 1, with register al being assigned. On line 3, al , which has not beenmodified since, is then replaced by its numerical value. 3. Constant folding. The initial value of a variable is sim- plified by statically solving some superfluous arithmetic operations (Fig. 5 ). At line 2, cl register is assigned with the value 46h ; then at line 3, a xor operation with a constant will mod- ify the value contained in cl register. We simplify this basic piece of code using a direct assignment of the cl register by the result of the operation 46h xor 12h = 54h . Finally, line 3 is removed from the control flow. 4. Operation folding. Once again, a computation is simpli- fied statically, but we do not compute a final result to assign to a variable. In this example (Fig. 6 ), two add instructions add al, -7fh and add al, -70h are joined into a sin- gle one. The resulting instruction can be expressed as add al, (-7fh + -70h) , that is add al, 11h . Furthermore, our optimisation engine handles the com- mutativity of operators, which allow us here to freely reorder a sequence of e.g. add instructions, in the most useful way. 5. Stack optimisation We did a very trivial implementation of this technique. There are two use cases: – A useless push - pop couple. For example a register is pushed on the stack and popped without being read or written. – An element a (for example register eax ) is pushed onto the stack and then popped into an element b for (for example register ebx ). If possible this 1. Peephole optimisation. It amounts to replacing a known pattern (for example a sequence of instructions) by a simpler form. This technique is, from our point of view, the least interesting because it relies on a manual discov- ery of those patterns. Nevertheless, for certain precise 123 Automatic binary deobfuscation 265 Fig. 7 Useless push-pop couple Fig. 9 Decryption of the next handler’s index Fig. 10 Optimised handler’s code – locally: it increases the complexity of each of the han- dlers and raises the amount of work needed to defeat the protection. – globally: each generated instance of the virtual machine is different from the next (the mutated code of each han- dler will differ). Thus, an attacker with ineffective tools has to reanalyse each new instance from scratch. Fig. 8 Optimised handler push-pop couple is transformed into the clearer mov b, a instruction. According to Fig. 7 , it is possible to clean the couple of instruction push ebx - pop ebx as ebx register is not modified (no write access) between the two consid- ered instructions. From an “offensive” point of view, the obfuscation engine is by far too weak. Rebuilding instruction through “spaghetti code” is not a difficulty. Even if we work at a very low level of abstraction (inside basic blocks), results are quite self- explanatory. The optimisation engine produces a very clean code and manual analysis has been reduced to a minimum. The module progressively rewrites the code. Each of the optimisationmethods is a rewriting rule, possi- bly associated with one or more condition. Each of the trans- formations has to be semantically correct: the optimised code should compute the same function as the original, obfuscated code. Finally, one has to ensure that this engine, or rewriting system, actually halts. These different methods are integrated into an iterative process: while at least one of the methods manages to optimise a piece of code, the process is called once again. Despite their simplicity, the first results were beyond our expectations. The result is quite satisfying: the code of the handler has been drastically reduced. Most of the handlers were initially composed of 100 to 200 instructions and split into a great number of basic blocks. The semantically equivalent opti- mized code is reduced to at most 10 instructions, all of them inside a single basic block. Actually, a few handlers (less than five), due to their function, are more complex and are still composed of a small number of basic blocks. We see now that all the handlers share a small final seq- uence of instructions. This actually is a kind of control stub (Fig. 9 ), which computes the address of the next handler to execute. This computation is the decryption of the virtual execution flow pointer using a key stored in the ebx register, while the bytecode instruction pointer is located in the esi register. As a conclusion, we can say that the semantics of a handler only rely on a small number of instructions (Fig. 10 ). From a “defensive” point of view, using obfuscation, and by extension, polymorphism, is quite interesting. It can be considered on two different levels: 2.3 Handler analysis The previous step allows us to get a clean, optimised code for each of the handlers. Even if the result is very positive, we still are far from our objective. As noted earlier, a method has been added Metasm , which allow the semantics of a section of code to be computed. Thus, we’ll apply it on every handler. 123 [ Pobierz całość w formacie PDF ] |
Podobne
|