Tool Paper - SEMA: Symbolic Execution Toolchain for Malware Analysis
Today, malware threats are more dangerous than ever, with thousands of new samples emerging every day. A wide range of static and dynamic tools exists to detect malware signatures. Unfortunately, most of these tools are ineffective when it comes to the automatic detection of polymorphic malware, i.e., signature variants belonging to the same family. Recent work proposes to handle these difficulties with symbolic execution and machine learning. In contrast to classical analysis, symbolic execution offers a deep exploration of the malware’s code and, consequently, contributes to building more informative signatures. These signatures can then be generalized to an entire family via machine learning. The contribution of this tool paper is the presentation of SEMA, an open-source Symbolic Execution toolchain for Malware Analysis. SEMA is based on a dedicated extension of ANGR, a well-known symbolic analyser that can be used to extract API calls and their corresponding arguments. In particular, we extend ANGR with strategies to create representative signatures based on System Call Dependency Graphs (SCDGs). These SCDGs can be exploited in two machine learning modules, one based on graphs and one based on vectors. Last but not least, SEMA offers the first federated learning module for symbolic malware analysis.
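To make the notion of an SCDG concrete, the following minimal Python sketch builds a dependency graph from a recorded list of API calls and their arguments. It is an illustration under stated assumptions, not SEMA's implementation: the trace format, the function name `build_scdg`, and the edge rule (two calls are linked when a later call takes as argument a value that an earlier call used or returned) are hypothetical choices made for this example. In SEMA, such traces would come from symbolic execution with the extended ANGR rather than from a hand-written list.

```python
from collections import defaultdict

# Values too common to indicate a real dependency (assumption of this sketch).
IGNORED_VALUES = {0, None}


def build_scdg(trace):
    """Build a simple call dependency graph from an execution trace.

    trace: list of (call_name, args, return_value) tuples, where args is a
           tuple of hashable values (hypothetical format for illustration).
    Returns (nodes, edges): nodes is the list of call names in trace order,
    edges is a sorted list of (earlier_index, later_index) pairs.
    """
    nodes = [name for name, _args, _ret in trace]
    edges = set()
    # Maps a value to the indices of earlier calls that used or returned it.
    seen_in = defaultdict(list)

    for i, (_name, args, ret) in enumerate(trace):
        # Link this call to every earlier call that shares one of its arguments.
        for value in args:
            if value in IGNORED_VALUES:
                continue
            for j in seen_in[value]:
                edges.add((j, i))
        # Only now record this call's own values, so no self-loops are created.
        for value in tuple(args) + (ret,):
            if value not in IGNORED_VALUES:
                seen_in[value].append(i)

    return nodes, sorted(edges)


if __name__ == "__main__":
    # Hypothetical trace: a file handle flows from CreateFileA to later calls.
    example_trace = [
        ("CreateFileA", ("C:\\tmp\\a.txt", 0x40000000), 0x10),
        ("WriteFile", (0x10, "payload"), 1),
        ("CloseHandle", (0x10,), 1),
    ]
    nodes, edges = build_scdg(example_trace)
    print(nodes)
    print(edges)
```

Matching on raw values is deliberately naive. It shows why argument extraction matters for signature quality, but a realistic strategy would also need to handle symbolic values and coincidental matches between unrelated constants.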