Symbolic analysis meets federated learning to enhance malware identifier

Manually creating detection rules is no longer practical for anti-malware products, since the number of malware threats has grown steadily over the past years. Machine learning is therefore a promising way to make malware recognition more efficient. Traditional centralized machine learning, however, requires a large amount of data to train a well-performing model. To improve malware detection, the training data may come from various kinds of sources, such as host-based, network-based, and cloud-based anti-malware components, or even from different enterprises. To avoid the expense of data collection as well as the leakage of private data, we present a federated learning system that identifies malware through behavioral graphs, i.e., system call dependency graphs. It is based on a deep learning model comprising a graph autoencoder and a multiclass classifier module. This model is trained with a secure learning protocol among clients to protect private data against inference attacks. Using the model to identify malware, we achieve the accuracy of for homogeneous graph data and for inhomogeneous graph data.
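
To make the federated setting concrete, the following is a minimal sketch of the aggregation step in generic federated averaging, where a server combines client model parameters weighted by local dataset size. It is an assumption for illustration only: the abstract does not specify the aggregation rule or the details of the secure learning protocol, and the function name `federated_average`, the flat parameter vectors, and the sample counts are all hypothetical.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Return the sample-size-weighted average of client parameter vectors.

    Generic FedAvg-style aggregation (an assumption, not the paper's protocol).
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client, and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must send vectors of the same length")

    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        for i, value in enumerate(weights):
            # Each client contributes in proportion to its local data size.
            averaged[i] += value * size / total
    return averaged


# Hypothetical example: two clients holding 1 and 3 local samples.
clients = [[1.0, 2.0], [3.0, 4.0]]
sizes = [1, 3]
global_weights = federated_average(clients, sizes)
```

Only model parameters leave each client in this sketch; the raw behavioral graphs stay local. Protection against inference attacks, which the abstract attributes to its secure learning protocol, would require additional mechanisms that are not shown here.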

Digital Object Identifier (DOI)
https://doi.org/10.1145/3538969.3538996