A Unified Comparison of Tabular and Graph-Based Feature Representations in Machine Learning for Malware Detection

Feature representation is a key factor in machine learning-based malware detection: it determines what information is available for detection, constrains the choice of classifier, and affects computational efficiency. While both tabular and graph-based feature representations have been widely studied, a systematic comparison under unified conditions is still lacking. This study compares tabular and graph-based features extracted through static and dynamic analysis for malware detection. We evaluate these representations using state-of-the-art models on a unified dataset of Windows PE32 files. Our analysis focuses on three aspects: computational time required for feature extraction, detection performance, and robustness against adversarial attacks. To ensure a fair evaluation, we assess detection performance on both in-distribution and out-of-distribution datasets, highlighting the trade-offs between feature complexity, model accuracy, and real-world applicability. We find that tabular features perform best in most scenarios, so the cost of building graphs is not always justified.