Robust Malware Detection in Adversarial Environments: Analysis, Evaluation, and Defense Strategies
Project Description
The rapid evolution of malware, combined with increasingly sophisticated evasion techniques such as packing, obfuscation, and polymorphism, presents a significant challenge to conventional security mechanisms. As a result, machine learning (ML)-based malware detection systems are being widely adopted for their ability to generalize and to automate malware identification. However, these systems are themselves susceptible to adversarial manipulation, and current detectors struggle to identify evasive or morphed malware reliably.
To address this critical issue, InfoLab at Sungkyunkwan University (SKKU) has led a comprehensive research project spanning three key investigations, each targeting a unique vulnerability in ML-based malware detection pipelines—from data representation and feature manipulation to evasion through software packing.
Core Research Contributions
1. Spectral Analysis of Control Flow Graphs for Malware Detection
We propose an approach to malware classification based on spectral representations of control flow graphs (CFGs). Using heat and wave kernels, we extract graph signatures that are invariant to graph size and to node permutation, and use them as features for malware detection.
- Dataset: Over 37,000 Windows executables
- Models evaluated: Eight ML classifiers
- Performance: Accuracy of up to 95.9%
Key Insight: Spectral signatures provide a scalable and effective alternative to byte-level feature extraction, especially in adversarial scenarios involving structural manipulation.
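To make the idea of a permutation-invariant spectral signature concrete, here is a minimal sketch in Python with NumPy. It is not the project's implementation. It assumes one common definition of a heat-kernel trace signature, h(t) = (1/n) Σ exp(-t·λ_k), where λ_k are the eigenvalues of the normalized graph Laplacian, and it assumes directed CFG edges are symmetrized first. The function names, the toy graph, and the time scales are all hypothetical.

```python
import numpy as np

def normalized_laplacian(adj):
    """Symmetric normalized Laplacian I - D^(-1/2) A D^(-1/2)."""
    adj = np.asarray(adj, dtype=float)
    sym = np.maximum(adj, adj.T)  # assumption: treat CFG edges as undirected
    deg = sym.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])
    return np.eye(len(sym)) - inv_sqrt[:, None] * sym * inv_sqrt[None, :]

def heat_trace_signature(adj, times):
    """Heat-kernel trace at each time scale, divided by the node count."""
    eigvals = np.linalg.eigvalsh(normalized_laplacian(adj))
    times = np.asarray(times, dtype=float)
    return np.exp(-np.outer(times, eigvals)).sum(axis=1) / len(eigvals)

# Toy CFG with four basic blocks: 0->1, 1->2, 1->3, 2->3 (hypothetical).
adj = np.array([[0, 1, 0, 0],
                [0, 0, 1, 1],
                [0, 0, 0, 1],
                [0, 0, 0, 0]])
times = np.logspace(-2, 2, 8)  # hypothetical choice of time scales

# Relabel the nodes; the signature should not change.
perm = [2, 0, 3, 1]
permuted = adj[np.ix_(perm, perm)]

sig = heat_trace_signature(adj, times)
sig_permuted = heat_trace_signature(permuted, times)
print(np.allclose(sig, sig_permuted))
```

The signature depends only on the Laplacian eigenvalues, which do not change when nodes are relabeled, and its length is fixed by the number of time scales rather than by the number of basic blocks. A wave-kernel variant could be sketched the same way by replacing exp(-t·λ) with cos(t·λ).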
2. MLxPack: Investigating the Effects of Packers on ML-Based Malware Detection
This study examines how packing techniques—used to disguise malicious intent—affect ML classifier accuracy. Using a large dataset of 107,000 packed and unpacked samples, the research explores both static and dynamic features.
- Finding: Dynamic analysis significantly improves detection robustness
- Best performance: Achieved with hybrid (static + dynamic) feature sets
Key Insight: Detection systems must account for packing effects by incorporating diverse feature representations and multi-perspective analysis.
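As an illustration of what a hybrid (static + dynamic) feature set can look like in practice, the following Python sketch merges two feature views into one vector. It is only a sketch under stated assumptions: the feature names and values are hypothetical and are not taken from the study, and the helper functions `fuse_views` and `to_vector` are invented for this example.

```python
def fuse_views(static, dynamic):
    """Merge two feature dicts, prefixing keys so the views cannot collide."""
    merged = {f"static:{name}": float(value) for name, value in static.items()}
    merged.update(
        {f"dynamic:{name}": float(value) for name, value in dynamic.items()}
    )
    return merged

def to_vector(features, vocabulary):
    """Order features by a fixed vocabulary; missing features become 0.0."""
    return [features.get(name, 0.0) for name in vocabulary]

# Hypothetical packed sample: few visible imports, but observable behavior.
static_view = {"section_count": 3, "import_count": 2}
dynamic_view = {"api_calls": 41, "files_written": 2}

fused = fuse_views(static_view, dynamic_view)
vocabulary = sorted(fused)
vector = to_vector(fused, vocabulary)
print(vocabulary)
print(vector)
```

Prefixing each key with its view keeps a static and a dynamic feature that happen to share a name from overwriting each other, and the fixed vocabulary keeps every sample's vector in the same column order for a downstream classifier.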
3. Visualization-Based Malware Analysis Using Feature Fusion
Focusing on Android malware, this study introduces a feature fusion technique that combines handcrafted texture descriptors (GIST, LBP, GLCM) with deep CNN features from grayscale images of malware components (e.g., classes.dex, manifest files).
- Dataset: DREBIN
- Accuracy: Achieved 93.24% across multiple ML models
Key Insight: Visualization-based static analysis offers a powerful and resilient approach to detecting obfuscated and packed Android malware.
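The visualization and fusion steps can be sketched in Python with NumPy as follows. This is a simplified illustration, not the study's pipeline: it assumes each byte maps to one grayscale pixel, uses a basic 8-neighbor LBP histogram as the only handcrafted descriptor, and stands in a zero vector for the CNN embedding. The image width, the function names, and the embedding size are hypothetical.

```python
import numpy as np

def bytes_to_grayscale(data, width=16):
    """Map each byte to one pixel (0-255), row-major, zero-padding the last row."""
    buf = np.frombuffer(data, dtype=np.uint8)
    rows = -(-len(buf) // width)  # ceiling division
    padded = np.zeros(rows * width, dtype=np.uint8)
    padded[:len(buf)] = buf
    return padded.reshape(rows, width)

def lbp_histogram(img):
    """Normalized 256-bin histogram of basic 8-neighbor LBP codes (interior pixels)."""
    img = img.astype(np.int16)
    height, width = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.int32)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:height - 1 + dy, 1 + dx:width - 1 + dx]
        codes |= (neighbor >= center).astype(np.int32) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / max(hist.sum(), 1.0)

def fuse(handcrafted, deep):
    """Feature fusion by simple concatenation of the two descriptors."""
    return np.concatenate([handcrafted, deep])

# Stand-in for the raw bytes of a component such as classes.dex.
data = bytes(range(256)) * 2
image = bytes_to_grayscale(data, width=16)
texture = lbp_histogram(image)
deep_placeholder = np.zeros(8)  # hypothetical stand-in for a CNN embedding
fused = fuse(texture, deep_placeholder)
print(image.shape, fused.shape)
```

Concatenation is the simplest fusion strategy; the resulting vector can be passed to any standard classifier. In a real pipeline the placeholder would be replaced by features from a trained CNN, and GIST and GLCM descriptors would be appended in the same way.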
Project Objectives
- Develop robust, interpretable, and adversarial-resilient malware detection pipelines.
- Analyze the impact of binary obfuscation and packing on static and dynamic ML-based detection models.
- Investigate multi-view and feature-fusion strategies for improved adversarial robustness.
- Design efficient and scalable representations (e.g., spectral graph embeddings) that capture malware behavior invariant to structural changes.
- Provide benchmark datasets, evaluation protocols, and reproducible experiments for the research community.
Research Impact
This project by InfoLab at SKKU presents a multidimensional approach to adversarial malware analysis, bridging the gap between ML robustness and real-world evasion tactics. Key contributions include:
- Designing detection models resilient to advanced evasion techniques like packing and morphing.
- Developing interpretable and generalizable malware representations.
- Guiding the development of next-generation defense frameworks for both desktop and mobile environments.
Together, these efforts lay a strong foundation for secure, explainable, and adversarially resilient malware detection systems.