CCS 2025
Deep Learning from Imperfectly Labeled Malware Data
Fahad Alotaibi, Euan Goodbrand, Sergio Maffeis
Abstract
Deep learning approaches have achieved remarkable performance in malware classification and detection. However, their success relies on the availability of large, accurately labeled datasets, a requirement that is critical yet hard to meet in the malware domain. In practice, most malware datasets are labeled automatically from the outputs of antivirus engines, a process that often introduces significant label noise. Such imperfections can severely degrade the performance and generalizability of deep learning models.