A Robust Defense against Adversarial Attacks on Deep Learning-based Malware Detectors via (De)Randomized Smoothing (2402.15267v2)

Published 23 Feb 2024 in cs.CR and cs.AI

Abstract: Deep learning-based malware detectors have been shown to be susceptible to adversarial malware examples, i.e., malware examples that have been deliberately manipulated in order to avoid detection. In light of the vulnerability of deep learning detectors to subtle input file modifications, we propose a practical defense against adversarial malware examples inspired by (de)randomized smoothing. In this work, we reduce the chances of sampling adversarial content injected by malware authors by selecting correlated subsets of bytes, rather than randomizing inputs with Gaussian noise as in the Computer Vision (CV) domain. During training, our ablation-based smoothing scheme trains a base classifier to make classifications on a subset of contiguous bytes, or chunk of bytes. At test time, a large number of chunks are classified by the base classifier, and the consensus among these classifications is reported as the final prediction. We propose two strategies to determine the locations of the chunks used for classification: (1) randomly selecting the locations of the chunks and (2) selecting contiguous adjacent chunks. To showcase the effectiveness of our approach, we have trained two classifiers with our chunk-based ablation schemes on the BODMAS dataset. Our findings reveal that the chunk-based smoothing classifiers exhibit greater resilience against adversarial malware examples generated with state-of-the-art evasion attacks, outperforming a non-smoothed classifier and a randomized smoothing-based classifier by a substantial margin.
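To make the test-time procedure concrete, the following is a minimal sketch of the chunk-based smoothing described in the abstract: classify many byte chunks with a base classifier and report the majority vote. All names and parameters (base_classifier, CHUNK_SIZE, N_CHUNKS) are illustrative assumptions, not the authors' actual implementation or settings.

```python
# Sketch of chunk-based (de)randomized smoothing for a byte-level
# malware classifier. Assumed, not the paper's exact implementation.
import random
from collections import Counter

CHUNK_SIZE = 512   # assumed length of each contiguous byte chunk
N_CHUNKS = 100     # assumed number of chunks voted on at test time


def sample_random_chunks(file_bytes: bytes, n_chunks: int, chunk_size: int):
    """Strategy (1): chunks at randomly selected file offsets."""
    max_start = max(len(file_bytes) - chunk_size, 0)
    for _ in range(n_chunks):
        start = random.randint(0, max_start)
        yield file_bytes[start:start + chunk_size]


def sample_adjacent_chunks(file_bytes: bytes, n_chunks: int, chunk_size: int):
    """Strategy (2): contiguous adjacent chunks from a random starting offset."""
    max_start = max(len(file_bytes) - n_chunks * chunk_size, 0)
    start = random.randint(0, max_start)
    for i in range(n_chunks):
        offset = start + i * chunk_size
        yield file_bytes[offset:offset + chunk_size]


def smoothed_predict(base_classifier, file_bytes: bytes,
                     sampler=sample_random_chunks) -> int:
    """Classify many chunks and report the consensus as the final prediction.

    `base_classifier` is assumed to be a callable mapping a byte chunk to a
    label (0 = benign, 1 = malicious), e.g. a trained byte-level network.
    """
    votes = Counter(
        base_classifier(chunk)
        for chunk in sampler(file_bytes, N_CHUNKS, CHUNK_SIZE)
    )
    return votes.most_common(1)[0][0]
```

The intuition matches the abstract: an adversarial payload injected into one region of the file only influences the few chunks that overlap it, so the majority vote over many chunks is harder to flip than a single whole-file prediction.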
