
TESSERACT: Eliminating Experimental Bias in Malware Classification across Space and Time (Extended Version) (2402.01359v1)

Published 2 Feb 2024 in cs.LG, cs.CR, and cs.PF

Abstract: Machine learning (ML) plays a pivotal role in detecting malicious software. Despite the high F1-scores reported in numerous studies, reaching upwards of 0.99, the issue is not completely solved. Malware detectors often experience performance decay due to constantly evolving operating systems and attack methods, which can render previously learned knowledge insufficient for accurate decision-making on new inputs. This paper argues that commonly reported results are inflated due to two pervasive sources of experimental bias in the detection task: spatial bias, caused by data distributions that are not representative of a real-world deployment; and temporal bias, caused by incorrect time splits of data, leading to unrealistic configurations. To address these biases, we introduce a set of constraints for fair experiment design, and propose a new metric, AUT, for classifier robustness in real-world settings. We additionally propose an algorithm designed to tune training data to enhance classifier performance. Finally, we present TESSERACT, an open-source framework for realistic classifier comparison. Our evaluation encompasses both traditional ML and deep learning methods, examining published works on an extensive Android dataset with 259,230 samples over a five-year span. Additionally, we conduct case studies in the Windows PE and PDF domains. Our findings identify the existence of biases in previous studies and reveal that significant performance enhancements are possible through appropriate, periodic tuning. We also explore how combining multiple mitigation strategies can delay performance decay and yield more stable, higher performance over time.
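
The abstract names two ideas that are easy to illustrate: a time-aware train/test split and the AUT metric. The sketch below is not the TESSERACT framework's API. It assumes that AUT is the trapezoidal area under a per-period performance curve (for example, F1 per monthly test slot), normalized to [0, 1], as in the USENIX Security 2019 version of the paper. The names `temporal_split` and `aut` and the sample layout are hypothetical, chosen for illustration only.

```python
from datetime import date


def temporal_split(samples, cutoff):
    """Split (timestamp, label) pairs so every training sample predates
    every test sample. Hypothetical helper, not the TESSERACT API."""
    train = [s for s in samples if s[0] < cutoff]
    test = [s for s in samples if s[0] >= cutoff]
    return train, test


def aut(scores):
    """Area Under Time, assumed to be the trapezoidal area under a
    per-period metric curve, normalized by the number of intervals."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two test periods")
    area = sum((scores[k] + scores[k + 1]) / 2 for k in range(n - 1))
    return area / (n - 1)


# Illustrative data only: (first-seen date, label) pairs.
samples = [
    (date(2014, 3, 1), "goodware"),
    (date(2014, 9, 1), "malware"),
    (date(2015, 2, 1), "goodware"),
    (date(2015, 6, 1), "malware"),
]
train, test = temporal_split(samples, cutoff=date(2015, 1, 1))

# A classifier whose per-period F1 decays linearly from 1.0 to 0.0.
decaying_f1 = [1.0, 0.5, 0.0]
robustness = aut(decaying_f1)
```

Under this assumed definition, a classifier that holds a perfect score in every test period gets an AUT of 1.0, while one that decays quickly is penalized even if its first-period score is high.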

Authors (8)
  1. Zeliang Kan
  2. Shae McFadden
  3. Daniel Arp
  4. Feargus Pendlebury
  5. Roberto Jordaney
  6. Johannes Kinder
  7. Fabio Pierazzi
  8. Lorenzo Cavallaro
