How Far Have We Gone in Binary Code Understanding Using Large Language Models (2404.09836v3)
Abstract: Binary code analysis plays a pivotal role in various software security applications, such as software maintenance, malware detection, software vulnerability discovery, and patch analysis. However, unlike source code, binary code is challenging for reverse engineers to understand due to the absence of semantic information, so automated tools are needed to assist human analysts in interpreting it. In recent years, two lines of technology have shown promise: (1) deep learning-based techniques have achieved competitive results on binary code understanding tasks, and (2) large language models (LLMs) have been extensively pre-trained at the source-code level for tasks such as code understanding and generation. This naturally raises the question of how capable LLMs are at understanding binary code. In this work, we propose a benchmark to evaluate the effectiveness of LLMs in real-world reverse engineering scenarios, covering two key binary code understanding tasks: function name recovery and binary code summarization. Through extensive evaluations of popular LLMs on this benchmark, we gain valuable insights into their capabilities and limitations. Our results reveal that existing LLMs can understand binary code to a certain extent and can thereby improve the efficiency of binary code analysis, highlighting their great potential for advancing the field.
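The function name recovery task described above can be illustrated with a small, hypothetical sketch: given decompiled pseudo-code from a stripped binary, a prompt asks a model for a descriptive function name, and the prediction is scored against the ground-truth name with token-level F1 (a common metric in name-recovery evaluations). The decompiled snippet, the prompt template, and the helper names below are illustrative assumptions, not artifacts from the paper's benchmark.

```python
import re

# Illustrative pseudo-code as a decompiler might emit it for a stripped
# binary: the original name is lost and replaced by an address-based label.
DECOMPILED = """
int sub_401a2f(char *a1) {
    int v1 = 0;
    while (*a1) { v1++; a1++; }
    return v1;
}
"""

def build_prompt(code: str) -> str:
    """Format a function-name-recovery query for a chat LLM (hypothetical template)."""
    return (
        "The following C pseudo-code was decompiled from a stripped binary.\n"
        "Suggest a descriptive name for the function.\n\n" + code
    )

def name_tokens(name: str) -> list[str]:
    """Split snake_case / camelCase identifiers into lowercase word tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name).replace("_", " ")
    return spaced.lower().split()

def token_f1(predicted: str, ground_truth: str) -> float:
    """Token-level F1 between a predicted and a ground-truth function name."""
    pred, gold = set(name_tokens(predicted)), set(name_tokens(ground_truth))
    if not pred or not gold:
        return 0.0
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

# A prediction sharing one of two tokens with the ground truth scores 0.5.
print(token_f1("string_length", "strLength"))  # 0.5
```

Binary code summarization is evaluated analogously, except the model output is a free-form description scored with text-similarity metrics such as BLEU, METEOR, and ROUGE rather than name-token overlap.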
- G. Canfora, M. Di Penta, and L. Cerulo, “Achievements and challenges in software reverse engineering,” Commun. ACM, vol. 54, no. 4, pp. 142–151, Apr. 2011. [Online]. Available: https://doi.org/10.1145/1924421.1924451
- J. T. Giffin, S. Jha, and B. P. Miller, “Efficient context-sensitive intrusion detection,” in Network and Distributed System Security Symposium, 2004.
- S. Alrabaee, M. Debbabi, and L. Wang, “A survey of binary code fingerprinting approaches: Taxonomy, methodologies, and features,” ACM Comput. Surv., vol. 55, no. 1, Jan. 2022. [Online]. Available: https://doi.org/10.1145/3486860
- Z. Zhang, W. You, G. Tao, Y. Aafer, X. Liu, and X. Zhang, “Stochfuzz: Sound and cost-effective fuzzing of stripped binaries by incremental and stochastic rewriting,” in 2021 IEEE Symposium on Security and Privacy (SP), 2021, pp. 659–676.
- X. Meng and B. P. Miller, “Binary code is not easy,” in Proceedings of the 25th International Symposium on Software Testing and Analysis, ser. ISSTA 2016. New York, NY, USA: Association for Computing Machinery, 2016, pp. 24–35. [Online]. Available: https://doi.org/10.1145/2931037.2931047
- J. Patrick-Evans, L. Cavallaro, and J. Kinder, “Probabilistic naming of functions in stripped binaries,” in Proceedings of the 36th Annual Computer Security Applications Conference, ser. ACSAC ’20. New York, NY, USA: Association for Computing Machinery, 2020, pp. 373–385. [Online]. Available: https://doi.org/10.1145/3427228.3427265
- Hex-Rays SA, “IDA Pro,” https://www.hex-rays.com/products/ida, 2023.
- National Security Agency, “Ghidra,” https://github.com/NationalSecurityAgency/ghidra, 2023.
- Vector 35, “Binary Ninja,” https://binary.ninja/, 2023.
- E. M. Gellenbeck and C. R. Cook, “An investigation of procedure and variable names as beacons during program comprehension,” USA, Tech. Rep., 1991.
- H. Gao, S. Cheng, Y. Xue, and W. Zhang, “A lightweight framework for function name reassignment based on large-scale stripped binaries,” in Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2021. New York, NY, USA: Association for Computing Machinery, 2021, pp. 607–619. [Online]. Available: https://doi.org/10.1145/3460319.3464804
- Y. David, U. Alon, and E. Yahav, “Neural reverse engineering of stripped binaries using augmented control flow graphs,” Proc. ACM Program. Lang., vol. 4, no. OOPSLA, Nov. 2020. [Online]. Available: https://doi.org/10.1145/3428293
- X. Jin, K. Pei, J. Y. Won, and Z. Lin, “Symlm: Predicting function names in stripped binaries via context-sensitive execution-aware code embeddings,” in Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’22. New York, NY, USA: Association for Computing Machinery, 2022, pp. 1631–1645. [Online]. Available: https://doi.org/10.1145/3548606.3560612
- G. Chen, H. Gao, J. Zhang, Y. He, S. Cheng, and W. Zhang, “Investigating neural-based function name reassignment from the perspective of binary code representation,” in 2023 20th Annual International Conference on Privacy, Security and Trust (PST), 2023, pp. 1–11.
- G. Sridhara, E. Hill, D. Muppaneni, L. Pollock, and K. Vijay-Shanker, “Towards automatically generating summary comments for java methods,” in Proceedings of the 25th IEEE/ACM International Conference on Automated Software Engineering, ser. ASE ’10. New York, NY, USA: Association for Computing Machinery, 2010, pp. 43–52. [Online]. Available: https://doi.org/10.1145/1858996.1859006
- A. Al-Kaswan, T. Ahmed, M. Izadi, A. A. Sawant, P. Devanbu, and A. van Deursen, “Extending source code pre-trained language models to summarise decompiled binaries,” in 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2023, pp. 260–271.
- Y. Wang, W. Wang, S. Joty, and S. C. Hoi, “CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation,” in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Online and Punta Cana, Dominican Republic: Association for Computational Linguistics, Nov. 2021, pp. 8696–8708. [Online]. Available: https://aclanthology.org/2021.emnlp-main.685
- J. Xiong, G. Chen, K. Chen, H. Gao, S. Cheng, and W. Zhang, “Hext5: Unified pre-training for stripped binary code information inference,” in 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2023, pp. 774–786.
- H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi et al., “Llama 2: Open foundation and fine-tuned chat models,” 2023.
- L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. L. Wainwright, P. Mishkin et al., “Training language models to follow instructions with human feedback,” 2022.
- B. Rozière, J. Gehring, F. Gloeckle, S. Sootla, I. Gat, X. E. Tan, Y. Adi et al., “Code llama: Open foundation models for code,” 2023.
- Z. Luo, C. Xu, P. Zhao, Q. Sun, X. Geng, W. Hu, C. Tao, J. Ma, Q. Lin, and D. Jiang, “Wizardcoder: Empowering code large language models with evol-instruct,” 2023.
- Y. Wu, N. Jiang, H. V. Pham, T. Lutellier, J. Davis, L. Tan, P. Babkin, and S. Shah, “How effective are neural networks for fixing security vulnerabilities,” in Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2023. New York, NY, USA: Association for Computing Machinery, 2023, pp. 1282–1294. [Online]. Available: https://doi.org/10.1145/3597926.3598135
- Y. Zhang, W. Song, Z. Ji, D. Yao, and N. Meng, “How well does llm generate security tests?” 2023.
- E. Nijkamp, H. Hayashi, C. Xiong, S. Savarese, and Y. Zhou, “Codegen2: Lessons for training llms on programming and natural languages,” 2023.
- D. Guo, Q. Zhu, D. Yang, Z. Xie, K. Dong, W. Zhang, G. Chen, X. Bi, Y. Wu, Y. K. Li, F. Luo, Y. Xiong, and W. Liang, “Deepseek-coder: When the large language model meets programming – the rise of code intelligence,” 2024.
- A. Zeng, X. Liu, Z. Du, Z. Wang, H. Lai, M. Ding et al., “Glm-130b: An open bilingual pre-trained model,” 2023.
- L. Zheng, W.-L. Chiang, Y. Sheng, S. Zhuang, Z. Wu, Y. Zhuang, Z. Lin, Z. Li, D. Li, E. P. Xing, H. Zhang, J. E. Gonzalez, and I. Stoica, “Judging llm-as-a-judge with mt-bench and chatbot arena,” 2023.
- A. Q. Jiang, A. Sablayrolles, A. Roux, A. Mensch, B. Savary, C. Bamford, D. S. Chaplot, D. de las Casas, E. B. Hanna, F. Bressand, G. Lengyel, G. Bour, G. Lample, L. R. Lavaud, L. Saulnier, M.-A. Lachaux, P. Stock, S. Subramanian, S. Yang, S. Antoniak, T. L. Scao, T. Gervet, T. Lavril, T. Wang, T. Lacroix, and W. E. Sayed, “Mixtral of experts,” 2024.
- K. Pei, Z. Xuan, J. Yang, S. Jana, and B. Ray, “Trex: Learning execution semantics from micro-traces for binary similarity,” 2021.
- X. Hou, Y. Zhao, Y. Liu, Z. Yang, K. Wang, L. Li, X. Luo, D. Lo, J. Grundy, and H. Wang, “Large language models for software engineering: A systematic literature review,” 2023.
- Y. Zhang, “Leveraging artificial intelligence on binary code comprehension,” in Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, ser. ASE ’22. New York, NY, USA: Association for Computing Machinery, 2023. [Online]. Available: https://doi.org/10.1145/3551349.3559564
- FFmpeg, 2024. [Online]. Available: https://github.com/FFmpeg/FFmpeg
- Redis, 2024. [Online]. Available: https://github.com/redis/redis
- Curl, 2024. [Online]. Available: https://github.com/curl/curl
- Masscan, 2024. [Online]. Available: https://github.com/robertdavidgraham/masscan
- Llama2.c, 2024. [Online]. Available: https://github.com/karpathy/llama2.c
- Whisper.cpp, 2024. [Online]. Available: https://github.com/ggerganov/whisper.cpp
- OpenSSL, 2024. [Online]. Available: https://github.com/openssl/openssl
- zstd, 2024. [Online]. Available: https://github.com/facebook/zstd
- ImageMagick, 2024. [Online]. Available: https://github.com/ImageMagick/ImageMagick
- Libvips, 2024. [Online]. Available: https://github.com/libvips/libvips
- Libexpat, 2024. [Online]. Available: https://github.com/libexpat/libexpat
- Ultrajson, 2024. [Online]. Available: https://github.com/ultrajson/ultrajson
- DWARF Debugging Information Format Committee, “DWARF debugging information format version 4,” https://dwarfstd.org/doc/DWARF4.pdf, 2010.
- J. I. Maletic and M. L. Collard, “Exploration, analysis, and manipulation of source code using srcml,” in 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, vol. 2, 2015, pp. 951–952.
- J. Dagdelen, A. Dunn, S. Lee, N. Walker, A. S. Rosen, G. Ceder, K. A. Persson, and A. Jain, “Structured information extraction from scientific text with large language models,” Nature Communications, vol. 15, no. 1, p. 1418, 2024. [Online]. Available: https://doi.org/10.1038/s41467-024-45563-x
- D. Bzdok, A. Thieme, O. Levkovskyy, P. Wren, T. Ray, and S. Reddy, “Data science opportunities of large language models for neuroscience and biomedicine,” Neuron, vol. 112, no. 5, pp. 698–717, Mar 2024. [Online]. Available: https://doi.org/10.1016/j.neuron.2024.01.016
- Z. Tan, A. Beigi, S. Wang, R. Guo, A. Bhattacharjee, B. Jiang, M. Karami, J. Li, L. Cheng, and H. Liu, “Large language models for data annotation: A survey,” 2024.
- HuggingFace, 2024. [Online]. Available: https://huggingface.co/
- B. Chen, Z. Zhang, N. Langrené, and S. Zhu, “Unleashing the potential of prompt engineering in large language models: a comprehensive review,” 2023.
- A. Kong, S. Zhao, H. Chen, Q. Li, Y. Qin, R. Sun, X. Zhou, E. Wang, and X. Dong, “Better zero-shot reasoning with role-play prompting,” 2024.
- Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, L. Shou, B. Qin, T. Liu, D. Jiang, and M. Zhou, “Codebert: A pre-trained model for programming and natural languages,” 2020.
- D. Fried, A. Aghajanyan, J. Lin, S. Wang, E. Wallace, F. Shi, R. Zhong, W. tau Yih, L. Zettlemoyer, and M. Lewis, “Incoder: A generative model for code infilling and synthesis,” 2023.
- H. Wang, W. Qu, G. Katz, W. Zhu, Z. Gao, H. Qiu, J. Zhuge, and C. Zhang, “jtrans: jump-aware transformer for binary code similarity detection,” in Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2022. New York, NY, USA: Association for Computing Machinery, 2022, pp. 1–13. [Online]. Available: https://doi.org/10.1145/3533767.3534367
- X. Li, Y. Qu, and H. Yin, “Palmtree: Learning an assembly language model for instruction embedding,” in Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’21. New York, NY, USA: Association for Computing Machinery, 2021, pp. 3236–3251. [Online]. Available: https://doi.org/10.1145/3460120.3484587
- D. Kim, E. Kim, S. K. Cha, S. Son, and Y. Kim, “Revisiting binary code similarity analysis using interpretable feature engineering and lessons learned,” IEEE Transactions on Software Engineering, vol. 49, no. 4, pp. 1661–1682, 2023.
- PyTorch, 2024. [Online]. Available: https://pytorch.org/
- DeepSpeed, 2024. [Online]. Available: https://www.deepspeed.ai/
- Transformers, 2024. [Online]. Available: https://github.com/huggingface/transformers
- Y. Zheng, R. Zhang, J. Zhang, Y. Ye, Z. Luo, and Y. Ma, “Llamafactory: Unified efficient fine-tuning of 100+ language models,” 2024.
- E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, “Lora: Low-rank adaptation of large language models,” 2021.
- K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “Bleu: a method for automatic evaluation of machine translation,” in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Philadelphia, Pennsylvania, USA: Association for Computational Linguistics, 2002, pp. 311–318. [Online]. Available: https://aclanthology.org/P02-1040
- A. Lavie and M. J. Denkowski, “The meteor metric for automatic evaluation of machine translation,” Machine Translation, vol. 23, no. 2–3, pp. 105–115, Sep. 2009. [Online]. Available: https://doi.org/10.1007/s10590-009-9059-4
- C.-Y. Lin, “ROUGE: A package for automatic evaluation of summaries,” in Text Summarization Branches Out. Barcelona, Spain: Association for Computational Linguistics, 2004, pp. 74–81. [Online]. Available: https://aclanthology.org/W04-1013
- P. Junod, J. Rinaldini, J. Wehrli, and J. Michielin, “Obfuscator-llvm – software protection for the masses,” in Proceedings of the 2015 IEEE/ACM 1st International Workshop on Software Protection, ser. SPRO ’15. USA: IEEE Computer Society, 2015, pp. 3–9. [Online]. Available: https://doi.org/10.1109/SPRO.2015.10
- Xiuwei Shang
- Shaoyin Cheng
- Guoqiang Chen
- Yanming Zhang
- Li Hu
- Xiao Yu
- Gangyang Li
- Weiming Zhang
- Nenghai Yu