How Far Have We Gone in Binary Code Understanding Using Large Language Models (2404.09836v3)

Published 15 Apr 2024 in cs.SE and cs.CR

Abstract: Binary code analysis plays a pivotal role in various software security applications, such as software maintenance, malware detection, software vulnerability discovery, and patch analysis. However, unlike source code, binary code is challenging for reverse engineers to understand due to the absence of semantic information. Automated tools are therefore needed to assist human analysts in interpreting binary code. In recent years, two groups of technologies have shown promise: (1) deep learning-based techniques have demonstrated competitive results on tasks related to binary code understanding, and (2) LLMs have been extensively pre-trained at the source-code level for tasks such as code understanding and generation. This raises the question of how well LLMs can understand binary code. In this work, we propose a benchmark to evaluate the effectiveness of LLMs in real-world reverse engineering scenarios. The benchmark covers two key binary code understanding tasks: function name recovery and binary code summarization. Through extensive evaluations of popular LLMs on this benchmark, we gain valuable insights into their capabilities and limitations. Our evaluations reveal that existing LLMs can understand binary code to a certain extent and thereby improve the efficiency of binary code analysis. Our results highlight the great potential of LLMs in advancing the field of binary code understanding.
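The page does not reproduce the paper's evaluation pipeline, but function name recovery in this line of work is typically scored by comparing the tokens of the name an LLM predicts for a stripped function against the tokens of the original name preserved in debug information. The Python sketch below illustrates one plausible way to compute such a token-level precision/recall/F1 score; the tokenization rules, function names, and example values are illustrative assumptions, not the authors' implementation.

```python
import re

def name_tokens(name: str) -> list[str]:
    """Split a function name such as 'parse_http_header' or
    'parseHttpHeader' into lowercase word tokens."""
    # Insert a space at camelCase boundaries, then split on non-alphanumerics.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return [t for t in re.split(r"[^A-Za-z0-9]+", spaced.lower()) if t]

def token_prf(predicted: str, ground_truth: str) -> tuple[float, float, float]:
    """Token-level precision, recall, and F1 between a predicted
    function name and the ground-truth name from debug symbols."""
    pred, gold = set(name_tokens(predicted)), set(name_tokens(ground_truth))
    if not pred or not gold:
        return 0.0, 0.0, 0.0
    overlap = len(pred & gold)
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical example: the model names a stripped function
# 'read_http_header' whose original DWARF name was 'parse_http_header'.
print(token_prf("read_http_header", "parse_http_header"))
# -> precision = recall = f1 = 2/3: 'http' and 'header' match, 'read' does not.
```

Set-based token matching is a common choice here because it rewards partially correct names, a frequent outcome for LLMs on stripped binaries, rather than demanding an exact string match; summarization outputs, by contrast, are usually scored with text-similarity metrics such as BLEU, METEOR, or ROUGE.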

Authors (9)
  1. Xiuwei Shang
  2. Shaoyin Cheng
  3. Guoqiang Chen
  4. Yanming Zhang
  5. Li Hu
  6. Xiao Yu
  7. Gangyang Li
  8. Weiming Zhang
  9. Nenghai Yu
Citations (1)