
Cross-Language Binary-Source Code Matching with Intermediate Representations (2201.07420v1)

Published 19 Jan 2022 in cs.SE and cs.AI

Abstract: Binary-source code matching plays an important role in many security and software engineering tasks, such as malware detection, reverse engineering, and vulnerability assessment. Several approaches have been proposed for binary-source code matching that jointly learn the embeddings of binary code and source code in a common vector space. Despite much effort, existing approaches target matching binary code and source code written in a single programming language. In practice, however, software applications are often written in different programming languages to cater for different requirements and computing platforms. Matching binary and source code across programming languages introduces additional challenges when maintaining multi-language and multi-platform applications. To this end, this paper formulates the problem of cross-language binary-source code matching and develops a new dataset for it. We present a novel approach, XLIR, a Transformer-based neural network that learns intermediate representations for both binary and source code. To validate the effectiveness of XLIR, comprehensive experiments are conducted on two tasks, cross-language binary-source code matching and cross-language source-source code matching, on top of our curated dataset. Experimental results and analysis show that XLIR with intermediate representations significantly outperforms other state-of-the-art models on both tasks.

Authors (8)
  1. Yi Gui (7 papers)
  2. Yao Wan (70 papers)
  3. Hongyu Zhang (147 papers)
  4. Huifang Huang (5 papers)
  5. Yulei Sui (29 papers)
  6. Guandong Xu (93 papers)
  7. Zhiyuan Shao (4 papers)
  8. Hai Jin (83 papers)
Citations (26)
