Adversarial Binaries for Authorship Identification

Published 21 Sep 2018 in cs.CR | (1809.08316v2)

Abstract: Binary code authorship identification determines the authors of a binary program. Existing techniques have used supervised machine learning for this task. In this paper, we look at this problem from an attacker's perspective. We aim to modify a test binary, such that it not only causes misprediction but also maintains the functionality of the original input binary. Attacks against binary code are intrinsically more difficult than attacks against domains such as computer vision, where attackers can change each pixel of the input image independently and still maintain a valid image. For binary code, even flipping one bit of a binary may cause the binary to be invalid, to crash at run time, or to lose the original functionality. We investigate two types of attacks: untargeted attacks, causing misprediction to any of the incorrect authors, and targeted attacks, causing misprediction to a specific one among the incorrect authors. We develop two key attack capabilities: feature vector modification, generating an adversarial feature vector that both corresponds to a real binary and causes the required misprediction, and input binary modification, modifying the input binary to match the adversarial feature vector while maintaining the functionality of the input binary. We evaluated our attack against classifiers trained with a state-of-the-art method for authorship attribution. The classifiers for authorship identification have 91% accuracy on average. Our untargeted attack has a 96% success rate on average, showing that we can effectively suppress authorship signal. Our targeted attack has a 46% success rate on average, showing that it is possible, but significantly more difficult, to impersonate a specific programmer's style. Our attack reveals that existing binary code authorship identification techniques rely on code features that are easy to modify, and are thus vulnerable to attacks.
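
The difference between the untargeted and targeted success criteria can be illustrated with a minimal sketch. It is not the paper's method: it assumes a toy linear classifier with at least two authors, feature vectors of non-negative integer counts, and a greedy search that changes one count by one per step. The non-negativity check is only a crude stand-in for the requirement that an adversarial feature vector correspond to a real binary, and all names (`greedy_attack`, `attack_succeeded`, and so on) are hypothetical.

```python
from typing import List, Optional

Weights = List[List[float]]  # assumed toy model: one row of feature weights per author


def scores(weights: Weights, x: List[int]) -> List[float]:
    """Linear score of feature vector x for every author."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def predict(weights: Weights, x: List[int]) -> int:
    """Index of the highest-scoring author (first one wins on ties)."""
    s = scores(weights, x)
    return max(range(len(s)), key=lambda a: s[a])


def attack_succeeded(pred: int, true_author: int, target: Optional[int] = None) -> bool:
    """Untargeted: any wrong author counts. Targeted: only the chosen author counts."""
    if target is None:
        return pred != true_author
    return pred == target


def greedy_attack(weights: Weights, x: List[int], true_author: int,
                  target: Optional[int] = None, max_steps: int = 50) -> Optional[List[int]]:
    """Hypothetical greedy search: change one count by +/-1 per step.

    Counts must stay non-negative integers, a simplified proxy for
    "the vector corresponds to a real binary".
    """
    x = list(x)
    authors = range(len(weights))
    for _ in range(max_steps):
        if attack_succeeded(predict(weights, x), true_author, target):
            return x
        best = None  # (margin, candidate vector)
        for i in range(len(x)):
            for delta in (1, -1):
                if x[i] + delta < 0:
                    continue  # infeasible: negative feature count
                cand = x[:]
                cand[i] += delta
                s = scores(weights, cand)
                if target is None:
                    # margin of the best wrong author over the true author
                    margin = max(s[a] for a in authors if a != true_author) - s[true_author]
                else:
                    # margin of the target author over every other author
                    margin = s[target] - max(s[a] for a in authors if a != target)
                if best is None or margin > best[0]:
                    best = (margin, cand)
        if best is None:
            return None
        x = best[1]
    return x if attack_succeeded(predict(weights, x), true_author, target) else None


if __name__ == "__main__":
    W = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
    original = [3, 1, 0]  # classified as author 0
    print(greedy_attack(W, original, true_author=0))            # untargeted
    print(greedy_attack(W, original, true_author=0, target=2))  # targeted
```

In this toy setting the targeted search needs more steps than the untargeted one, because the target author must outscore every other author rather than merely displacing the true one. Modifying an actual binary to match the resulting feature vector, while preserving functionality, is a separate step that this sketch does not attempt.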