Shellcode_IA32: A Dataset for Automatic Shellcode Generation (2104.13100v4)
Published 27 Apr 2021 in cs.SE and cs.CL
Abstract: We take the first step to address the task of automatically generating shellcodes, i.e., small pieces of code used as a payload in the exploitation of a software vulnerability, starting from natural language comments. We assemble and release a novel dataset (Shellcode_IA32), consisting of challenging but common assembly instructions with their natural language descriptions. We experiment with standard methods in neural machine translation (NMT) to establish baseline performance levels on this task.