Automated Creation of Source Code Variants of a Cryptographic Hash Function Implementation Using Generative Pre-Trained Transformer Models (2404.15681v2)
Abstract: Generative pre-trained transformers (GPTs) are large language models, a type of machine learning model, that are unusually adept at producing novel and coherent natural language. This study examines the ability of GPT models to generate novel and correct versions, and notably very insecure versions, of implementations of the cryptographic hash function SHA-1. The GPT models Llama-2-70b-chat-h, Mistral-7B-Instruct-v0.1, and zephyr-7b-alpha are used. The models are prompted to re-write each function using a modified version of the localGPT framework and langchain, which provide word embedding context of the full source code and header files to the model. This resulted in over 150,000 function re-write GPT output text blocks, approximately 50,000 of which could be parsed as C code and subsequently compiled. The generated code is analyzed for compilability, algorithmic correctness, memory leaks, compiler optimization stability, and character distance to the reference implementation. Remarkably, several generated function variants pose a high implementation security risk because they are correct for some test vectors but incorrect for others. Additionally, many function implementations were not correct with respect to the SHA-1 reference algorithm, but produced hashes that have some of the basic characteristics of hash functions. Many of the function re-writes contained serious flaws such as memory leaks, integer overflows, out-of-bounds accesses, use of uninitialised values, and compiler optimization instability. Compiler optimization settings and SHA-256 checksums of the compiled binaries are used to cluster implementations that are equivalent but may not have identical syntax. Using this clustering, over 100,000 novel and correct versions of the SHA-1 codebase were generated in which each component C function of the reference implementation differs from the original code.
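Two of the analysis steps named in the abstract, checking a variant against test vectors and clustering equivalent builds by SHA-256 checksum, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: `flawed_sha1`, `classify`, `cluster_by_checksum`, and the test vectors are hypothetical names and data introduced here, Python's `hashlib` stands in for the C reference implementation, and plain byte strings stand in for compiled binaries.

```python
import hashlib
from collections import defaultdict


def reference_sha1(msg: bytes) -> str:
    # Stand-in for the reference implementation (assumption: hashlib is the oracle).
    return hashlib.sha1(msg).hexdigest()


def flawed_sha1(msg: bytes) -> str:
    # Hypothetical flawed variant: silently ignores input beyond 55 bytes,
    # so it agrees with the reference only on short messages.
    return hashlib.sha1(msg[:55]).hexdigest()


def classify(candidate, vectors):
    # Count how many test vectors the candidate hashes identically to the reference.
    passed = sum(candidate(m) == reference_sha1(m) for m in vectors)
    if passed == len(vectors):
        return "correct"
    return "partially correct" if passed else "incorrect"


def cluster_by_checksum(binaries):
    # Group build names whose binary contents share the same SHA-256 digest.
    clusters = defaultdict(list)
    for name, blob in binaries.items():
        clusters[hashlib.sha256(blob).hexdigest()].append(name)
    return sorted(sorted(names) for names in clusters.values())


# Hypothetical test vectors: two short messages and one long message.
VECTORS = [b"", b"abc", b"a" * 1000]

if __name__ == "__main__":
    print(classify(reference_sha1, VECTORS))
    print(classify(flawed_sha1, VECTORS))
    # Byte strings below are placeholders for compiled binaries of three variants.
    builds = {"v1_O0": b"\x01\x02", "v2_O0": b"\x01\x02", "v3_O0": b"\x03"}
    print(cluster_by_checksum(builds))
```

The flawed variant passes the two short vectors and fails the long one, which is the "correct for some test vectors but incorrect for others" risk the abstract highlights. A test suite made up only of short messages would not detect it.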