
Large Language Models for Cyber Security: A Systematic Literature Review (2405.04760v3)

Published 8 May 2024 in cs.CR and cs.AI

Abstract: The rapid advancement of LLMs has opened up new opportunities for leveraging artificial intelligence in various domains, including cybersecurity. As the volume and sophistication of cyber threats continue to grow, there is an increasing need for intelligent systems that can automatically detect vulnerabilities, analyze malware, and respond to attacks. In this survey, we conduct a comprehensive review of the literature on the application of LLMs in cybersecurity (LLM4Security). By comprehensively collecting over 30K relevant papers and systematically analyzing 127 papers from top security and software engineering venues, we aim to provide a holistic view of how LLMs are being used to solve diverse problems across the cybersecurity domain. Through our analysis, we identify several key findings. First, we observe that LLMs are being applied to a wide range of cybersecurity tasks, including vulnerability detection, malware analysis, network intrusion detection, and phishing detection. Second, we find that the datasets used for training and evaluating LLMs in these tasks are often limited in size and diversity, highlighting the need for more comprehensive and representative datasets. Third, we identify several promising techniques for adapting LLMs to specific cybersecurity domains, such as fine-tuning, transfer learning, and domain-specific pre-training. Finally, we discuss the main challenges and opportunities for future research in LLM4Security, including the need for more interpretable and explainable models, the importance of addressing data privacy and security concerns, and the potential for leveraging LLMs for proactive defense and threat hunting. Overall, our survey provides a comprehensive overview of the current state-of-the-art in LLM4Security and identifies several promising directions for future research.
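The adaptation techniques the survey highlights include prompting an off-the-shelf LLM for a security task such as phishing detection. A minimal sketch of the few-shot prompting pattern is below; the function name, template wording, and demonstration emails are illustrative assumptions, not drawn from the paper or any specific system it reviews.

```python
def build_phishing_prompt(email_text: str, examples: list[tuple[str, str]]) -> str:
    """Assemble a few-shot classification prompt for an LLM-based phishing
    detector. `examples` holds (email, label) demonstration pairs; the
    template wording here is hypothetical, for illustration only."""
    lines = ["Classify each email as PHISHING or LEGITIMATE."]
    for demo_email, label in examples:
        lines.append(f"Email: {demo_email}\nLabel: {label}")
    # Leave the final label blank so the model completes it.
    lines.append(f"Email: {email_text}\nLabel:")
    return "\n\n".join(lines)

demos = [
    ("Your account is locked, click here to verify.", "PHISHING"),
    ("Meeting moved to 3pm, see agenda attached.", "LEGITIMATE"),
]
prompt = build_phishing_prompt("Urgent: confirm your password now!", demos)
print(prompt)
```

The resulting string would be sent to a chat or completion endpoint of whichever model is being evaluated; the surveyed papers vary in whether they use zero-shot, few-shot, or fine-tuned setups for this task.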

  230. A Survey on Multimodal Large Language Models. arXiv:2306.13549 [cs.CV]
  231. CIRCLE: continual repair across programming languages. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis (¡conf-loc¿, ¡city¿Virtual¡/city¿, ¡country¿South Korea¡/country¿, ¡/conf-loc¿) (ISSTA 2022). Association for Computing Machinery, New York, NY, USA, 678–690. https://doi.org/10.1145/3533767.3534219
  232. Data-centric Artificial Intelligence: A Survey. arXiv:2303.10158 [cs.LG]
  233. Research on third-party libraries in android apps: A taxonomy and systematic literature review. IEEE Transactions on Software Engineering 48, 10 (2021), 4181–4213.
  234. Understanding Large Language Model Based Fuzz Driver Generation. arXiv:2307.12469 [cs.CR]
  235. Prompt-Enhanced Software Vulnerability Detection Using ChatGPT. arXiv:2308.12697 [cs.SE]
  236. Identifying relevant studies in software engineering. Information and Software Technology 53, 6 (2011), 625–637.
  237. Pre-Trained Model-Based Automated Software Vulnerability Repair: How Far are We? IEEE Transactions on Dependable and Secure Computing (2023), 1–18. https://doi.org/10.1109/TDSC.2023.3308897
  238. GAMMA: Revisiting Template-based Automated Program Repair via Mask Prediction. arXiv:2309.09308 [cs.SE]
  239. A Critical Review of Large Language Model on Software Engineering: An Example from ChatGPT and Automated Program Repair. arXiv:2310.08879 [cs.SE]
  240. Program vulnerability repair via inductive inference. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. 691–702.
  241. STEAM: Simulating the InTeractive BEhavior of ProgrAMmers for Automatic Bug Fixing. arXiv:2308.14460 [cs.SE]
  242. Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models. arXiv:2309.01219 [cs.CL]
  243. Automated Static Warning Identification via Path-based Semantic Representation. arXiv preprint arXiv:2306.15568 (2023).
  244. Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code. arXiv:2311.07989 [cs.CL]
  245. A Survey of Large Language Models. arXiv:2303.18223 [cs.CL]
  246. LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models. arXiv:2403.13372 [cs.CL]
  247. A Survey of Large Language Models for Code: Evolution, Benchmarking, and Future Trends. arXiv:2311.10372 [cs.SE]
  248. An overview on smart contracts: Challenges, advances and platforms. Future Generation Computer Systems 105 (April 2020), 475–491. https://doi.org/10.1016/j.future.2019.12.019
  249. Large Language Model for Vulnerability Detection and Repair: Literature Review and Roadmap. arXiv preprint arXiv:2404.02525 (2024).
Authors (9)
  1. HanXiang Xu
  2. Kai Chen
  3. Yang Liu
  4. Ting Yu
  5. Haoyu Wang
  6. Kailong Wang
  7. Ningke Li
  8. Shenao Wang
  9. Yanjie Zhao