A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment (2504.15585v4)

Published 22 Apr 2025 in cs.CR, cs.AI, cs.CL, and cs.LG

Abstract: The remarkable success of LLMs has illuminated a promising pathway toward achieving Artificial General Intelligence for both academic and industrial communities, owing to their unprecedented performance across various applications. As LLMs continue to gain prominence in both research and commercial domains, their security and safety implications have become a growing concern, not only for researchers and corporations but also for every nation. Currently, existing surveys on LLM safety primarily focus on specific stages of the LLM lifecycle, e.g., deployment phase or fine-tuning phase, lacking a comprehensive understanding of the entire "lifechain" of LLMs. To address this gap, this paper introduces, for the first time, the concept of "full-stack" safety to systematically consider safety issues throughout the entire process of LLM training, deployment, and eventual commercialization. Compared to the off-the-shelf LLM safety surveys, our work demonstrates several distinctive advantages: (I) Comprehensive Perspective. We define the complete LLM lifecycle as encompassing data preparation, pre-training, post-training, deployment and final commercialization. To our knowledge, this represents the first safety survey to encompass the entire lifecycle of LLMs. (II) Extensive Literature Support. Our research is grounded in an exhaustive review of over 800+ papers, ensuring comprehensive coverage and systematic organization of security issues within a more holistic understanding. (III) Unique Insights. Through systematic literature analysis, we have developed reliable roadmaps and perspectives for each chapter. Our work identifies promising research directions, including safety in data generation, alignment techniques, model editing, and LLM-based agent systems. These insights provide valuable guidance for researchers pursuing future work in this field.

PDF Abstract

Full-Stack Safety for LLMs: A Comprehensive Survey

The essay summarizes a comprehensive survey on LLM safety, which is crucial given the increasing integration of LLMs in various applications. The paper provides an exhaustive review, proposing a holistic approach to understanding and mitigating safety risks throughout LLM lifecycles, from data preparation to commercialization. Here's a structured summary emphasizing the core concepts and insights presented in the paper.

Introduction to LLM Full-Stack Safety

The authors propose "full-stack" safety as a concept encompassing security concerns across the entire lifecycle of LLMs. This approach addresses the gaps in existing literature that often focus on discrete stages of LLM lifecycles—such as the pre-training or deployment phase—and offers a more integrated perspective on the broader safety landscape critical to both academic research and practical implementations.

Comprehensive Perspective: Lifecycle Analysis

A structured framework is laid out to assess the complete lifecycle of LLMs, categorized into distinct phases: data preparation, model pre-training, post-training, deployment, and commercialization. Each phase is scrutinized for potential safety risks and mitigation strategies:

Data Preparation: A focus on data integrity and security, considering issues like poisoning and privacy breaches, establishes foundational safeguards against toxicity and privacy infringements.
Pre-training: It emphasizes the need for effective filtering and augmentation techniques to prevent the introduction of unsafe data during this foundational model training stage.
Post-training Safety: The paper explores recent advances in fine-tuning safety and alignment mechanisms, categorizing defense strategies and pinpointing promising directions.
Model Editing and Unlearning: The authors underscore the significance of dynamic model updates and unlearning processes to enhance operational safety after training.
Deployment: A detailed examination of deployment-specific safety challenges underscores robust defense mechanisms against attacks like jailbreak and prompt injection in pure LLMs and agent-enhanced LLM systems.

Unique Insights and Perspectives

Through synthesized literature analysis, the paper outlines reliable roadmaps for future research. Key insights and areas of focus include:

Safety in Data Generation: Recommendations are made for robust distillation methods to enhance data security, integrity, and privacy during generation processes.
Alignment Techniques: Research directions on advancing value-aligned optimization and scalable safety alignment frameworks are highlighted as promising.
Agent Systems: The integration of tools, memory, and environment constructs within LLM-based agents is reviewed, offering insights into security measures tailored for these complex systems.

Implications and Future Directions

The implications of the reviewed research are far-reaching, affecting theoretical and practical developments in AI safety. Key areas for future exploration include:

Advancements in reliable data distillation techniques and novel paradigms for secure data generation.
Strengthening post-training safety through optimized fine-tuning and model alignment workflows.
More stable and efficient methods of model editing and unlearning, ensuring ongoing updates without sacrificing model integrity.
Ensuring secure deployment and robust defense mechanisms for LLM-based agent systems, particularly in tool and memory management.

Conclusions and Broader Impact

Overall, the comprehensive full-stack approach presented in this paper contributes significantly to the ongoing conversation about LLM safety. It offers a structured taxonomy and an in-depth analysis extending beyond prior surveys, guiding future research and industry practices. The insights gained from this survey have the potential to enhance the safe deployment of LLMs and agent systems across various applications, impacting fields from content generation and programming to healthcare and finance. Addressing the detailed action points underscored in this comprehensive survey would transform LLM adaptability, integrity, and security, ensuring their responsible and effective use in modern AI ecosystems.

PDF Markdown Bookmark Chat (Pro)

Authors (103)

Kun Wang (355 papers)
Guibin Zhang (29 papers)
Zhenhong Zhou (15 papers)
Jiahao Wu (45 papers)
Miao Yu (75 papers)
Shiqian Zhao (8 papers)
Chenlong Yin (3 papers)
Jinhu Fu (2 papers)
Yibo Yan (39 papers)
Hanjun Luo (8 papers)
Liang Lin (318 papers)
Zhihao Xu (53 papers)
Haolang Lu (5 papers)
Xinye Cao (4 papers)
Xinyun Zhou (2 papers)
Weifei Jin (5 papers)
Fanci Meng (4 papers)
Junyuan Mao (7 papers)
Hao Wu (623 papers)
Minghe Wang (5 papers)

Related Papers

Find Related Papers

Tweets

https://twitter.com/zxlzr/status/1920886924912578830

https://twitter.com/fly51fly/status/1915165366822834482

https://twitter.com/AISecHub/status/1921174078687240297

https://twitter.com/TheTuringPost/status/1917727072623157675

https://twitter.com/rakeshgohel01/status/1922639779784495406

https://twitter.com/AISecHub/status/1921173517069893775

YouTube

Show All Videos