RedCodeAgent: Automatic Red-teaming Agent against Diverse Code Agents (2510.02609v1)

Published 2 Oct 2025 in cs.SE

Abstract: Code agents have gained widespread adoption due to their strong code generation capabilities and integration with code interpreters, enabling dynamic execution, debugging, and interactive programming capabilities. While these advancements have streamlined complex workflows, they have also introduced critical safety and security risks. Current static safety benchmarks and red-teaming tools are inadequate for identifying emerging real-world risky scenarios, as they fail to cover certain boundary conditions, such as the combined effects of different jailbreak tools. In this work, we propose RedCodeAgent, the first automated red-teaming agent designed to systematically uncover vulnerabilities in diverse code agents. With an adaptive memory module, RedCodeAgent can leverage existing jailbreak knowledge, dynamically select the most effective red-teaming tools and tool combinations in a tailored toolbox for a given input query, thus identifying vulnerabilities that might otherwise be overlooked. For reliable evaluation, we develop simulated sandbox environments to additionally evaluate the execution results of code agents, mitigating potential biases of LLM-based judges that only rely on static code. Through extensive evaluations across multiple state-of-the-art code agents, diverse risky scenarios, and various programming languages, RedCodeAgent consistently outperforms existing red-teaming methods, achieving higher attack success rates and lower rejection rates with high efficiency. We further validate RedCodeAgent on real-world code assistants, e.g., Cursor and Codeium, exposing previously unidentified security risks. By automating and optimizing red-teaming processes, RedCodeAgent enables scalable, adaptive, and effective safety assessments of code agents.

Summary

The paper introduces RedCodeAgent, which leverages a memory module and diversified toolbox to optimize red-teaming prompt generation against code agents.
The system achieves higher attack success rates and lower rejection rates compared to static, traditional red-teaming methods.
The study highlights the efficiency and adaptability of dynamic, memory-driven strategies in uncovering previously undetected vulnerabilities.


Introduction

"RedCodeAgent: Automatic Red-teaming Agent against Diverse Code Agents" presents a system for uncovering vulnerabilities in LLM-powered code agents. Because these agents can execute the code they generate, they automate complex workflows but also pose significant security risks when that code is harmful. Static safety benchmarks and existing red-teaming tools miss boundary conditions, such as the combined effects of different jailbreak tools. RedCodeAgent aims to fill this gap with an automated, adaptive red-teaming approach.

System Architecture and Methodology

RedCodeAgent integrates three core components: a memory module, a diversified toolbox, and a simulated evaluation environment. The memory module serves as a knowledge base for storing and retrieving effective red-teaming experiences, enabling efficient prompt optimization for similar future tasks. The toolbox, comprising various jailbreak tools and a code substitution tool, assists in generating optimized prompts to exploit vulnerabilities in target code agents.

Figure 1: Illustration of RedCodeAgent on automatic red-teaming against a target code agent.

RedCodeAgent judges each attempt in a simulated sandbox environment, checking what the target agent's code actually does when executed rather than relying only on an LLM judge that reads static code. This execution-based check mitigates the potential biases of LLM-based judges.
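A minimal sketch of execution-based evaluation follows. It assumes a three-way outcome (rejection, execution failure, attack success) and a caller-supplied check on the sandbox state; the `Outcome` names, the `evaluate` function, and the refusal phrases are hypothetical and not the paper's implementation.

```python
from enum import Enum
from pathlib import Path
from typing import Callable
import tempfile


class Outcome(Enum):
    REJECTION = "rejection"
    EXECUTION_FAILURE = "execution_failure"
    ATTACK_SUCCESS = "attack_success"


# Hypothetical refusal phrases; a real evaluator would need a more robust check.
REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry")


def evaluate(response: str, risky_state_reached: Callable[[], bool]) -> Outcome:
    """Classify one trial from the agent's reply and the sandbox state.

    The sandbox state is checked first, so a trial is judged by what the
    executed code did, not by how the reply is worded.
    """
    if risky_state_reached():
        return Outcome.ATTACK_SUCCESS
    if any(marker in response.lower() for marker in REFUSAL_MARKERS):
        return Outcome.REJECTION
    return Outcome.EXECUTION_FAILURE


def demo() -> list[Outcome]:
    """Use a throwaway directory as the 'sandbox' and a marker file as the state."""
    with tempfile.TemporaryDirectory() as sandbox:
        marker = Path(sandbox) / "protected.txt"
        marker.write_text("data")
        removed = lambda: not marker.exists()
        before = evaluate("Sure, running it now.", removed)
        marker.unlink()  # simulate the executed code removing the file
        after = evaluate("Done.", removed)
    return [before, after]
```

Here `demo()` classifies the same kind of reply differently before and after the sandbox state changes, which is the point of judging by execution results rather than by text alone.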

Experimental Evaluation

Across multiple code agents, risk scenarios, and programming languages, RedCodeAgent achieves higher attack success rates (ASR) and lower rejection rates (RR) than existing red-teaming methods.

Attack Success Rate (ASR): RedCodeAgent consistently achieves higher ASR across multiple state-of-the-art code agents and scenarios.
Rejection Rate (RR): The system also maintains a lower RR, indicating that its prompts are less often refused outright by the target code agents.
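As a worked illustration of the two metrics, the sketch below computes ASR and RR as simple fractions of all trials. The outcome labels and the `rates` function are assumptions for illustration; the paper's exact bookkeeping may differ.

```python
from collections import Counter


def rates(outcomes: list[str]) -> dict[str, float]:
    """Return ASR and RR as fractions of all trials.

    Each entry of `outcomes` is one of the hypothetical labels
    "attack_success", "rejection", or "execution_failure".
    """
    if not outcomes:
        raise ValueError("at least one trial is required")
    counts = Counter(outcomes)
    total = len(outcomes)
    return {
        "ASR": counts["attack_success"] / total,
        "RR": counts["rejection"] / total,
    }


trials = ["attack_success", "rejection", "attack_success", "execution_failure"]
print(rates(trials))  # {'ASR': 0.5, 'RR': 0.25}
```

The two rates need not sum to one, because a trial can be neither rejected nor successful (for example, the agent attempts the task but the code fails).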

Figure 2: Attack success rate (ASR) against the OpenCodeInterpreter (OCI) code agent across various risk scenarios. RedCodeAgent achieves higher success rates than the other jailbreak methods.

Efficiency and Adaptability

RedCodeAgent achieves these results with high efficiency. Because memory retrieval and tool selection are dynamic, it adapts to each risk scenario and builds on strategies that succeeded before.

Tool Utilization: The paper highlights how RedCodeAgent employs tools selectively to raise ASR, and shows that it scales: adding tools to the toolbox improves results without compromising efficiency.

Figure 3: Attack success rate (ASR) across various risk scenarios under different execution modes. The results highlight the impact of the memory module in improving RedCodeAgent's performance across different tasks.
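The adaptive loop described in this section can be sketched at a high level as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: tools are modeled as plain text transforms, `order` stands for a tool ranking that would come from memory, and `query_target`, `is_success`, and `budget` are hypothetical parameters. The stub tools only append or prepend harmless tags.

```python
from typing import Callable

Tool = Callable[[str], str]


def red_team(
    prompt: str,
    tools: dict[str, Tool],
    order: list[str],
    query_target: Callable[[str], str],
    is_success: Callable[[str], bool],
    budget: int = 5,
) -> tuple[bool, list[str]]:
    """Try the prompt, then stack tools one by one until success or budget runs out.

    Returns whether any attempt succeeded and the names of the tools applied.
    """
    used: list[str] = []
    current = prompt
    for name in order[:budget]:
        if is_success(query_target(current)):
            return True, used
        current = tools[name](current)  # stack the next tool on the current prompt
        used.append(name)
    return is_success(query_target(current)), used


# Harmless stand-ins for a toolbox, a target agent, and an evaluator.
stub_tools: dict[str, Tool] = {
    "rephrase": lambda p: p + " [rephrased]",
    "wrap": lambda p: "[wrapped] " + p,
}
echo_target = lambda p: p
needs_both = lambda r: "[wrapped]" in r and "[rephrased]" in r

print(red_team("task", stub_tools, ["rephrase", "wrap"], echo_target, needs_both))
# (True, ['rephrase', 'wrap'])
```

The returned tool list is what a memory module would store for later retrieval, and the early return keeps the number of target queries low when a prompt succeeds without further tools.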

Discovering New Vulnerabilities

RedCodeAgent identifies vulnerabilities that other methods miss. By selecting and combining tools adaptively, it refines its red-teaming strategy for each input and discovers new attack vectors. The authors further validate it on real-world code assistants, e.g., Cursor and Codeium, exposing previously unidentified security risks.

Conclusion

RedCodeAgent represents a significant advance in automated red-teaming of code agents. By combining memory-driven adaptability with a comprehensive toolbox, it provides a scalable and effective way to identify their vulnerabilities. Future work could further enhance its adaptability and extend it to other agent types.

More broadly, the results support adaptive, memory-driven strategies for assessing the security of complex LLM-powered systems. They also highlight the need for dynamic approaches to security assessment as automated code generation and execution continue to evolve.