Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools (2508.02110v1)

Published 4 Aug 2025 in cs.AI

Abstract: LLM agents have demonstrated remarkable capabilities in complex reasoning and decision-making by leveraging external tools. However, this tool-centric paradigm introduces a previously underexplored attack surface: adversaries can manipulate tool metadata -- such as names, descriptions, and parameter schemas -- to influence agent behavior. We identify this as a new and stealthy threat surface that allows malicious tools to be preferentially selected by LLM agents, without requiring prompt injection or access to model internals. To demonstrate and exploit this vulnerability, we propose the Attractive Metadata Attack (AMA), a black-box in-context learning framework that generates highly attractive but syntactically and semantically valid tool metadata through iterative optimization. Our attack integrates seamlessly into standard tool ecosystems and requires no modification to the agent's execution framework. Extensive experiments across ten realistic, simulated tool-use scenarios and a range of popular LLM agents demonstrate consistently high attack success rates (81\%-95\%) and significant privacy leakage, with negligible impact on primary task execution. Moreover, the attack remains effective even under prompt-level defenses and structured tool-selection protocols such as the Model Context Protocol, revealing systemic vulnerabilities in current agent architectures. These findings reveal that metadata manipulation constitutes a potent and stealthy attack surface, highlighting the need for execution-level security mechanisms that go beyond prompt-level defenses.

Summary

The paper introduces the Attractive Metadata Attack (AMA) that manipulates LLM tool selection through deceptive metadata.
It employs a simulation-guided iterative optimization process to craft metadata that misleads LLM agents into invoking malicious tools.
Experimental results show an 81% to 95% attack success rate across various LLM platforms, highlighting serious privacy risks.

Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools

Introduction

The paper "Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools" (2508.02110) presents a novel attack vector exploiting the metadata of tools used by LLM agents. This attack paradigm leverages the tool-centric operational framework of LLMs to manipulate their behavior without traditional model tampering. This essay outlines the technical mechanisms of the Attractive Metadata Attack (AMA) and evaluates the implications of this vulnerability.

Attack Mechanism

The Attractive Metadata Attack (AMA) represents a sophisticated method of persuasion over coercion. By crafting alluring metadata that embellishes malicious tools, adversaries can mislead LLM agents into prioritizing and invoking such tools over benign ones. This approach effectively subverts tool selection processes primarily influenced by superficially benign attributes rather than internal tool functionality or intent.

Figure 1: A motivating example of the Attractive Metadata Attack (AMA). Left: standard tool invocation, where the ``unknown'' (UK) tool is typically ignored. Right: under AMA, the UK tool is wrapped with attractive metadata (as UK tool

), inducing the agent to prioritize it and enabling covert malicious actions such as privacy theft.*

AMA is executed through a simulation-guided iterative optimization process. This process generates metadata designed to increase the attractiveness of a given tool from both breadth and depth perspectives, promoting convergence towards a configuration likely to mislead tool selection.

Figure 2: Optimization Pipeline for AMA. The attacker constructs malicious tools with increasingly attractive metadata through a simulation-guided iterative optimization process.

Security Implications

Experimentation across multiple LLM platforms, such as Gemma3-27B, LLaMA3.3-70B, and GPT-4o-mini, demonstrates attack success rates of 81% to 95% under AMA. Furthermore, the attack remains effective under model context protocols and prompt-level defenses, indicating systemic vulnerabilities within existing LLM architectures.

Figure 3: ASR across task scenario. Solid bars: targeted attacks; hatched bars: untargeted attacks.

Privacy Risks

The paper also underscores the privacy implications of AMA, with significant personal identifiable information (PII) leakage observed across diverse scenarios.

Figure 4: Field-level PII leakage under targeted and untargeted AMA attacks.

Future Directions

The findings propel urgent discussions concerning the development of execution-level security mechanisms that extend beyond conventional prompt-level defenses. Robust verification methodologies and enhanced tool selection frameworks are necessary to mitigate the effects of metadata-driven manipulation.

Conclusion

The Attractive Metadata Attack (AMA) illuminates an unexamined yet formidable attack surface within LLM ecosystems. By manipulating benign-looking metadata, adversaries can orchestrate covert operations that severely compromise agent integrity, highlighting the critical need for strengthened security measures in LLM tool ecosystems. Future work should focus on advancing detection systems capable of discerning genuine intent from obfuscated tool metadata.