Papers
Topics
Authors
Recent
Search
2000 character limit reached

FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments

Published 9 Jan 2026 in cs.CR and cs.AI | (2601.07853v1)

Abstract: Financial agents powered by LLMs are increasingly deployed for investment analysis, risk assessment, and automated decision-making, where their abilities to plan, invoke tools, and manipulate mutable state introduce new security risks in high-stakes and highly regulated financial environments. However, existing safety evaluations largely focus on language-model-level content compliance or abstract agent settings, failing to capture execution-grounded risks arising from real operational workflows and state-changing actions. To bridge this gap, we propose FinVault, the first execution-grounded security benchmark for financial agents, comprising 31 regulatory case-driven sandbox scenarios with state-writable databases and explicit compliance constraints, together with 107 real-world vulnerabilities and 963 test cases that systematically cover prompt injection, jailbreaking, financially adapted attacks, as well as benign inputs for false-positive evaluation. Experimental results reveal that existing defense mechanisms remain ineffective in realistic financial agent settings, with average attack success rates (ASR) still reaching up to 50.0\% on state-of-the-art models and remaining non-negligible even for the most robust systems (ASR 6.7\%), highlighting the limited transferability of current safety designs and the need for stronger financial-specific defenses. Our code can be found at https://github.com/aifinlab/FinVault.

Summary

  • The paper presents the first execution-grounded benchmark for financial agent safety by integrating state-writable environments and regulatory case scenarios.
  • It evaluates 10 LLM-based financial agents against 107 vulnerabilities using metrics like attack success rate and vulnerability compromise rate in multi-step workflows.
  • Results reveal critical security gaps in current defenses, highlighting the need for domain-adaptive, context-sensitive safeguards to ensure compliance.

FinVault: Execution-Grounded Benchmarking for Financial Agent Safety

Motivation and Problem Setting

The evolution of financial agents powered by LLMs has introduced unprecedented risk profiles due to their abilities in planning, tool invocation, and persistent state manipulation within complex, regulated environments. Existing security evaluation approaches predominantly target static, language-level compliance or operate in abstracted simulation interfaces devoid of real consequence validation. These prior paradigms neglect systemic execution risks introduced by agentic behaviors that can actively modify financial workflows and database states, resulting in unverified downstream compliance failures. Figure 1

Figure 1: Comparison between FinVault and existing paradigms, highlighting FinVault’s executable, state-writable testbed and focus on verifiable operational consequences.

FinVault is formulated to bridge this gap by delivering the first execution-grounded security benchmark tailored for financial agents. In contrast to previous model and agent benchmarks, FinVault integrates physically executable environments, state-writable databases, and formal compliance boundaries, enabling precise measurement of agent-induced real-world vulnerabilities and enforcement failures.

Benchmark Construction and Scenario Design

FinVault comprises 31 regulatory case-driven sandbox scenarios covering core domains: credit, insurance, securities, payments, compliance/AML, and risk management. Each scenario is instantiated with multi-step workflows, tool invocation capabilities, and permission/audit mechanisms tightly mapped to genuine financial processes. Vulnerabilities are derived from documented regulatory violation patterns and categorized into privilege bypass, compliance violation, information leakage, fraudulent approval, and audit evasion.

Attack coverage spans 107 vulnerabilities and 963 test cases, incorporating eight attack techniques: direct JSON injection, instruction overriding, role playing, progressive prompting, encoding obfuscation, hypothetical scenarios, authority impersonation, and emotional manipulation. Adversarial samples are augmented via LLM-in-the-loop paraphrasing and validated by financial compliance experts, resulting in a high-fidelity adversarial set. Figure 2

Figure 2: Overview of FinVault, illustrating benchmark data construction and agent attack/defense interactions within sandbox environments.

Experimental Evaluation

Ten leading LLMs—including Qwen3-Max, GPT-4o, Claude-Sonnet/Haiku, Gemini-Flash, DeepSeek—were instantiated as financial agents in FinVault's testbed. Representative alignment-based defense models (GPT-OSS-Safeguard, LLaMA Guard v3/v4) were evaluated for responsiveness to both attack and benign traffic.

Robust, quantitative metrics were employed:

  • Attack Success Rate (ASR): Proportion of attacks effectuating business-level breaches.
  • Vulnerability Compromise Rate: Fraction of vulnerabilities exploitable via any attack technique.
  • Defense TPR/FPR: True/false positive detection rates on adversarial and benign queries, respectively.

Notable findings:

  • Qwen3-Max exhibited the highest ASR at 50.00% and vulnerability compromise rate of 85.98%, indicating critical exposure.
  • Claude-Haiku-4.5 demonstrated resilience, yet still allowed 6.70% ASR and 26.17% vulnerability exploitation.
  • Role-playing and hypothetical scenario attacks consistently breached semantic boundaries across models, outperforming technical attacks.
  • Insurance workflows were exceptionally vulnerable (ASR 65.20% on Qwen3-Max) due to high discretion and complex policy logic.

Defense assessment revealed LLaMA Guard 4 achieved the highest detection rate (TPR 61.10%) but induced a substantial FPR (29.91%), causing spurious disruption of legitimate workflows. GPT-OSS-Safeguard excelled in minimizing false alarms yet lacked requisite detection coverage, limiting utility for operational deployment.

Threat Model and Failure Analysis

FinVault's adversarial taxonomy and empirical evaluation reveal three critical security limitations in agentic financial systems:

  • Semantic vulnerability dominance: Financially adapted, context-manipulating attacks (e.g., role impersonation, academic framing) exploit agent reasoning and persistent context, bypassing pattern-based guardrails.
  • Instruction boundary ambiguity: In models lacking rigid system/user prompt separation (Qwen3), instruction-override attacks induce high incidence of privilege/control escalation.
  • Transfer limitations in safety alignment: General LLM alignment methods do not successfully transfer to nuanced financial workflows; semantic complexity in compliance logic undermines static guardrails, necessitating domain-adaptive and context-sensitive defense designs.

Case analyses further highlight failure modes including context trust accumulation in multi-turn interactions, implicit privilege escalation, and softening of compliance boundaries under emotional or urgent framing.

Practical and Theoretical Implications

Practically, FinVault exposes the unsuitability of generic alignment and defense frameworks for financial agents, reinforcing the necessity for scenario-specific, execution-aware approaches. High ASR rates and persistent vulnerability compromise demonstrate the infeasibility of current agent deployment in regulated environments absent significant advancements.

Theoretically, FinVault’s construction establishes evaluation principles for any domain where agentic LLMs interact with mutable state and compliance logic. It motivates research into semantic reasoning defense, multi-turn adaptive safeguarding, and contextual privilege isolation. Future work may focus on developing lifelong agentic guardrails, adversarially trained risk detectors, and hierarchical reasoning defense architectures for high-risk domains.

Conclusion

FinVault represents a rigorous, execution-grounded benchmark for evaluating financial agent security. Empirical evidence reveals that contemporary agents and defenses exhibit severe limitations in resisting adaptive and financial-specific attacks, with ASRs frequently exceeding operationally acceptable thresholds. The benchmark’s dataset and framework provide a foundation for advancing secure, compliant AI agent deployment in finance and other mission-critical domains (2601.07853).

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We found no open problems mentioned in this paper.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 2 tweets with 17 likes about this paper.