
Bridging AI and Software Security: A Comparative Vulnerability Assessment of LLM Agent Deployment Paradigms

Published 8 Jul 2025 in cs.CR and cs.AI | (2507.06323v1)

Abstract: LLM agents face security vulnerabilities spanning AI-specific and traditional software domains, yet current research addresses these separately. This study bridges this gap through comparative evaluation of Function Calling architecture and Model Context Protocol (MCP) deployment paradigms using a unified threat classification framework. We tested 3,250 attack scenarios across seven LLMs, evaluating simple, composed, and chained attacks targeting both AI-specific threats (prompt injection) and software vulnerabilities (JSON injection, denial-of-service). Function Calling showed higher overall attack success rates (73.5% vs 62.59% for MCP), with greater system-centric vulnerability while MCP exhibited increased LLM-centric exposure. Attack complexity dramatically amplified effectiveness, with chained attacks achieving 91-96% success rates. Counterintuitively, advanced reasoning models demonstrated higher exploitability despite better threat detection. Results demonstrate that architectural choices fundamentally reshape threat landscapes. This work establishes methodological foundations for cross-domain LLM agent security assessment and provides evidence-based guidance for secure deployment. Code and experimental materials are available at https://github.com/theconsciouslab-ai/LLM-agent-security.

Summary

  • The paper demonstrates that Function Calling architectures exhibit higher system-centric ASR (73.5%), while MCP shows increased LLM-centric vulnerabilities.
  • The paper employs a unified threat classification framework to evaluate 3,250 attack scenarios using metrics like Attack Success Rate and Refusal Rate.
  • The paper recommends enhancing isolation and context validation to mitigate complex chained attacks and improve overall LLM agent security.


Introduction

The paper "Bridging AI and Software Security: A Comparative Vulnerability Assessment of LLM Agent Deployment Paradigms" presents a critical examination of security vulnerabilities in LLM agents. Focusing on two contrasting deployment architectures, Function Calling and the Model Context Protocol (MCP), the study evaluates security risks spanning AI-specific threats and traditional software vulnerabilities. The research highlights how architectural design choices not only influence but can also exacerbate exposure to diverse attack vectors.

Methodology

The study employs a unified threat classification framework that integrates both AI-specific and traditional software security taxonomies. This involves testing 3,250 attack scenarios across seven LLMs, examining simple, composed, and chained attacks. The evaluation metrics, Attack Success Rate (ASR) and Refusal Rate (RR), serve as the primary indicators of vulnerability exposure and defense effectiveness.
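Both metrics are simple proportions over scenario outcomes; a minimal sketch of how they might be computed (field and function names are illustrative, not taken from the paper's released code):

```python
from dataclasses import dataclass

@dataclass
class ScenarioResult:
    """Outcome of a single attack scenario against one model (illustrative)."""
    attack_succeeded: bool   # the attack achieved its objective
    model_refused: bool      # the model explicitly refused the request

def attack_success_rate(results: list[ScenarioResult]) -> float:
    """ASR: fraction of scenarios in which the attack succeeded."""
    return sum(r.attack_succeeded for r in results) / len(results)

def refusal_rate(results: list[ScenarioResult]) -> float:
    """RR: fraction of scenarios the model explicitly refused."""
    return sum(r.model_refused for r in results) / len(results)

# Toy example: 4 scenarios, 3 successful attacks, 1 refusal.
results = [
    ScenarioResult(True, False),
    ScenarioResult(True, False),
    ScenarioResult(True, False),
    ScenarioResult(False, True),
]
print(attack_success_rate(results))  # 0.75
print(refusal_rate(results))         # 0.25
```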

Function Calling Architecture:

  • Implements centralized orchestration, utilizing unified API endpoints.
  • Exhibits concentrated attack surfaces due to the tight coupling of tool definitions within API calls.
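The tight coupling shows up in the shape of a function-calling request: tool schemas travel inline with every API call, so user content, tool definitions, and execution-relevant structure share one channel. A hypothetical OpenAI-style payload (model name and tool schema are illustrative, not from the paper):

```python
import json

# Illustrative OpenAI-style function-calling request. The orchestrator
# trusts this single JSON structure for execution decisions, so an
# injected message or manipulated tool result lands in the same channel.
request = {
    "model": "gpt-4o",
    "messages": [
        {"role": "user", "content": "Delete my temp files"},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "delete_files",  # hypothetical tool
                "description": "Delete files matching a glob pattern",
                "parameters": {
                    "type": "object",
                    "properties": {"pattern": {"type": "string"}},
                    "required": ["pattern"],
                },
            },
        }
    ],
}
print(json.dumps(request, indent=2))
```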

Model Context Protocol:

  • Follows a distributed client-server model, establishing explicit boundaries between agent and tool execution.
  • Provides enhanced attack containment but reveals increased susceptibility to LLM-centric exploits due to complex contextual interactions.
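The explicit boundary can be illustrated with a stdlib-only sketch (a simplification, not the actual MCP protocol, which uses JSON-RPC over stdio or HTTP): every tool invocation is serialized and crosses a single validation point instead of being executed in-process by the agent.

```python
import json

class ToolServer:
    """Runs tools behind an explicit message boundary (illustrative)."""
    def __init__(self):
        self._tools = {"add": lambda a, b: a + b}

    def handle(self, raw_request: str) -> str:
        # Everything crossing the boundary is serialized and can be
        # validated here, which aids containment.
        req = json.loads(raw_request)
        if req["tool"] not in self._tools:
            return json.dumps({"error": "unknown tool"})
        result = self._tools[req["tool"]](*req["args"])
        return json.dumps({"result": result})

class AgentClient:
    """The agent side never touches tool code directly, only messages."""
    def __init__(self, server: ToolServer):
        self._server = server

    def call(self, tool: str, args: list) -> dict:
        payload = json.dumps({"tool": tool, "args": args})
        return json.loads(self._server.handle(payload))

client = AgentClient(ToolServer())
print(client.call("add", [2, 3]))  # {'result': 5}
```

The containment benefit comes at a cost: the richer the context exchanged across this boundary, the more surface there is for LLM-centric manipulation of that context.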

Experimental Findings

The comparative assessment revealed distinct vulnerability profiles across deployment paradigms:

Function Calling

  • Higher system-centric ASR (73.5%) due to centralized tool orchestration.
  • Increased vulnerability to tool manipulation and API parameter interception.

Model Context Protocol

  • Lower system-centric but higher LLM-centric ASR (62.59% overall; 68.28% LLM-centric).
  • Enhanced containment properties but challenged by cross-boundary attacks.

Chained attacks achieved success rates of 91-96%, underscoring the inadequacy of current defenses against multi-stage exploitation strategies. Even the most advanced reasoning models exhibited significant vulnerability once breached, illustrating the paradox of high exploitability despite superior threat detection capabilities.
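One way to see why chaining amplifies effectiveness: if a chain succeeds when any of its stages lands, moderate per-stage success rates compound quickly. The per-stage figures and the independence assumption below are an illustrative simplification, not the paper's model:

```python
from math import prod

def chain_asr(stage_asrs: list[float]) -> float:
    """Chain success probability, assuming the chain succeeds when ANY
    stage lands and stages are independent (illustrative model only)."""
    return 1 - prod(1 - p for p in stage_asrs)

# Three moderately effective vectors (e.g. prompt injection, JSON
# injection, DoS) with hypothetical per-stage ASRs of 60% each:
print(round(chain_asr([0.6, 0.6, 0.6]), 3))  # 0.936
```

Under this toy model, three 60%-effective vectors already push combined success into the same 90%+ range the paper reports, which is why the authors argue defenses should target the chain rather than individual stages.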

Architectural Implications

The analysis reveals that security is deeply embedded in architectural choices, so it must be addressed deliberately at design time. Function Calling's architecture may expedite integration and reduce complexity, but it expands a centralized attack surface. In contrast, MCP's distributed approach provides better isolation but requires sophisticated validation mechanisms to handle its more complex interaction protocols.

Recommendations

Security Strategies:

  • Employ defense-in-depth mechanisms, focusing on disrupting attack chains rather than independent components.
  • Implement cross-domain validation techniques to address both AI-specific and software-centric vulnerabilities.
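Cross-domain validation means screening the same input for both attack families at once; a minimal sketch of such a check on tool arguments (the patterns and size limit are illustrative placeholders, not from the paper):

```python
import json
import re

# One pass for AI-specific markers (prompt-injection phrases) and one for
# classic software payloads (malformed or oversized JSON). Example rules only.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all|previous) instructions", re.I),
    re.compile(r"system prompt", re.I),
]
MAX_ARG_BYTES = 4096  # crude denial-of-service guard; tune per deployment

def validate_tool_args(raw_args: str) -> list[str]:
    """Return a list of findings; an empty list means the arguments pass."""
    findings = []
    if len(raw_args.encode()) > MAX_ARG_BYTES:
        findings.append("oversized arguments (possible DoS)")
    try:
        json.loads(raw_args)
    except json.JSONDecodeError:
        findings.append("malformed JSON (possible JSON injection)")
    for pat in INJECTION_PATTERNS:
        if pat.search(raw_args):
            findings.append(f"prompt-injection marker: {pat.pattern}")
    return findings

print(validate_tool_args('{"path": "/tmp/x"}'))                        # []
print(validate_tool_args('{"note": "Ignore previous instructions"}'))  # flags the marker
```

Pattern lists like this are easily bypassed on their own; the point is placement: one validation layer that sees both threat families, positioned where attack chains must pass.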

Architectural Enhancements:

  • Function Calling architectures should enhance isolation mechanisms and incorporate independent validation layers to prevent compound vulnerabilities.
  • MCP deployments must refine context handling and integrate multiple defense layers across client-server interactions to mitigate cross-boundary exploitation.

Conclusion

The research demonstrates the integral role of architecture in shaping vulnerability landscapes for LLM-based agents. By underscoring the high success rates of complex attack patterns, it calls for a shift toward architecture-aware security frameworks. Future developments should consider hybrid models that combine the strengths of both deployment paradigms while maintaining robust, layered defenses. As LLM applications permeate critical domains, these insights will be foundational in developing resilient AI systems.
