Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash 91 tok/s
Gemini 2.5 Pro 58 tok/s Pro
GPT-5 Medium 29 tok/s
GPT-5 High 29 tok/s Pro
GPT-4o 102 tok/s
GPT OSS 120B 462 tok/s Pro
Kimi K2 181 tok/s Pro
2000 character limit reached

Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models (2508.17674v1)

Published 25 Aug 2025 in cs.CR, cs.AI, and cs.LG

Abstract: We introduce Advertisement Embedding Attacks (AEA), a new class of LLM security threats that stealthily inject promotional or malicious content into model outputs and AI agents. AEA operate through two low-cost vectors: (1) hijacking third-party service-distribution platforms to prepend adversarial prompts, and (2) publishing back-doored open-source checkpoints fine-tuned with attacker data. Unlike conventional attacks that degrade accuracy, AEA subvert information integrity, causing models to return covert ads, propaganda, or hate speech while appearing normal. We detail the attack pipeline, map five stakeholder victim groups, and present an initial prompt-based self-inspection defense that mitigates these injections without additional model retraining. Our findings reveal an urgent, under-addressed gap in LLM security and call for coordinated detection, auditing, and policy responses from the AI-safety community.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

  • The paper demonstrates that advertisement embedding attacks significantly compromise LLM outputs by injecting malicious content via both service and model distribution platforms.
  • It employs a low-cost, prompt-based attack method using rapid fine-tuning techniques, exemplified by achieving results with an RTX 4070 in under an hour.
  • The study proposes a prompt-based self-inspection defense while advocating for more robust auditing frameworks to ensure model integrity.

Introduction

The paper introduces Advertisement Embedding Attacks (AEA), a novel class of security threats targeting LLMs and AI agents. Unlike conventional adversarial or backdoor attacks that primarily degrade model accuracy or functionality, AEA subvert the information integrity of model outputs by stealthily injecting promotional, propagandistic, or malicious content. The attack vectors leverage both service distribution platforms (SDP) and model distribution platforms (MDP), exploiting the flexibility of prompt engineering and the openness of model sharing ecosystems. The research delineates the technical pathways, victim categories, and proposes an initial prompt-based self-inspection defense, highlighting an urgent gap in current LLM security paradigms.

Threat Model and Attack Vectors

AEA are characterized by two principal attack paths: (1) hijacking or masquerading as service distribution platforms to inject adversarial prompts, and (2) publishing backdoored open-source checkpoints fine-tuned with attacker data on model distribution platforms. The attack flow is illustrated in Figure 1, which clarifies the operational differences and the points of compromise for each vector. Figure 1

Figure 1: Attack flow diagram showing two different attack paths: attacks via Service Distribution Platforms, attacks via Model Distribution Platforms.

Service Distribution Platform Attacks

Attackers gain access to SDP infrastructure, either by operating proxy services or compromising existing platforms. They intercept user queries, prepend or modify prompts with attacker-controlled content, and forward these to the underlying LLM API providers. The returned outputs can be further manipulated before delivery to the user. This vector is particularly effective due to the lack of direct user authentication with the LLM provider and the ease of prompt concatenation.

Model Distribution Platform Attacks

Attackers download popular open-source models, fine-tune them locally with malicious data (e.g., advertisements, hate speech, propaganda), and redistribute the compromised models via MDPs such as Hugging Face. Techniques like LoRA enable rapid and targeted parameter updates with minimal computational resources (e.g., RTX 4070, 1 hour for attack fine-tuning). The openness of model sharing platforms and the absence of rigorous auditing facilitate large-scale dissemination of backdoored models.

Victim Taxonomy and Impact

The research identifies five primary victim categories:

  1. End Users: Individuals, organizations, and institutions relying on LLM services or open-source models are exposed to manipulated outputs, leading to erroneous decisions, biased cognition, and financial or reputational losses.
  2. LLM Inference Service Providers: API providers suffer reputational damage and litigation risks when their services are implicated in the dissemination of harmful or biased content, even if the compromise occurs upstream.
  3. Open-Source Model Owners: Original model creators face negative evaluations and loss of trust when their models are repurposed for attacks.
  4. Model Distribution Platforms: MDPs risk legal liability and reputational harm as vehicles for distributing compromised models.
  5. Service Distribution Platforms: SDP operators experience user attrition and revenue loss due to deteriorating service quality and trust.

Implementation Details and Empirical Results

The paper demonstrates the low-cost, high-impact nature of AEA through concrete attack scenarios. In the SDP attack, a simple attacker prompt concatenated with RAG data is injected into the user query, resulting in model outputs that prioritize attacker-specified content. Figure 2 presents a comparative analysis of normal versus attacked responses on the Gemini 2.5 model, evidencing the model's susceptibility to prompt-based manipulation. Figure 2

Figure 2: Attack Results via Service Distribution Platforms. Left column (1) shows the malicious attack data we used; middle column (2) shows normal responses without attack; right column (3) shows responses after using attack prompts on the state-of-the-art Google Gemini 2.5 model.

For MDP attacks, the authors fine-tuned LLaMA-3.1 with attacker data, achieving near-perfect reproduction of malicious responses. The attack required minimal hardware and time, underscoring the feasibility of widespread exploitation.

Defense Strategies

The paper proposes a prompt-based self-inspection defense as an initial mitigation strategy. By prepending a high-priority defensive prompt that instructs the model to reject or ignore biased, promotional, or knowledge-distorting content, the system can detect and suppress prompt-level attacks. However, this method is ineffective against parameter-level attacks (i.e., compromised model weights), indicating the need for more robust detection and auditing mechanisms. The authors advocate for coordinated efforts in detection, auditing, and policy development to address the systemic vulnerabilities exposed by AEA.

Implications and Future Directions

The emergence of AEA highlights a critical and under-addressed dimension of LLM security: the integrity of information in model outputs. The attacks exploit both infrastructural and algorithmic weaknesses, leveraging the compositionality of prompts and the openness of model sharing. The practical implications are severe, affecting end users, service providers, and the broader AI ecosystem. Theoretical implications include the need for new frameworks in model provenance, integrity verification, and adversarial robustness beyond traditional accuracy metrics.

Future research should focus on:

  • Automated detection of advertisement and propaganda embedding in model outputs.
  • Auditing and provenance tracking for open-source model distribution.
  • Regulatory and policy interventions to enforce accountability in model sharing and service provision.
  • Development of robust defense mechanisms that operate at both prompt and parameter levels.

Conclusion

This work systematically defines and demonstrates Advertisement Embedding Attacks against LLMs and AI agents, revealing a significant vulnerability in current AI deployment practices. The attacks are low-cost, highly effective, and broadly impactful, necessitating urgent attention from researchers, service providers, and policymakers. The initial defense strategies outlined provide a foundation for further research, but comprehensive solutions will require advances in detection, auditing, and regulatory oversight.

Ai Generate Text Spark Streamline Icon: https://streamlinehq.com

Paper Prompts

Sign up for free to create and run prompts on this paper using GPT-5.

Youtube Logo Streamline Icon: https://streamlinehq.com

Don't miss out on important new AI/ML research

See which papers are being discussed right now on X, Reddit, and more:

“Emergent Mind helps me see which AI papers have caught fire online.”

Philip

Philip

Creator, AI Explained on YouTube