Papers

Topics

Authors

Recent

View all

Gemini 2.5 Flash

Gemini 2.5 Flash 91 tok/s

Gemini 2.5 Pro 58 tok/s Pro

GPT-5 Medium 29 tok/s

GPT-5 High 29 tok/s Pro

GPT-4o 102 tok/s

GPT OSS 120B 462 tok/s Pro

Kimi K2 181 tok/s Pro

2000 character limit reached

Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models (2508.17674v1)

Published 25 Aug 2025 in cs.CR, cs.AI, and cs.LG

Abstract: We introduce Advertisement Embedding Attacks (AEA), a new class of LLM security threats that stealthily inject promotional or malicious content into model outputs and AI agents. AEA operate through two low-cost vectors: (1) hijacking third-party service-distribution platforms to prepend adversarial prompts, and (2) publishing back-doored open-source checkpoints fine-tuned with attacker data. Unlike conventional attacks that degrade accuracy, AEA subvert information integrity, causing models to return covert ads, propaganda, or hate speech while appearing normal. We detail the attack pipeline, map five stakeholder victim groups, and present an initial prompt-based self-inspection defense that mitigates these injections without additional model retraining. Our findings reveal an urgent, under-addressed gap in LLM security and call for coordinated detection, auditing, and policy responses from the AI-safety community.

Collections

Summary

The paper demonstrates that advertisement embedding attacks significantly compromise LLM outputs by injecting malicious content via both service and model distribution platforms.
It employs a low-cost, prompt-based attack method using rapid fine-tuning techniques, exemplified by achieving results with an RTX 4070 in under an hour.
The study proposes a prompt-based self-inspection defense while advocating for more robust auditing frameworks to ensure model integrity.

Advertisement Embedding Attacks Against LLMs and AI Agents: Security, Implementation, and Defense

Introduction

The paper introduces Advertisement Embedding Attacks (AEA), a novel class of security threats targeting LLMs and AI agents. Unlike conventional adversarial or backdoor attacks that primarily degrade model accuracy or functionality, AEA subvert the information integrity of model outputs by stealthily injecting promotional, propagandistic, or malicious content. The attack vectors leverage both service distribution platforms (SDP) and model distribution platforms (MDP), exploiting the flexibility of prompt engineering and the openness of model sharing ecosystems. The research delineates the technical pathways, victim categories, and proposes an initial prompt-based self-inspection defense, highlighting an urgent gap in current LLM security paradigms.

Threat Model and Attack Vectors

AEA are characterized by two principal attack paths: (1) hijacking or masquerading as service distribution platforms to inject adversarial prompts, and (2) publishing backdoored open-source checkpoints fine-tuned with attacker data on model distribution platforms. The attack flow is illustrated in Figure 1, which clarifies the operational differences and the points of compromise for each vector.

Figure 1: Attack flow diagram showing two different attack paths: attacks via Service Distribution Platforms, attacks via Model Distribution Platforms.

Service Distribution Platform Attacks

Attackers gain access to SDP infrastructure, either by operating proxy services or compromising existing platforms. They intercept user queries, prepend or modify prompts with attacker-controlled content, and forward these to the underlying LLM API providers. The returned outputs can be further manipulated before delivery to the user. This vector is particularly effective due to the lack of direct user authentication with the LLM provider and the ease of prompt concatenation.

Model Distribution Platform Attacks

Attackers download popular open-source models, fine-tune them locally with malicious data (e.g., advertisements, hate speech, propaganda), and redistribute the compromised models via MDPs such as Hugging Face. Techniques like LoRA enable rapid and targeted parameter updates with minimal computational resources (e.g., RTX 4070, 1 hour for attack fine-tuning). The openness of model sharing platforms and the absence of rigorous auditing facilitate large-scale dissemination of backdoored models.

Victim Taxonomy and Impact

The research identifies five primary victim categories:

End Users: Individuals, organizations, and institutions relying on LLM services or open-source models are exposed to manipulated outputs, leading to erroneous decisions, biased cognition, and financial or reputational losses.
LLM Inference Service Providers: API providers suffer reputational damage and litigation risks when their services are implicated in the dissemination of harmful or biased content, even if the compromise occurs upstream.
Open-Source Model Owners: Original model creators face negative evaluations and loss of trust when their models are repurposed for attacks.
Model Distribution Platforms: MDPs risk legal liability and reputational harm as vehicles for distributing compromised models.
Service Distribution Platforms: SDP operators experience user attrition and revenue loss due to deteriorating service quality and trust.

Implementation Details and Empirical Results

The paper demonstrates the low-cost, high-impact nature of AEA through concrete attack scenarios. In the SDP attack, a simple attacker prompt concatenated with RAG data is injected into the user query, resulting in model outputs that prioritize attacker-specified content. Figure 2 presents a comparative analysis of normal versus attacked responses on the Gemini 2.5 model, evidencing the model's susceptibility to prompt-based manipulation.

Figure 2: Attack Results via Service Distribution Platforms. Left column (1) shows the malicious attack data we used; middle column (2) shows normal responses without attack; right column (3) shows responses after using attack prompts on the state-of-the-art Google Gemini 2.5 model.

For MDP attacks, the authors fine-tuned LLaMA-3.1 with attacker data, achieving near-perfect reproduction of malicious responses. The attack required minimal hardware and time, underscoring the feasibility of widespread exploitation.

Defense Strategies

The paper proposes a prompt-based self-inspection defense as an initial mitigation strategy. By prepending a high-priority defensive prompt that instructs the model to reject or ignore biased, promotional, or knowledge-distorting content, the system can detect and suppress prompt-level attacks. However, this method is ineffective against parameter-level attacks (i.e., compromised model weights), indicating the need for more robust detection and auditing mechanisms. The authors advocate for coordinated efforts in detection, auditing, and policy development to address the systemic vulnerabilities exposed by AEA.

Implications and Future Directions

The emergence of AEA highlights a critical and under-addressed dimension of LLM security: the integrity of information in model outputs. The attacks exploit both infrastructural and algorithmic weaknesses, leveraging the compositionality of prompts and the openness of model sharing. The practical implications are severe, affecting end users, service providers, and the broader AI ecosystem. Theoretical implications include the need for new frameworks in model provenance, integrity verification, and adversarial robustness beyond traditional accuracy metrics.

Future research should focus on:

Automated detection of advertisement and propaganda embedding in model outputs.
Auditing and provenance tracking for open-source model distribution.
Regulatory and policy interventions to enforce accountability in model sharing and service provision.
Development of robust defense mechanisms that operate at both prompt and parameter levels.

Conclusion

This work systematically defines and demonstrates Advertisement Embedding Attacks against LLMs and AI agents, revealing a significant vulnerability in current AI deployment practices. The attacks are low-cost, highly effective, and broadly impactful, necessitating urgent attention from researchers, service providers, and policymakers. The initial defense strategies outlined provide a foundation for further research, but comprehensive solutions will require advances in detection, auditing, and regulatory oversight.

PDF Markdown

Paper Prompts

Explore 10 Community Prompts

Follow-up Questions

Authors (3)

Tweets

https://twitter.com/Charles_SEO/status/1961342245413720476

https://twitter.com/EnzoFromSpace/status/1961353902843928589

YouTube

Show All Videos

alphaXiv

Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models (16 likes, 0 questions)