User-Mediated Attacks

Updated 19 January 2026

User-mediated attacks are adversarial methods that manipulate trusted user behavior through psychological deception and social engineering tactics.
They leverage techniques like phishing, vishing, and prompt injection to automate data harvesting and deliver tailored malicious payloads.
These attacks blur the line between legitimate user actions and exploitation, challenging traditional defenses and necessitating layered security measures.

User-mediated attacks are adversarial exploits that rely on deceiving, influencing, or manipulating benign users into unwittingly relaying, executing, or amplifying attacker-controlled content. Unlike direct exploits of system vulnerabilities, these attacks leverage the user's psychological biases, customary workflows, or trusted behaviors as the delivery mechanism. The attack surface spans telecommunication protocols, AI/LLM platforms, web agents, media players, recommender systems, and beyond, with techniques ranging from traditional phishing to adversarial document injection and deceptive UI design. User-mediated attacks are characteristically scalable, often able to reach thousands to millions of victims through automation or amplification, and are generally resistant to detection by purely technical defenses, as the initial access vector masquerades as legitimate user action.

1. Taxonomy and Core Definitions

Comprehensive studies such as "Abusing Phone Numbers and Cross-Application Features for Crafting Targeted Attacks" (Gupta et al., 2015), "Analysis of Recent Attacks based on Social Engineering Techniques" (Sokolov et al., 2019), and "Too Helpful to Be Safe: User-Mediated Attacks on Planning and Web-Use Agents" (Chen et al., 14 Jan 2026), formalize user-mediated attacks as processes wherein adversaries exploit victims' willingness to act based on apparent trust or recognition. Key subclasses include:

Social Phishing: Attackers impersonate a victim's acquaintance in Over-The-Top (OTT) applications, leveraging social cues for high click-through or reply rates.
Spear Phishing: Tailored attacks using knowledge of the victim's name and personal information to generate personalized and convincing messages.
Vishing (Voice Phishing): Caller-ID spoofing or false business identity registration used in voice channels to extract sensitive information via human interaction.
Whaling: Targeting high-net-worth individuals—typically vanity phone number owners—by exploiting status and presumed asset ownership.
Behavior Manipulation Attacks (BMAs): Psychological or visual deception (scareware, support scams, fake downloads) to trick users into unsafe actions (King et al., 21 Oct 2025).
Promptware and Indirect Prompt Injection: Malicious prompts concealed in shared resources (e-mails, invites, documents) to subvert LLM behavior (Nassi et al., 16 Aug 2025, Lian et al., 25 Aug 2025).

The essential feature is indirect access: attacks rely on user-mediated delivery, wherein attackers embed, amplify, or relay payloads via actions taken by legitimate users.

2. End-to-End Attack Workflows and Automation

Modern user-mediated attacks are frequently automated, scalable, and cross-application. For instance, the phone-based attack platform described in (Gupta et al., 2015) consists of:

Enumeration: Large pools of potential target identifiers (e.g., phone numbers) are generated or scraped.
Data Gathering: Automated queries to platforms like Truecaller and Facebook extract identifiers, social graph data, and presence information.
Channel Presence Detection: Synchronization with OTT apps (e.g., WhatsApp) discovers registered victims and their available communication vectors.
Custom Attack Crafting: Attack vector generation—social phishing, spear phishing, vishing, whaling—based on the richness of harvested data and chosen channel.
Targeted Delivery: Highly personalized payloads delivered via the most susceptible or accessible medium.

Empirical enumeration over 1.16 million phone numbers yielded 51,409 potential social phishing targets, 180,000 for spear phishing, and 91,487 for whaling, with attack probability of success ( $P_s$ ) quantifiable via the fraction of victim responses per attempt.

Other studies extend this automation to document-centric LLM attacks (prompt-in-content injection (Lian et al., 25 Aug 2025)), agent-based workflows (LLM-powered context poisoning (Nassi et al., 16 Aug 2025)), and platform-integrated malware delivery via seemingly innocuous artifacts (subtitle files in media players (Herscovici et al., 2024)).

3. Psychological and Systemic Exploitation Mechanisms

At the core of user-mediated attacks is the exploitation of human psychology, customary trust, and system workflow vulnerabilities. Sokolov and Korzhenko (Sokolov et al., 2019) outline a taxonomy spanning:

Technical SE: Phishing, drive-by downloads, cryptojacking, ATM hacks—requiring some technical and insider knowledge.
Non-Technical SE: Pretexting, vishing, physical impersonation, insider recruitment.

Figure 1 of (Sokolov et al., 2019) enumerates psychological levers—fear, authority, reward, laziness, and perceived low cost of action. In planning and web-use agents (Chen et al., 14 Jan 2026), workflow analysis reveals that LLM agents default to goal-driven execution, treating all user-provided content as actionable intent unless explicitly instructed otherwise, thus bypassing safety checks in over 92% of real-world tests.

Behavior manipulation attacks described in (King et al., 21 Oct 2025) and (Lee et al., 12 Apr 2025) use visual trickery, deceptive pop-ups, and UI overlays to manipulate users into executing harmful actions. Social VR platforms are susceptible to UI clickjacking, denial-of-raycasting, object-in-the-middle interception, and avatar-based QR code scams.

4. Attack Potency, Scalability, and Success Metrics

Attack success is empirically measured by engagement rates, bypass rates, and detection failure rates across platforms:

Phishing success: Social phishing (69.2%) and spear phishing (54.3%) dramatically outperform non-targeted attacks (35.5%) on OTT messaging platforms (Gupta et al., 2015).
Web-based BMAs: PP3D detects BMAs at 99% TPR/1% FPR, attesting to the scale of manipulative exploits possible on the open web (King et al., 21 Oct 2025).
LLM agent bypass: Modern commercial agents bypass safety constraints in over 90% of default scenarios in both planning and web-use contexts (Chen et al., 14 Jan 2026).
MCP ecosystem exploits: Malicious MCP servers achieve average Attack Success Rate (ASR) of 64–81% across five leading LLMs, with the majority of users unable to recognize malicious server installations (Song et al., 31 May 2025).
User study results: Social VR attacks induce action in 83–100% of IRB-approved study participants, indicating near-optimal exploitability (Lee et al., 12 Apr 2025).

Enumeration methods allow adversaries to scale operations from targeted “whaling” (tens of thousands of high-value individuals) to mass BMA campaigns (millions of streaming users).

5. Detection, Mitigation, and Defensive Engineering

Empirical studies consistently report inadequacy of legacy technical defenses, necessitating multilayered strategies:

Input Provenance: Clear separation and labeling of system, user, and external content in prompt assembly (source envelopes, input sanitization, filter LLMs) blocks prompt-in-content and Promptware attacks (Lian et al., 25 Aug 2025, Nassi et al., 16 Aug 2025).
Feedback Control: Limiting impact of adversarial or highly anomalous user feedback, ensemble auditing, and pattern detection mitigates unauthorized model preference-shift exploits (Hilel et al., 3 Jul 2025).
Interface Hardening: Browser extensions (PP3D (King et al., 21 Oct 2025), CyberTWEAK (Shi et al., 2019)), client-side perceptual hashing, adversarial training, and game-theoretic deception (SED) reduce exploitability in both web-based and watering-hole scenarios.
Privilege Separation and Sandboxing: Media players must isolate subtitle parsing, disable high-risk APIs, and enforce code-signature requirements to counter file-based exploits (Herscovici et al., 2024).
Agent Boundary Controls: LLM agents require autonomous verification, step-level gating, and least-action workflow limits to minimize unsafe completions (Chen et al., 14 Jan 2026).
Aggregator Security: Protocol ecosystem defense (MCP) mandates third-party server audit, cryptographic signing, resource scanning, and unified trust badges to contain attack propagation (Song et al., 31 May 2025).

Residual risk can be systematically reduced from Critical/High to Medium/Low only by applying high-friction guardrails, agent-level context isolation, and user-aware notification flows (Nassi et al., 16 Aug 2025, Chen et al., 14 Jan 2026).

6. Notable Case Studies, Research Benchmarks, and Open Challenges

Canonical instances include:

Phone-number enumeration and cross-app harvesting for user-specific phishing campaigns (Gupta et al., 2015).
Prompt-in-content and Promptware attacks on deployed LLMs with context and memory poisoning (Lian et al., 25 Aug 2025, Nassi et al., 16 Aug 2025).
Subtitle-based code execution via repository ranking abuse and parser flaws (CVE-2017-8310/1/2/3) (Herscovici et al., 2024).
Web-agent over-execution of unsafe primitives exposed in trip-planning and element-recognition workflows (Chen et al., 14 Jan 2026).
Social VR deceptive UI manipulation and automated detection via MetaScanner (Lee et al., 12 Apr 2025).
MCP server attacks exploiting tool poisoning, puppet invocation, and external resource fetching, with poor user detection accuracy (Song et al., 31 May 2025).

Critical open challenges include robust detection of embedded natural-language intentions, multi-source prompt assembly threat modeling, forensic provenance tagging at the attention-mechanism level, and scalable adversarial training for new agent architectures (Lian et al., 25 Aug 2025, Chen et al., 14 Jan 2026).

7. Impact, Limitations, and Outlook

User-mediated attacks remain central vectors in social engineering, LLM agent compromise, recommender system poisoning, and file-based exploitation. Core impacts include high rates of credential theft, privacy breach, physical device compromise, stealth lateral movement, and direct financial loss. Defense effectiveness is constrained by the inherent difficulty of distinguishing compromised content delivered via trusted users, the scalability and polymorphism of modern attacks, and the limitations of current interface and protocol isolation strategies.

Ongoing research emphasizes the need for layered defenses, cross-domain provenance controls, and synthesis of psychological and technical awareness as foundational elements for minimizing the success rate of user-mediated attacks in emerging computational ecosystems.

Markdown Upgrade to Chat

References (11)

Abusing Phone Numbers and Cross-Application Features for Crafting Targeted Attacks (2015)

Analysis of Recent Attacks based on Social Engineering Techniques (2019)

Too Helpful to Be Safe: User-Mediated Attacks on Planning and Web-Use Agents (2026)

PP3D: An In-Browser Vision-Based Defense Against Web Behavior Manipulation Attacks (2025)

Invitation Is All You Need! Promptware Attacks Against LLM-Powered Assistants in Production Are Practical and Dangerous (2025)

Prompt-in-Content Attacks: Exploiting Uploaded Inputs to Hijack LLM Behavior (2025)

Hacked in Translation -- from Subtitles to Complete Takeover (2024)

Illusion Worlds: Deceptive UI Attacks in Social VR (2025)

Beyond the Protocol: Unveiling Attack Vectors in the Model Context Protocol Ecosystem (2025)

10.

LLM Hypnosis: Exploiting User Feedback for Unauthorized Knowledge Injection to All Users (2025)

11.

Draining the Water Hole: Mitigating Social Engineering Attacks with CyberTWEAK (2019)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to User-Mediated Attacks.

User-Mediated Attacks

1. Taxonomy and Core Definitions

2. End-to-End Attack Workflows and Automation

3. Psychological and Systemic Exploitation Mechanisms

4. Attack Potency, Scalability, and Success Metrics

5. Detection, Mitigation, and Defensive Engineering

6. Notable Case Studies, Research Benchmarks, and Open Challenges

7. Impact, Limitations, and Outlook

Topic to Video (Beta)

Whiteboard

Follow Topic

Continue Learning

Don't miss out on important new AI/ML research

Sign up for free to explore the frontiers of research

User-Mediated Attacks

1. Taxonomy and Core Definitions

2. End-to-End Attack Workflows and Automation

3. Psychological and Systemic Exploitation Mechanisms

4. Attack Potency, Scalability, and Success Metrics

5. Detection, Mitigation, and Defensive Engineering

6. Notable Case Studies, Research Benchmarks, and Open Challenges

7. Impact, Limitations, and Outlook

Topic to Video (Beta)

Whiteboard

Follow Topic

Continue Learning

Related Topics

Don't miss out on important new AI/ML research

Sign up for free to explore the frontiers of research