Input Injection Mechanism

Updated 19 April 2026

Input Injection Mechanism is a process in which external signals are injected into a system’s input space to alter behavior, spanning digital and physical domains.
It encompasses direct, indirect, and multi-source methods with applications in language model security, cyber-physical systems, and beyond.
Robust defenses such as prompt-engineering, detection methods, and cryptographic provenance tracking are employed, yet adaptive attacks remain a challenge.

An input injection mechanism is any process in which external signals, instructions, or physical entities are programmatically or physically introduced into a system’s input space in order to alter, control, or probe system behavior, frequently bypassing intended boundaries between privileged (trusted) and nonprivileged (untrusted) data streams. Input injection spans application domains from LLM security to physical particle acceleration, optical coupling, and nonlinear dynamic systems. This encyclopedic overview surveys the principal theoretical models, attack and defense strategies, and representative application regimes characterizing input injection mechanisms across technical domains.

1. Theoretical Underpinnings and Taxonomy

At an abstract level, input injection generalizes the notion of a flexible input boundary: adversarial or purposeful signals are introduced at specified sites or interfaces—typically distinct from “normal” operation—to affect downstream system state or outputs. Formally, in many digital systems the composite input $p$ is

$p = S \Vert U \Vert D,$

where $S$ denotes fixed system instructions, $U$ is user-provided input, and $D$ comprises external or dynamically aggregated data (such as documents, tool outputs, or environmental readings). In this view, input injection refers to intervention on any of these channels beyond the designed (trusted) mechanism.

Taxonomically, in the context of LLMs and agentic AI, input injection is further classified as follows:

Direct injection: The adversary supplies malicious content directly into user fields ( $U$ ).
Indirect (external) injection: The adversary poisons $D$ (external data), with the immediate user input $U$ being benign (Hines et al., 2024, Chen et al., 23 Feb 2025, Maloyan et al., 24 Jan 2026).
Multi-source injection: Injection targets composite or concatenated inputs built from $n$ sources, of which the adversary may control only a subset (possibly a single segment out of many) (Wang et al., 10 Dec 2025).

In cyber-physical and control systems, actuator and sensor injection mechanisms target analogs of these channels by introducing $a_u(k)$ (into control or actuation signal paths) or $p = S \Vert U \Vert D,$ 0 (perturbing sensor outputs) (Yang et al., 2021).

Physical domains such as photonics and plasma physics feature energy-, charge-, or particle-injection mechanisms defined by boundary conditions (e.g., end-fire injection of light into resonator devices (Liu et al., 2015), or electron/ion injection in plasma or reconnection layers (Ball et al., 2019, French et al., 20 Aug 2025, Maity et al., 2024)).

2. Digital and LLM Input Injection: Mechanism and Threat Model

Language-model input injection threats arise from the inability of autoregressive or attention-based models to semantically distinguish data provenance within the flattened token stream. In practical LLM deployments, the input context window concatenates $p = S \Vert U \Vert D,$ 1, $p = S \Vert U \Vert D,$ 2, $p = S \Vert U \Vert D,$ 3 without reliable provenance markers. Adversarial actors leverage this by embedding actionable instructions or code in $p = S \Vert U \Vert D,$ 4 (e.g., retrieved web page, document) so that the LLM—oblivious to origin—executes privileged commands as if they were user-authorized (Hines et al., 2024, Chen et al., 23 Feb 2025, Maloyan et al., 24 Jan 2026, Wang et al., 10 Dec 2025, Chen et al., 29 Apr 2025).

The formal threat model is:

The adversary seeks to maximize the probability that $p = S \Vert U \Vert D,$ 5, the LLM’s completion, executes the adversarial instruction $p = S \Vert U \Vert D,$ 6 when $p = S \Vert U \Vert D,$ 7 is embedded in $p = S \Vert U \Vert D,$ 8 or other nonprivileged input channels.
Attack success rate (ASR) is the fraction of sampled contexts in which the adversarial instruction is followed:

$p = S \Vert U \Vert D,$ 9

where $S$ 0 is an indicator for successful takeover (Hines et al., 2024).

In agentic systems, these attacks generalize to protocol-level, multimodal, and multi-stage delivery, as exemplified in (Maloyan et al., 24 Jan 2026), which systematically catalogs 42 distinct techniques across vectors (prompt, tool, file, protocol), modalities (text, image, audio), and propagation models (single-shot, persistent, viral).

3. Defense Mechanisms in Digital Settings

A broad design space of defenses against input injection in LLMs has been proposed, falling into prompt-engineering, architectural/provenance-layer, and adversarial learning approaches.

3.1 Prompt-Engineering and Provenance-Signaling Strategies

Spotlighting family: Delimiting (section-wrapping), datamarking (per-token perturbation), and encoding (semantic masking) transformations inject an unforgeable provenance signal into untrusted segments. This shifts the task from post-hoc detection to in-band privilege marking. For instance, base64 encoding of web-data or randomly inserted marker tokens reduce ASR from >50% to <2% on GPT-3.5/4 without harming task accuracy (Hines et al., 2024).
Instruction hierarchy signal (IH) and Augmented Intermediate Representations (AIR): Rather than only marking at input, these methods inject privilege-level embeddings at every transformer layer, preventing “washing out” of source annotations in deep models. Experiments yield 1.6–9.2× reductions in gradient-based ASR over Delimiter or instruction-segmented input-only schemes (Kariyappa et al., 25 May 2025).
Referencing defense: Leverages LLMs' propensity to identify the instruction being followed: by prompting to enumerate and reference every instruction responded to, and post-filtering to keep only completions associated with the benign instruction, attack success drops to <1–5% (often 0%) even on strong indirect/gradient-based attacks (Chen et al., 29 Apr 2025).

3.2 Learning-Based Detection and Removal

Detection: Trained classifiers (e.g., DeBERTa, tuned small LLMs) can flag segment- or document-level injections with true positive rates >90% on indirect benchmarks, though suffer from over-defense and position-sensitivity (Chen et al., 23 Feb 2025).
Removal: Segmentation removal (divide-and-classify at sentence level) or extraction removal (autoregressive localization and removal) both yield >84% removal rates in standard QA settings, with segmentation excelling at head/mid injections and extraction at tail (end-of-context) patterns.

3.3 Architectural and System-Level Defenses

A defense-in-depth framework includes:

Cryptographic tool identity and provenance tracking,
Capability-scoped tool/plugin permissions with least privilege,
Runtime multi-agent or intent-verification protocols,
Sandboxed execution and strictly partitioned access to external resources,
Human-in-the-loop escalation (Maloyan et al., 24 Jan 2026).

Most state-of-the-art defenses still fail under adaptive, protocol-level, or distributionally-robust attacks, with residual ASRs >78% observed in multiple meta-evaluations.

4. Physical and Cyber-Physical Input Injection Mechanisms

4.1 Optical and Electronic Systems

End-fire optical injection: Direct butt-coupling of a waveguide to a microcavity can enable up to 75% coupling efficiency into whispering gallery modes due to constructive interference, and is robust against fabrication tolerances (Liu et al., 2015).
Spin injection in spintronic oscillators: Co-application of tunneling and spin Hall spin currents, with each mechanism’s efficiency parameterized by polarization $S$ 1 or spin Hall angle $S$ 2, enables dynamic regime extension and threshold current reduction; the net injected spin current is

$S$ 3

(Tarequzzaman et al., 2018).

4.2 Actuator/Sensor and Nonlinear Control Systems

Injection mechanisms in distributed control systems introduce (potentially unbounded) adversarial input via actuators ( $S$ 4) and sensors ( $S$ 5). Robust state estimation is achieved via banks of unknown-input observers (UIOs) designed to decouple, reconstruct, and isolate attack vectors given sparsity and redundancy assumptions. Asymptotic state and attack reconstruction is possible under $S$ 6 actuator and $S$ 7 sensor attack bounds (Yang et al., 2021).

4.3 Particle Acceleration and Plasma Systems

Truncated ionization injection: In laser wakefield accelerators, staged gas cell designs with tailored gas composition (H $S$ 8 with an N $S$ 9 dopant) and density ramps are used to control the longitudinal and energy-space phase of injected electrons. By terminating dopant presence, the injection region is sharply truncated, yielding beams with energy spread $U$ 0 and emittance at the 1.5 mm-mrad level (Maity et al., 2024).
Magnetic reconnection: Electron injection into power-law distribution tails is regulated by the population and spatial distribution of X-points; non-ideal parallel electric fields at these sites are responsible for initial energization, with subsequent acceleration governed by Fermi reflection, betatron and pickup mechanisms. Injection efficiency and cutoff energy are set by sheet thickness, guide field, and the dynamical state (2D/3D, primary/secondary X-point statistics) (Ball et al., 2019, French et al., 20 Aug 2025).

5. Efficiency, Trade-offs, and Limitations

5.1 Digital System Trade-offs

Prompt parameterization injection: Parameterizing fixed prompts directly into model weights (“Prompt Injection,” not to be confused with adversarial attacks) greatly reduces inference-time FLOPs, with up to 280× efficiency improvement for long prompt scenarios (Choi et al., 2022). This method is optimal when the prompt is static but incurs storage and one-time injection costs.
Mask-based privacy injection: For privacy, adaptive noise-injection DNNs (e.g., ANI) can inject sample-specific noise that degrades sensitive-task accuracy by up to 48.5% with <1% drop in primary task accuracy, but no strict formal privacy guarantee applies (Kariyappa et al., 2021).
Over-defense and utility: Detection and segmentation defenses can over-remove benign content, especially in out-of-domain settings. Fine-tuning or robust detection rarely generalizes across all attack permutations or task types; there exists a persistent compromise between security and model utility (Chen et al., 23 Feb 2025, Maloyan et al., 24 Jan 2026).

5.2 Physical System Trade-offs

Optical injection tolerances: While direct coupling increases mechanical and spectral robustness, phase-matching constraints set a limit on usable spectral range and device geometries (Liu et al., 2015).
Plasma injection control: Higher injected charge improves signal but increases loading and energy spread; manipulation of ramp length, dopant fraction, and focus permits fine-grained quality control (Maity et al., 2024).

6. Open Problems and Future Directions

Persistent research challenges include:

Fundamental separation of instructions and data: No extant LLM or cyber-physical architecture has fully solved provenance-disentanglement or achieved parameterization analogous to SQL-prepared queries (Hines et al., 2024, Maloyan et al., 24 Jan 2026).
Certified, permutation- or structure-invariant defenses: Distributionally-robust optimization (as in ObliInjection (Wang et al., 10 Dec 2025)) and multi-layer privilege propagation (as in AIR (Kariyappa et al., 25 May 2025)) represent promising but still partial solutions.
Detection and mitigation for multi-agent, protocol-rich environments: Blanket single-model or heuristic countermeasures cannot adequately cover agent chains or tool-augmented contexts (Maloyan et al., 24 Jan 2026).
Unified cyber-physical modeling: Extending observer-based, redundancy-exploiting designs to high-dimensional, nonlinear, and uncertain physical regimes remains an active area (Yang et al., 2021).

7. Representative Empirical Results and Benchmarks

A summary of cross-domain benchmarks is given below.

Domain	Mechanism	Key Metric(s)	Upper/Lower Bounds	Notable Result
LLM Security	Spotlighting	ASR	60% → <2%	Datamarking, encoding robust (Hines et al., 2024)
LLM Security	Reference-based	ASR, QA/Sentiment acc.	<1–5% ASR, <2% util. drop	Generalizes well (Chen et al., 29 Apr 2025)
LLM Security	AIR (IH)	ASR	1.6–9.2×↓ vs prior	No utility loss, robust to GCG (Kariyappa et al., 25 May 2025)
Multi-source	ObliInjection	ASR (shuffled segs)	99.0% (1/100 segments)	Resists ordering uncertainty (Wang et al., 10 Dec 2025)
Agents/Tools	ToolHijacker	ASR	92–98% vs 37% baseline	Retrieval+selection split (Shi et al., 28 Apr 2025)
Control	UIO Bank	State error, attack id.	Asymp. convergence	Exact attack isolation (Yang et al., 2021)
Photonics	End-fire injection	$U$ 1 (efficiency)	η up to 75%	Robust, high-Q coupling (Liu et al., 2015)
Plasma	Truncated inj.	$U$ 2, $U$ 3, $U$ 4	<5%, 1.5 mm-mrad, 2–5 pC/μm	Tunable, high-quality beams (Maity et al., 2024)

These results collectively illustrate that input injection as a technical domain is both an attack vector and a control affordance, and that comprehensive defenses—or optimized injection profiles—require cross-layer, cross-domain reasoning about information provenance, coupling, redundancy, and adversarial channel separation.