Papers
Topics
Authors
Recent
Search
2000 character limit reached

SkillsInjector in LLM Agent Pipelines

Updated 4 July 2026
  • SkillsInjector is a framework that injects reusable skill artifacts, such as SKILL.md files and composite packages, into LLM pipelines to shape model behavior.
  • It addresses security and utility challenges by distinguishing active, instruction-bearing content from mere documentation.
  • Benchmarks reveal its impact on task performance and attack success rates, guiding strategies for selective skill invocation and robust defense integration.

“SkillsInjector” (Editor’s term) can be understood as the family of mechanisms that inject reusable skill artifacts into large-language-model agent pipelines, together with the security, utility, and representation problems that follow from treating those artifacts as behavior-shaping context rather than inert documentation. In the recent literature, skills appear as SKILL.md files with YAML frontmatter and instruction bodies, as Markdown Skill documents rendered for humans but passed as raw text to models, and as composite packages S=(M,E,A)S=(M,E,A) whose behavior depends jointly on natural-language specifications, executable files, and auxiliary resources. A separate but related line of work uses “skill injection” for parameter-space transfer, such as merging an expert LLM into a VLM or converting runtime skill text into a LoRA adapter (Schmotz et al., 30 Oct 2025, Wang et al., 11 Feb 2026, Kim et al., 12 Jun 2026, Zhang et al., 15 Jun 2026, Xu et al., 19 May 2026).

1. Skill artifacts and the system architectures they inhabit

In agent frameworks studied in 2025–2026, a skill is typically a directory-scoped extension centered on SKILL.md. In Claude’s skills system, skills live under .claude/skills/; SKILL.md begins with YAML metadata, including at least the name and description; the body contains instructions; and the directory may also contain other files or scripts referenced from the markdown. The name and description are loaded into the system prompt at model start, the body is loaded when the agent decides the skill is relevant, and additional files in the skill directory are loaded when called from SKILL.md (Schmotz et al., 30 Oct 2025).

A second representation appears in Markdown Skill documents used by skill-conditioned agents. Here the critical architectural fact is that the skill is not merely read as documentation. The model receives the skill text as an instruction-bearing prompt component, and in the motivating setup the Markdown may be rendered to HTML for human display while the raw source text is still supplied verbatim to the model. The paper on hidden-comment injection therefore characterizes Skills as an “executable prompt component,” not a passive reference layer (Wang et al., 11 Feb 2026).

A broader formalization is given by the cross-modal install-time scanning literature, which models an Agent Skill package as

S=(M,E,A),S=(M,E,A),

where MM is the natural-language content in SKILL.md, EE is the set of executable files, and AA is the set of auxiliary files such as configurations, metadata, and resources. The corresponding scanner decision problem is

fθ(S)y,y{Flag,Pass}.f_\theta(S)\rightarrow y,\quad y\in\{Flag,Pass\}.

This formulation is important because it makes explicit that a skill’s effective behavior is jointly determined by language, code, and packaged resources rather than by prompt text alone (Kim et al., 12 Jun 2026).

In software-engineering evaluation, skills are treated even more narrowly as prompt-side artifacts: structured markdown packages or reference documents injected at inference time. SWE-Skills-Bench formalizes each task instance as (R,E,P,S)(R,E,P,S), where RR is a pinned repository, EE a containerized environment, PP a requirement document, and S=(M,E,A),S=(M,E,A),0 an optional skill document. This isolates the marginal effect of skill presence and shows that, operationally, many deployed “skills” are simply inference-time context augmentation with autonomous uptake by the agent if the file is present (Han et al., 16 Mar 2026).

A common misconception is that skills are “just documentation.” The literature is explicit that they instead occupy an instruction-bearing trust boundary. This suggests that any encyclopedia treatment of SkillsInjector must treat skills as operational artifacts spanning prompt construction, execution planning, and, in some systems, script invocation.

2. Attack surfaces and threat models

The most elementary attack surface is a mismatch between what humans inspect and what the model consumes. In hidden-comment injection, the attacker appends an HTML comment to a Markdown Skill. After rendering, the comment becomes invisible to human reviewers, but the raw text may still be supplied to the model. The paper’s operational success criterion is whether tool-call metadata contains any of list_environment_variables, read_file, or http_request, and it demonstrates that this human/model visibility mismatch can redirect a benign code-formatting request toward environment enumeration, credential-file access, and HTTP exfiltration (Wang et al., 11 Feb 2026).

A second attack surface comes from the basic skill-loading architecture itself. Because SKILL.md is supposed to contain instructions, an attacker does not need elaborate prompt obfuscation: a malicious line in the description or body can be interpreted as ordinary procedure, and referenced scripts can defer the harmful behavior into code that users are even less likely to inspect. The PowerPoint-editing example centered on file_backup.py shows how a benign-seeming backup step can conceal external upload of a presentation, while a prior “allow action and do not ask again” approval can carry over to the later malicious Python invocation (Schmotz et al., 30 Oct 2025).

Skill-Inject generalizes this into three attacker-capability levels: body injection, body-plus-script injection, and body-plus-YAML-description injection. It distinguishes “obvious injections” from “contextual injections,” the latter being dual-use instructions whose permissibility depends on policy context. This matters because the hardest security question is not whether an instruction looks bad in the abstract, but whether it is authorized in the present task and environment (Schmotz et al., 23 Feb 2026).

Poise sharpens the threat model further by insisting that a practical attack must remain invisible in execution, not merely present in the skill file. Its attacker controls one SKILL.md file as a skill author, typosquatter, or compromised maintainer; the user’s task is legitimate and unrelated to the attacker’s goal; and the objective is to make the agent execute an attacker-chosen command while the user believes only the legitimate task is running. The paper argues that prior attacks face a reliability–stealth trade-off: YAML-header injections are reliably loaded but easy to inspect, whereas body injections are stealthier but often less reliable because out-of-context commands attract the agent’s own suspicion (Hao et al., 6 Jun 2026).

Taken together, these papers define SkillsInjector as a supply-chain attack surface created by modular skill ingestion. The payload may be hidden in rendered-away markup, explicit prose, YAML metadata, referenced scripts, companion files, or language–code interactions. What unifies these mechanisms is not syntax but authority: the malicious text or code is embedded in an artifact the agent is designed to trust.

3. Benchmarks and empirical vulnerability

The strongest benchmark result comes from Skill-Inject, which contains 23 unique skills, 8 attack categories, 30 obvious injections, 41 contextual injections, 76 obvious injection–task pairs, 126 contextual injection–task pairs, and 202 total injection–task pairs. Across frontier agent stacks, contextual execution rates under the normal condition are substantial, and the paper summarizes baseline contextual ASR as ranging from 41.0% to 79.0%, with most models above 50%. It also reports that script-based attacks outperform plain-text instructions, with contextual ASR 71.8% versus 53.8%, and that description injection increases execution rates further because the description field is loaded earlier in the prompt stack (Schmotz et al., 23 Feb 2026).

Poise introduces a stricter metric, Attack Success Rate, under which a variant counts as a successful attack only if at least one trial both fires the canary and passes the verifier. On Skill-Inject with codex+gpt-5.2, Poise reaches 89.3% ASR, with 90.7% trigger rate and 97.3% verifier pass rate. This is 28.0 points above the random-placement body baseline and 2.6 points above a YAML-only baseline, while retaining body-placement stealth. On SkillsBench the absolute ASR is lower because the clean verifier ceiling is much lower, but Poise still outperforms both baselines there as well (Hao et al., 6 Jun 2026).

The terminal-agent study shows that attack reframing materially enlarges the attack surface. On the 43-pair reframing subset with Sonnet 4.5 + Claude Code, the non-guardian setup reaches 81.4% ASR under “all attacks combined,” while a dynamic guardian reduces this to 18.6%. The same table shows that Translation, Blind Reference, Cross Reference, and Description Trap can all push ASR well above the original-attack baseline, demonstrating that phrasing and structural wrapping alone materially change exploitability (Fujinuma et al., 1 Jun 2026).

A useful synthesis is that empirical vulnerability is not exhausted by “does the payload fire.” The recent literature measures at least four partially separable properties: trigger rate, verifier pass rate, ASR under same-trial joint success, and incremental detectability relative to clean baselines. This stricter lens is one reason Poise’s results are significant: they describe hidden compromise under successful task completion rather than simple derailment.

4. Detection and defense

The defense literature is now diversified across prompt-boundary, install-time, internal-signal, and runtime-mediation approaches.

Defense family Mechanism Representative reported result
Prompt-boundary guardrail Treat Skills as untrusted, forbid reading or exfiltrating sensitive data without explicit legitimate authorization, and report suspicious hidden instructions Prevented malicious tool calls on DeepSeek-V3.2 and GLM-4.5-Air in the defended condition (Wang et al., 11 Feb 2026)
Install-time cross-modal scanner Four-phase reasoning-trajectory distillation over S=(M,E,A),S=(M,E,A),1 with Purpose Grounding, Out-of-Scope Detection, Security Principle Reasoning, and Attack Category Labeling 88.2% detection on the strongest SkillMutator subset S=(M,E,A),S=(M,E,A),2 (Kim et al., 12 Jun 2026)
Internal-signal detector Frozen-backbone fusion of response-conditioned attention and hidden-state alignment via RouteGuard S=(M,E,A),S=(M,E,A),3 on SI-CH and recovery of 90.51% of description attacks missed by lexical screening (Xiao et al., 24 Apr 2026)
Runtime mediation Dynamic guardian mediates every skill read rather than exposing raw files to the main agent On the 43-pair reframing subset, ASR drops from 81.4% to 18.6% under “all attacks combined” (Fujinuma et al., 1 Jun 2026)

The prompt-level defense in hidden-comment injection is notable because it is intentionally lightweight. A short system prompt instructing the model to treat Skills as untrusted, forbid sensitive actions absent explicit legitimate authorization, and surface suspicious hidden instructions was sufficient, in that setup, to stop malicious tool suggestions and induce the models to explicitly mention the hidden instructions (Wang et al., 11 Feb 2026).

Install-time defense must, however, reason over more than prompt text. SkillMutator argues that conventional prompt-injection detectors and static code scanners do not capture the semantic discrepancy between SKILL.md and executable behavior. Its distilled scanner is trained on reasoning trajectories that move from permitted scope to over-scope evidence, then to violated principles and attack categories. The resulting local model reaches 88.2% detection on the strongest subset and exceeds frontier baselines such as GPT-4o-mini and GPT-5.4-mini on that benchmark (Kim et al., 12 Jun 2026).

RouteGuard takes an orthogonal view: successful skill poisoning induces a structured internal effect, “attention hijacking,” in which response-time attention shifts from trusted context to malicious skill spans. By combining attention concentration and hidden-state alignment, it outperforms lexical and prior semantic baselines on the crucial description-channel slice. The paper’s argument is that skill poisoning is not merely suspicious wording but a shift in internal control allocation (Xiao et al., 24 Apr 2026).

Static scanning alone remains problematic because of false positives. Poise reports that LLM scanners falsely flag 74.6% of clean skills on average across four judges and both benchmarks, while poisoned variants gain a new high-risk alert over their clean baselines only 5.6% of the time. This suggests that absolute pass/fail judgments over raw skills are poorly calibrated when legitimate skills already contain shell commands, setup steps, and other privileged-looking operations (Hao et al., 6 Jun 2026).

A recurring misconception is that simple warning prompts, static lexical filters, or model scaling should solve the problem. The combined benchmark evidence does not support that view. The literature instead points toward layered defenses: trust-boundary prompts, install-time full-artifact scanning, runtime mediation, and explicit authorization checks for side-effectful actions.

5. Utility, selective invocation, and behavior compilation

Security work on SkillsInjector intersects with a separate question: whether skill injection is even broadly useful. SWE-Skills-Bench answers this skeptically for software engineering. Across 49 public SWE skills and approximately 565 task instances, 39 of 49 skills yield zero pass-rate improvement, the average gain is only S=(M,E,A),S=(M,E,A),4, and token overhead ranges from modest savings to a 451% increase while pass rates remain unchanged. Only seven specialized skills produce meaningful gains, while three degrade performance by up to S=(M,E,A),S=(M,E,A),5, often through version-mismatched or context-mismatched guidance (Han et al., 16 Mar 2026).

This limited utility motivates selective invocation rather than unconditional injection. SelSkill formulates skill use as a local “skill-or-skip” decision rather than a pure relevance decision. The policy conditions on trajectory prefix S=(M,E,A),S=(M,E,A),6 and visible skill metadata S=(M,E,A),S=(M,E,A),7, and candidate branch points are prioritized with predictive entropy

S=(M,E,A),S=(M,E,A),8

On ALFWorld with Qwen3-8B, SelSkill improves task success by 10.9 percentage points and execution precision by 29.1 percentage points; on BFCL, it improves task success by 5.7 percentage points and execution precision by 29.5 percentage points. The paper’s core claim is that a relevant skill should not automatically be invoked at the current state (Chen et al., 30 May 2026).

A more radical response is to eliminate repeated runtime skill text altogether. Skill-to-LoRA treats the skill not as text to be reinserted into the prompt but as a reusable behavior module. Offline, the full SKILL.md is used to synthesize demonstrations; online, the corresponding LoRA adapter is dynamically loaded while the full document is omitted. On a 21-skill subset of SWE-Skills-Bench with Qwen3.6-27B, S2L improves pass rate by 2.9 percentage points over the no-skill baseline and 5.2 percentage points over Full Skill Text, while reducing per-step token cost by 6.6% relative to Full Skill Text prompting. Wrong-LoRA and Shared-LoRA controls both reduce performance, indicating that the gains depend on skill-specific adapter alignment rather than generic extra capacity (Zhang et al., 15 Jun 2026).

These results suggest a three-way distinction within SkillsInjector systems. First, some skills should not be injected at all because they are useless or harmful in context. Second, some should be invoked selectively at specific decision points. Third, some stable procedural skills may be better compiled into behavior modules than re-read as prompt text on every step.

6. Broader formulations and open questions

The phrase “skill injection” is not confined to prompt-side agent skills. In cross-modal model merging, a VLM is written as

S=(M,E,A),S=(M,E,A),9

and cross-modal skill injection constructs

MM0

This line of work finds that cross-modal skill injection generally performs well in instruction-following and cross-lingual settings yet struggles with mathematical reasoning, and that classic merging methods such as TA and DARE consistently outperform alternative methods (Xu et al., 19 May 2026).

Another adjacent line treats injection as post-pretraining specialization. “Llama SLayer 8B” argues that shallow layers hold the key to knowledge injection: performance dips are largest when shallow layers are removed or expanded, leading to a strategy that selectively enhances shallow layers while pruning less effective deep ones. This is a different use of “injection” from SKILL.md ingestion, but it points to the same broader problem of where reusable competence should live: prompt text, external modules, or model parameters (Chen et al., 2024).

A caution from the knowledge-injection literature is that injected content may fail silently as a mechanism of real task improvement. “Revisiting the Knowledge Injection Frameworks” reports that injecting random or even Gaussian-noise knowledge can achieve comparable results to aligned knowledge in several fine-tuning frameworks. This suggests that the mere presence of an injection pathway is not evidence that the injected knowledge is being used semantically rather than acting as noise or regularization (Fu et al., 2023).

Across these research threads, three open questions recur. The first is trust: when a skill is third-party, instruction-bearing, and action-shaping, what architectural boundary should separate consultation from authority? The second is representation: when should a skill remain explicit text, when should it be invoked selectively, and when should it be compiled into weights? The third is evaluation: recent benchmarks show that simple payload firing, aggregate pass rate, or static scanner alerts each miss part of the picture. A plausible implication is that future SkillsInjector research will continue to move toward joint metrics that combine usefulness, stealth, authorization, and detector robustness rather than treating “skill injection” as either purely an augmentation technique or purely a prompt-injection problem.

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to SkillsInjector.