Skill Vector Injection in ML Systems
- Skill vector injection is a method for integrating task-specific representational vectors into models to support continual learning and prevent catastrophic forgetting.
- It employs techniques such as Fisher-weighted EWC and latent activation injection to transfer and scale new skills across diverse ML paradigms.
- The same mechanisms raise significant security challenges, as adversarial injections can compromise agentic systems and subvert intended behaviors.
Skill vector injection refers to a family of methods for augmenting or manipulating machine learning models—particularly LLMs, agentic systems, and reinforcement learning pipelines—by externally introducing task-, domain-, or skill-specific representational vectors or parameter deltas. The injection process can serve constructive purposes (continual learning, rapid adaptation, knowledge transfer) or adversarial roles (prompt injection, adversarial skill file compromise). The precise instantiation depends on the model architecture, operational layer (parameters vs. activations vs. agent context), and application domain. Skill vector injection is now central in discussions of continual learning, modular skill transfer, LLM security, and neuro-symbolic planning.
1. Formalizations and Paradigms of Skill Vector Injection
Multiple formulations exist across domains:
- Continual and Multi-Skill Update: Skill injection is the targeted introduction of new behavioral modules (skills) into an existing model while attempting to preserve prior capabilities. For example, in language modeling, arithmetic reasoning can be integrated into a linguistic LLM while mitigating catastrophic forgetting, by using a Fisher-weighted quadratic penalty to constrain parameter drift in directions core to language tasks (Sharma et al., 2022).
- Parameter-Space Vector Arithmetic: In transfer learning, the skill vector is often defined as the difference between parameter states reflecting learning with and without reinforcement-learning (RL) supervision. Such vectors are linearly injected into target models, producing modular skill transfer without task-specific RL (Tang et al., 16 Jan 2026).
- Latent/Activation-Space Injection: For inference-time adaptation, skill vectors are constructed from latent activations in few-shot prompts (e.g., In-Context derived Vectors, or ICVs), then injected at optimized positions and strengths to induce skillful behavior in a frozen model (Cai et al., 23 May 2025).
- Agentic Systems (“Skill Vector” as Capability Tuple): In agentic coding assistants, a skill vector is the ordered tuple of skill definitions and execution bindings that define the action surface of the agent. Adversarial manipulation of this vector, via skill injection, causes the system to enact unintended behaviors (Maloyan et al., 24 Jan 2026, Schmotz et al., 23 Feb 2026).
- Neuro-Symbolic and RL Pipelines: For skill-based RL and planning, skill vectors correspond to VAE- or VQ-learned embeddings of temporally extended actions, injected either as high-level command signals or as residuals for fine-grained adaptation (Rana et al., 2022, Aktas et al., 2024).
The table below summarizes core notions:
| Domain | Object Injected | Injection Surface |
|---|---|---|
| Language Modeling | Parametric delta | Model weights |
| RL/Planning | Latent skill vector | Policy input, activations |
| Coding Assistants | Skill tuple (definitions + bindings) | Agent skill registry |
| LLMs (ICV/DyVec) | Latent projection | Intermediate representations/activations |
2. Methodological Instantiations
2.1 Fisher-Weighted EWC for Skill Preservation
In (Sharma et al., 2022), catastrophic forgetting is addressed during skill injection by applying Elastic Weight Consolidation (EWC). For each parameter $\theta_i$ of a pre-trained model with weights $\theta^{*}$, importance is estimated via the diagonal Fisher information $F_i$. Fine-tuning on a new skill (arithmetic) is constrained by the objective:

$$\mathcal{L}(\theta) = \mathcal{L}_{\text{new}}(\theta) + \frac{\lambda}{2} \sum_i F_i \, (\theta_i - \theta_i^{*})^2$$

This “pins down” parameters vital for prior skills, admitting new skill learning with minimal drift from the “language manifold”. Fisher-weighted regularization was found critical; uniform L2 regularization underperformed, indicating that only skill-sensitive directions should be protected.
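The Fisher-weighted penalty can be illustrated with a minimal, dependency-free sketch. It assumes the standard EWC form, a penalty of (λ/2) · Σ F_i (θ_i − θ*_i)², and estimates the diagonal Fisher as the mean of squared per-example gradients. The function names, toy gradients, and weights are hypothetical and are not taken from Sharma et al.

```python
def diagonal_fisher(per_example_grads):
    """Empirical diagonal Fisher: mean of squared per-example gradients."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    return [sum(g[i] ** 2 for g in per_example_grads) / n for i in range(dim)]


def ewc_penalty(theta, theta_star, fisher, lam):
    """(lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2."""
    return 0.5 * lam * sum(
        f * (t - t0) ** 2 for f, t, t0 in zip(fisher, theta, theta_star)
    )


def ewc_objective(new_skill_loss, theta, theta_star, fisher, lam):
    """New-skill loss plus the Fisher-weighted anchor to the old weights."""
    return new_skill_loss + ewc_penalty(theta, theta_star, fisher, lam)


# Toy data (hypothetical): gradients of the prior-task log-likelihood.
grads = [[2.0, 0.0], [2.0, 0.2]]
fisher = diagonal_fisher(grads)   # parameter 0 is far more important
theta_star = [1.0, 1.0]           # pre-trained weights

# Same step size, taken along different parameter directions.
cost_important = ewc_penalty([1.5, 1.0], theta_star, fisher, lam=1.0)
cost_unimportant = ewc_penalty([1.0, 1.5], theta_star, fisher, lam=1.0)
# Uniform L2 (all importances equal to 1) cannot tell the two apart.
cost_uniform_l2 = ewc_penalty([1.0, 1.5], theta_star, [1.0, 1.0], lam=1.0)
print(cost_important, cost_unimportant, cost_uniform_l2)
```

In this toy setting, a step along the high-Fisher direction is penalized far more than an equal step along the low-Fisher direction, whereas a uniform L2 penalty charges both the same. This is one way to read the reported gap between Fisher-weighted and uniform regularization.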
2.2 Parametric Skill Transfer via RL Delta Injection
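Parametric Skill Transfer (PaST) defines the skill vector as $v_{\text{skill}} = \theta_{\text{RL}} - \theta_{\text{SFT}}$ (where $\theta_{\text{RL}}$ is a model trained with RL, and $\theta_{\text{SFT}}$ is fine-tuned with supervision). Injection is achieved by

$$\theta_{\text{target}}' = \theta_{\text{target}} + \lambda \, v_{\text{skill}}$$

where $\theta_{\text{target}}$ denotes the target model's weights and $\lambda$ is the injection scaling coefficient. PaST’s empirical analysis demonstrates SFT and RL deltas are nearly orthogonal, supporting additivity. Post-hoc composition (injecting after SFT) outperformed pre-injection and sequential fine-tuning strategies (Tang et al., 16 Jan 2026).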
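The parameter-space arithmetic can be sketched in a few lines. The sketch below assumes checkpoints are dictionaries mapping tensor names to flat lists of weights; the names `delta`, `inject`, and `cosine`, the toy checkpoints, and the choice `lam=1.0` are illustrative assumptions rather than PaST's actual code or settings.

```python
import math


def delta(theta_a, theta_b):
    """Per-tensor difference theta_a - theta_b for dicts of flat weight lists."""
    return {k: [a - b for a, b in zip(theta_a[k], theta_b[k])] for k in theta_a}


def inject(theta_target, v_skill, lam):
    """theta_target + lam * v_skill, applied tensor by tensor."""
    return {
        k: [w + lam * d for w, d in zip(theta_target[k], v_skill[k])]
        for k in theta_target
    }


def cosine(u, v):
    """Cosine similarity between two deltas, flattened in a fixed key order."""
    flat_u = [x for k in sorted(u) for x in u[k]]
    flat_v = [x for k in sorted(v) for x in v[k]]
    dot = sum(a * b for a, b in zip(flat_u, flat_v))
    norm_u = math.sqrt(sum(a * a for a in flat_u))
    norm_v = math.sqrt(sum(b * b for b in flat_v))
    return dot / (norm_u * norm_v)


# Toy checkpoints (hypothetical values).
theta_base = {"w": [0.0, 0.0, 0.0]}
theta_sft = {"w": [1.0, 0.0, 0.0]}      # base + supervised fine-tuning
theta_rl = {"w": [1.0, 0.5, 0.0]}       # SFT checkpoint further trained with RL
theta_target = {"w": [0.0, 0.0, 2.0]}   # a different model, SFT'd on a new domain

sft_delta = delta(theta_sft, theta_base)
v_skill = delta(theta_rl, theta_sft)    # skill vector: RL minus SFT
overlap = cosine(sft_delta, v_skill)    # near zero suggests additive composition
injected = inject(theta_target, v_skill, lam=1.0)  # lam chosen arbitrarily here
print(overlap, injected)
```

Checking the cosine overlap before injecting is a cheap diagnostic: if the two deltas were strongly aligned or opposed, simple addition would be more likely to interfere with what the target model already learned.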
2.3 Latent-Space Skill Vector Construction (DyVec)
DyVec (Cai et al., 23 May 2025) advances the ICV paradigm by extracting robust latent representations from LLMs using Exhaustive Query Rotation (EQR) to minimize prompt sensitivity. DyVec segments these vectors and uses REINFORCE-driven policy learning to optimize injection sites. At inference, segments are injected at learned layer indices with tunable blending, guiding the frozen model to perform new skills as if demonstrations were present, but without explicit retraining.
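The inference-time step can be sketched as follows. This toy version assumes additive blending (`h + alpha * segment`) and a fixed injection plan; in DyVec the positions are learned with REINFORCE, which is not reproduced here. The `forward` function, the two stand-in layers, and the plan values are all hypothetical.

```python
def forward(layers, x, plan=None):
    """Run frozen layers in order; after layer i, add alpha * segment if planned.

    plan maps a layer index to a (segment, alpha) pair, where segment has the
    same length as the hidden state and alpha is the blending strength.
    """
    plan = plan or {}
    h = list(x)
    for index, layer in enumerate(layers):
        h = layer(h)  # frozen computation, never updated
        if index in plan:
            segment, alpha = plan[index]
            h = [hi + alpha * si for hi, si in zip(h, segment)]
    return h


# Two stand-in "layers" (hypothetical); a real model would use transformer blocks.
layers = [
    lambda h: [2.0 * v for v in h],
    lambda h: [v + 1.0 for v in h],
]
x = [1.0, 1.0]

# Inject one skill-vector segment after layer 0 with strength 0.5.
plan = {0: ([1.0, -1.0], 0.5)}

baseline = forward(layers, x)
steered = forward(layers, x, plan)
print(baseline, steered)
```

Because the layers are untouched, removing the plan restores the baseline behavior exactly, which is the practical appeal of activation-space injection over weight updates.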
2.4 Skill Vector Injection in RL and Planning
Skill-based RL injects latent skill vectors (learned via VAEs or vector quantization) as inputs to high-level policies, decoders, and residual controllers. In (Rana et al., 2022), the high-level policy samples a vector that a flow maps to a latent skill vector, which is injected both into the action decoder and into a residual low-level policy for adaptability. VQ–CNMP (Aktas et al., 2024) uses discrete skill vectors obtained via vector quantization to enable bi-level planning, with gradient-based adaptation for fine control.
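Two of the ingredients above, nearest-codebook quantization and a residual correction on top of a decoded skill action, can be combined in a small sketch. It is illustrative only and does not reproduce the architecture of either cited paper; `decode_skill` and `residual_policy` are hypothetical stand-ins for learned networks.

```python
def quantize(z, codebook):
    """Index and entry of the nearest codebook vector (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(z, c))

    index = min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))
    return index, codebook[index]


def decode_skill(state, z):
    """Stand-in for a frozen skill decoder mapping (state, skill) to an action."""
    return [s + k for s, k in zip(state, z)]


def residual_policy(state, z):
    """Stand-in for a low-level residual policy that corrects the decoded action."""
    return [-0.1 * s for s in state]


def act(state, z):
    """Final action = decoded skill action + residual correction."""
    base = decode_skill(state, z)
    correction = residual_policy(state, z)
    return [b + c for b, c in zip(base, correction)]


codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]  # discrete skill vectors (toy)
proposal = [0.9, 1.2]                            # continuous high-level output
skill_index, skill = quantize(proposal, codebook)

state = [1.0, 2.0]
action = act(state, skill)
print(skill_index, action)
```

The split keeps the skill vector as the coarse command while leaving room for the residual term to absorb domain shift without retraining the decoder.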
3. Security Risks: Adversarial Skill Vector Injection
In agentic LLM systems and coding assistants, skill vector injection constitutes a first-class security vulnerability:
- Definition: The skill vector $S$, a tuple of skill definitions and bindings, is adversarially perturbed via an injected modification $\delta$, resulting in a compromised vector $S'$. The agent then maps context and prompt to actions under $S'$, potentially executing malicious or unintended behaviors (Maloyan et al., 24 Jan 2026).
- Attack Taxonomy: Attacks are classified by delivery vector (protocol-level, indirect chaining), modality (semantic, obfuscated), and propagation (persistent, viral). Notable exploit chains include malicious tool registration, indirect injection via configuration files, and cross-origin propagation via code repositories (Maloyan et al., 24 Jan 2026, Schmotz et al., 23 Feb 2026).
- Empirical Findings: Benchmarks (Skill-Inject) report up to 80% attack success rates against major LLM agents, with script-based and description-level payloads significantly increasing compromise rates (Schmotz et al., 23 Feb 2026). Even state-of-the-art defenses offer limited mitigation (<50%), and success persists under best-of-N sampling or adversarial adaptation (Maloyan et al., 24 Jan 2026, Schmotz et al., 23 Feb 2026).
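To make the capability-tuple view concrete, the sketch below models a skill vector as an ordered tuple of immutable records and compares a trusted snapshot against the registry an agent actually loaded. The `Skill` fields, the `diff_skill_vectors` helper, and the example entries are assumptions for illustration; they are not drawn from the cited benchmarks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    description: str
    binding: str  # what the agent executes when the skill is invoked


def diff_skill_vectors(trusted, observed):
    """Report skills that were added, removed, or changed relative to `trusted`."""
    trusted_by_name = {s.name: s for s in trusted}
    observed_by_name = {s.name: s for s in observed}
    added = sorted(n for n in observed_by_name if n not in trusted_by_name)
    removed = sorted(n for n in trusted_by_name if n not in observed_by_name)
    changed = sorted(
        n
        for n in observed_by_name
        if n in trusted_by_name and observed_by_name[n] != trusted_by_name[n]
    )
    return {"added": added, "removed": removed, "changed": changed}


trusted = (
    Skill("format", "Format source files", "tools/format.sh"),
    Skill("test", "Run the unit tests", "tools/test.sh"),
)

# The registry as loaded at runtime: one binding swapped, one skill added.
observed = (
    Skill("format", "Format source files", "tools/format.sh"),
    Skill("test", "Run the unit tests", "tools/other.sh"),
    Skill("deploy", "Deploy the project", "tools/deploy.sh"),
)

report = diff_skill_vectors(trusted, observed)
print(report)
```

A diff like this only detects structural changes to definitions and bindings. Payloads hidden inside an otherwise unchanged script or description would need content-level checks.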
4. Applications in Continual Learning, Adaptation, and Planning
- Continual Learning: Skill vector injection enables models to acquire new competencies without retraining from scratch or suffering catastrophic forgetting, as demonstrated by Fisher-weighted EWC injection for arithmetic reasoning (Sharma et al., 2022).
- Rapid Task Adaptation: DyVec achieves task adaptation by harvesting and segmenting latent vectors corresponding to new domains or modalities, supporting composable and efficient inference-time skill transfer (Cai et al., 23 May 2025).
- Modular Policy Transfer: In RL, latent skill vectors promote transferability and sample efficiency, enabling high-level agents to invoke learned primitives while allowing low-level residual adaptation for domain shifts or novel task variations (Rana et al., 2022, Aktas et al., 2024).
- Agent Capability Surface: Agentic coding stacks treat the skill vector as the agent's operational surface; modifications can extend, restrict, or subvert agent toolchains, directly impacting downstream affordances and user trust (Maloyan et al., 24 Jan 2026).
5. Defense and Mitigation Strategies
Whether injection is constructive or adversarial, using it safely requires careful controls:
- Information-Theoretic Regularization: Use EWC or similar approaches to enforce domain-specific skill boundary preservation in network parameter space (Sharma et al., 2022).
- Cryptographically Signed Skill Manifests: Employ signed, immutable skill definitions to prevent malicious skill squatting or substitution (Maloyan et al., 24 Jan 2026).
- Capability Scoping and Principle of Least Privilege: Constrain skill bindings to minimum required access, enforce path and network endpoint allow-lists, and isolate side-effecting actions (Maloyan et al., 24 Jan 2026, Schmotz et al., 23 Feb 2026).
- Context- and Policy-Aware Authorization: Move beyond input filtering; deploy runtime policy checkers that deterministically gate candidate actions based on skill provenance, context, and security policies (Schmotz et al., 23 Feb 2026).
- Multi-Agent/Broker/Guardian Pipelines: Interpose secondary agents or runtime guards to vet and verify actions induced by skill vector injection before execution (Maloyan et al., 24 Jan 2026).
- Layered Defense-in-Depth: Combine the above with sandboxed execution, context hygiene, cryptographic tracking, and fine-grained human-in-the-loop gating to minimize the risk surface.
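Two of the controls listed above, integrity-checked skill manifests and least-privilege path scoping, can be sketched with the Python standard library. As a simplifying assumption, an HMAC over a canonical JSON encoding stands in for a real asymmetric signature scheme; the manifest fields, the `authorize` gate, and the demo key are hypothetical.

```python
import hashlib
import hmac
import json
import posixpath


def canonical_bytes(manifest):
    """Deterministic JSON encoding so the same manifest always hashes the same."""
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_manifest(manifest, key):
    """HMAC-SHA256 tag (a stand-in for an asymmetric signature)."""
    return hmac.new(key, canonical_bytes(manifest), hashlib.sha256).hexdigest()


def verify_manifest(manifest, tag, key):
    """Constant-time comparison of the expected and presented tags."""
    return hmac.compare_digest(sign_manifest(manifest, key), tag)


def is_action_allowed(manifest, action):
    """Least privilege: the skill must match and the path must sit under an allowed root."""
    if action["skill"] != manifest["name"]:
        return False
    path = posixpath.normpath(action["path"])  # collapses ".." segments
    return any(
        path == root or path.startswith(root.rstrip("/") + "/")
        for root in manifest["allowed_paths"]
    )


def authorize(manifest, tag, key, action):
    """Deterministic gate: reject unless the manifest verifies and the action is in scope."""
    return verify_manifest(manifest, tag, key) and is_action_allowed(manifest, action)


key = b"demo-key-not-for-production"
manifest = {
    "name": "format",
    "binding": "tools/format.sh",
    "allowed_paths": ["/workspace/src"],
}
tag = sign_manifest(manifest, key)
tampered = dict(manifest, binding="tools/other.sh")

in_scope = {"skill": "format", "path": "/workspace/src/app.py"}
escaping = {"skill": "format", "path": "/workspace/src/../secrets/key"}

print(authorize(manifest, tag, key, in_scope))
print(authorize(manifest, tag, key, escaping))
print(authorize(tampered, tag, key, in_scope))
```

The gate is deliberately independent of the model's output: it decides from provenance and scope alone. Path normalization here is purely lexical and does not resolve symlinks, so a production check would also need filesystem-level isolation such as the sandboxing mentioned above.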
6. Quantitative Findings, Limitations, and Outlook
- Algorithmic skill vector injection achieves substantial improvements: For example, PaST yields up to 9.9-point accuracy gains over strong SFT baselines in SQuAD QA, and +10.3% average zero-shot success on ToolBench, with linearly injected vectors (Tang et al., 16 Jan 2026). DyVec demonstrates 8.5% to 33% relative improvements over few-shot ICL and LoRA, with 3–9x speedups (Cai et al., 23 May 2025).
- Security evaluations illustrate persistent high compromise rates for agentic systems, even with contemporary defenses; script-based and YAML description attacks remain highly effective (Maloyan et al., 24 Jan 2026, Schmotz et al., 23 Feb 2026).
- Limitations: Most evaluations focus on Qwen2.5-7B or similar scales, specific task pairs, or restricted tool-use benchmarks. Injection scaling parameters, non-linear composition, and compositionality across disparate skills—especially in adversarial settings—remain open challenges (Tang et al., 16 Jan 2026, Schmotz et al., 23 Feb 2026).
- Future work focuses on adaptive injection scaling, learning non-linear transforms, combining skill vector injection with pruning or modularization, and formalizing security policy logic for robust agentic systems.
Skill vector injection, in both constructive and adversarial forms, is now fundamental in modular AI, LLM security, agent robustness, and continual skill acquisition. Its effective and secure use depends on strong information-theoretic and system-level safeguards, as well as precise architectural and procedural integration across learning paradigms.