SKiC Prompting: Modular Skills in LLMs
- SKiC Prompting is a modular in-context learning method that explicitly composes subskills to improve structured reasoning and domain-specific task performance.
- It organizes skills through hierarchical structures and retrieval-augmented context fusion, enabling dynamic adaptation and efficient skill selection.
- Experimental results indicate that SKiC methods significantly enhance sample efficiency and task accuracy compared to conventional in-context and chain-of-thought approaches.
Skills-in-Context (SKiC) Prompting is a class of in-context learning and prompt engineering strategies designed to elicit, compose, and orchestrate modular reasoning “skills” in LLMs. By explicitly anchoring subproblem solutions and decision steps to demonstrated, labeled skills—and by structuring prompts to assemble these fragments contextually—SKiC methods substantially enhance LLM capabilities in compositional generalization, domain-specific tool invocation, and structured sequential reasoning. These approaches underpin recent advances in adaptive prompt recommendation, curriculum design, and systematic multi-step reasoning across both symbolic and real-world tasks.
1. Core Principles and Definitions
SKiC prompting centers on the explicit enumeration, demonstration, and contextual composition of foundational skills within the prompt. Formally, consider a set of skills $\mathcal{S} = \{s_1, \dots, s_K\}$, each defined by a description and isolated exemplars. The compositional block presents more complex queries, with rationales decomposed stepwise so that each step invokes a member of $\mathcal{S}$. The SKiC prompt thus integrates:
- Skill Listing Block: Descriptions and 1–2-shot demonstrations per skill.
- Compositional Examples: Multi-step queries, with each reasoning step explicitly attributed to a skill.
- Query: A test input for which the model must produce a compositional solution, ideally leveraging the skill structure in the context.
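A minimal sketch of how such a prompt can be assembled programmatically; the skill names, exemplars, and addition task below are hypothetical illustrations rather than examples from the cited papers:

```python
# Minimal sketch of SKiC prompt assembly. Skill names, exemplars, and the
# two-digit-addition task are hypothetical, not drawn from the cited papers.
SKILLS = {
    "digit_addition": {
        "description": "Add two single digits and report any carry.",
        "examples": ["Q: 7 + 5? A: 12 (write 2, carry 1)"],
    },
    "carry_propagation": {
        "description": "Add a column of digits together with an incoming carry.",
        "examples": ["Q: 3 + 4 with carry 1? A: 8"],
    },
}

def build_skic_prompt(query: str) -> str:
    blocks = []
    # 1. Skill listing block: description plus 1-2 shot demos per skill.
    for name, skill in SKILLS.items():
        demos = "\n".join(skill["examples"])
        blocks.append(f"[Skill: {name}] {skill['description']}\n{demos}")
    # 2. Compositional example: each reasoning step attributed to a skill.
    blocks.append(
        "Q: 47 + 85?\n"
        "Step 1 [digit_addition]: 7 + 5 = 12 (write 2, carry 1)\n"
        "Step 2 [carry_propagation]: 4 + 8 + 1 = 13 (write 3, carry 1)\n"
        "Step 3 [carry_propagation]: remaining carry 1 becomes the leading digit\n"
        "A: 132"
    )
    # 3. Query: the model must compose the listed skills to answer.
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)

print(build_skic_prompt("68 + 94"))
```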
This structure contrasts with conventional in-context learning and chain-of-thought prompting, as it not only demonstrates solutions but teaches modular algorithmic or symbolic subroutines for flexible reuse and composition (Chen et al., 2023). Algorithmic applications extend this framework to four stages: formulating algorithms as skills, accumulating multiple skills in the prompt, teaching skill composition, and embedding skills as callable tools (Zhou et al., 2022).
2. Hierarchical Skill Organization and Selection
Recent SKiC infrastructure systems instantiate skills as modular fragments—each defined by metadata, precomputed vector embeddings, and usage statistics. All skill embeddings are indexed for retrieval. Plugins act as higher-level modules, each governing a subset of leaf skills, forming a rooted DAG or tree over plugin and skill nodes. For plugin-level selection, context embeddings guide coarse filtering:

$$\mathcal{P}^{*} = \operatorname*{TopK}_{p \in \mathcal{P}} \, \cos\!\big(e_c, e_p\big)$$

Top-K plugin candidates are selected; then, within each, skills $s$ are ranked via a composite relevance function:

$$r(s \mid c) = \alpha \cos\!\big(e_c, e_s\big) + \beta\, b(s) + \gamma\, g(s)$$

where $b(s)$ is a telemetry-based behavioral score and $g(s)$ incorporates retrieval-augmented knowledge grounding, both dynamically updated through session data and external metadata (Tang et al., 25 Jun 2025).
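A minimal sketch of this coarse-to-fine selection, assuming cosine similarity for the coarse filter and hypothetical mixing weights $\alpha$, $\beta$, $\gamma$ for the composite score:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def select_skills(ctx_emb, plugins, k=3, alpha=0.6, beta=0.25, gamma=0.15):
    """Rank skills via r(s|c) = alpha*cos(e_c, e_s) + beta*b(s) + gamma*g(s)
    after Top-K plugin filtering. `plugins` maps plugin name to a tuple
    (plugin_embedding, list_of_skill_dicts); each skill dict carries an
    embedding 'emb', behavioral score 'behavior', and grounding score
    'grounding' (all hypothetical field names)."""
    # Coarse filtering: Top-K plugins by context-plugin similarity.
    top = sorted(plugins.items(),
                 key=lambda kv: cosine(ctx_emb, kv[1][0]),
                 reverse=True)[:k]
    # Fine ranking: composite relevance over the surviving leaf skills.
    ranked = [(alpha * cosine(ctx_emb, s["emb"])
               + beta * s["behavior"] + gamma * s["grounding"],
               plugin, s["name"])
              for plugin, (_, skills) in top for s in skills]
    return sorted(ranked, reverse=True)
```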
3. Retrieval-Augmented Grounding and Context Fusion
A distinguishing SKiC best practice is retrieval-augmented prompt synthesis. The relevant corpus (domain documents, API schemas, session logs) is indexed for similarity search. On each query, the context is encoded, the top-L relevant chunks are retrieved, and the context embedding is enriched via fusion:

$$\tilde{e}_c = \lambda\, e_c + (1 - \lambda)\, \frac{1}{L} \sum_{i=1}^{L} e_{d_i}$$

Grounding scores for skills are then recomputed against $\tilde{e}_c$, ensuring that the SKiC pipeline leverages both latent model capabilities and up-to-date external knowledge (Tang et al., 25 Jun 2025). This synergy supports domain precision and prompt adaptability.
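A sketch of the fusion step, assuming a convex combination with a hypothetical mixing weight $\lambda$ (the cited system may use a different fusion rule):

```python
import numpy as np

def fuse_context(ctx_emb, chunk_embs, lam=0.7):
    """Fuse the query-context embedding with the mean of the top-L retrieved
    chunk embeddings. `lam` is a hypothetical hyperparameter."""
    retrieved = np.mean(chunk_embs, axis=0)
    fused = lam * ctx_emb + (1.0 - lam) * retrieved
    return fused / (np.linalg.norm(fused) + 1e-9)  # renormalize for cosine scoring
```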
4. Adaptive Skill Ranking and Telemetry
SKiC architectures employ online feedback via behavioral telemetry. For each skill–context pair $(s, c)$, user responses and reward signals $R(s, c)$ are tracked. Online ranking is updated either by a gradient step on the behavioral score,

$$b(s) \leftarrow b(s) + \eta\,\big(R(s, c) - b(s)\big),$$

or by Bayesian updating, maintaining Beta posteriors over skill success probabilities. These posteriors refine future skill selection and enhance prompt personalization. This adaptive loop underpins high-relevance, context-aware recommendations (Tang et al., 25 Jun 2025).
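A standard Beta–Bernoulli sketch of the Bayesian variant; the class and method names are illustrative, not the cited system's API:

```python
import random

class SkillBandit:
    """Beta-Bernoulli tracking of per-skill success probability, a generic
    sketch of the Bayesian updating described above."""
    def __init__(self):
        self.alpha = {}  # successes + 1 (Beta prior alpha)
        self.beta = {}   # failures + 1 (Beta prior beta)

    def update(self, skill: str, success: bool):
        # Conjugate update of the Beta posterior from binary telemetry.
        a = self.alpha.get(skill, 1.0)
        b = self.beta.get(skill, 1.0)
        self.alpha[skill] = a + (1.0 if success else 0.0)
        self.beta[skill] = b + (0.0 if success else 1.0)

    def sample_score(self, skill: str) -> float:
        # Thompson sampling: draw a plausible success rate for ranking.
        a = self.alpha.get(skill, 1.0)
        b = self.beta.get(skill, 1.0)
        return random.betavariate(a, b)
```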
5. Prompt Synthesis and Few-Shot Template Filling
SKiC prompt instantiation employs both predefined template skeletons and adaptive fillers. Typical structures:
“You are a domain assistant. To accomplish user goal ‘{user_query}’, apply skill ‘{skill_name}’ by following these steps: {few_shot_examples} Now generate the exact prompt to submit to the skill.”
Few-shot examples are drawn from nearest neighbors in past sessions for the selected skill. The LLM then synthesizes constrained candidate prompts, ensuring compatibility with skill input schemas and reflecting domain-validated reasoning patterns (Tang et al., 25 Jun 2025).
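A minimal template-filling sketch; `session_store.nearest` and `embed` below are hypothetical stand-ins for the retrieval layer, not an API from the cited work:

```python
TEMPLATE = (
    "You are a domain assistant. To accomplish user goal '{user_query}', "
    "apply skill '{skill_name}' by following these steps: {few_shot_examples} "
    "Now generate the exact prompt to submit to the skill."
)

def synthesize_prompt(user_query, skill_name, session_store, embed, k=3):
    # Retrieve the k nearest past sessions for this skill as few-shot fillers.
    neighbors = session_store.nearest(skill_name, embed(user_query), k=k)
    shots = "\n".join(f"- {ex}" for ex in neighbors)
    return TEMPLATE.format(user_query=user_query,
                           skill_name=skill_name,
                           few_shot_examples=shots)
```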
6. Sample Efficiency and Compositional Alignment
Experimental evidence demonstrates that well-aligned SKiC prompting sharply improves sample efficiency and composition accuracy. By aligning simple-skill and composite examples to explicit step templates (“Step t:”), the model disambiguates which reasoning block relates to which skill in the composition. Results across symbolic manipulation, arithmetic, multi-hop QA, and decision tasks show:
| Model | Vanilla ICL | Naïve CoT | SKiC/ExpCoT Step-Aligned |
|---|---|---|---|
| Llama-7B | 32.6% | 42.2% | 47.5% |
| Llama2-70B | 80.8% | 77.6% | 87.2% |
| Mixtral-8x7B | 71.2% | 77.6% | 87.5% |
Accuracy rises or stabilizes with SKiC/ExpCoT even as more simple examples are added—whereas accuracy drops sharply in vanilla and naïve formulations due to misalignment and step confusion (Liu et al., 27 Oct 2025).
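An illustrative step-aligned exemplar (hypothetical task and skill names) shows how the “Step t:” template binds each reasoning line to one demonstrated skill:

```python
# Hypothetical step-aligned compositional exemplar: each "Step t:" line is
# explicitly attributed to one simple skill demonstrated earlier in the
# prompt, so adding more simple-skill examples cannot blur step attribution.
ALIGNED_EXAMPLE = """\
Q: Reverse the list [a, b, c], then take the last element.
Step 1 [skill: reverse_list]: [a, b, c] -> [c, b, a]
Step 2 [skill: last_element]: the last element of [c, b, a] is a
A: a
"""
```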
7. Limitations and Theoretical Boundaries
The theoretical analysis of prompt-based and prefix-tuning methods reveals that pure prompting cannot alter relative content–content attention rankings within the frozen LLM—SKiC can only steer the model among pretrained latent skills (“attention subspaces”) but cannot induce fundamentally novel reasoning strategies. To overcome these expressiveness boundaries, trainable adapters or LoRA-style modifications can be injected into transformer layers, enabling fresh nonlinear transformations and genuine changes in attention patterns (Petrov et al., 2023).
A key practical limitation is the need for human oversight or strong auxiliary models to enumerate skills, select step templates, and triage compositional alignments. For small models, indiscriminate skill-based prompting can introduce cognitive overload, reducing accuracy on easy problems; adaptive SKiC strategies (AdaptMI, AdaptMI+) circumvent this by tailoring skill-based examples only when and where the model is demonstrably weak (He et al., 30 Apr 2025).
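A gating sketch in the spirit of AdaptMI; `probe` is a hypothetical weakness detector, not the paper's exact criterion:

```python
def adaptive_skill_prompt(question: str, skill_block: str, probe) -> str:
    """Include skill-based examples only when a cheap probe flags the model
    as weak on the skills this question needs (AdaptMI-style gating, sketched
    under assumptions); otherwise keep the prompt lean to avoid overload."""
    if probe(question):  # model judged weak on the required skills
        return f"{skill_block}\n\n{question}"
    return question      # easy case: plain prompt, no skill block
```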
8. Extensions and Future Directions
SKiC prompting pipelines are now extensible in several dimensions:
- DAG hierarchies: skills shared among multiple plugins
- Multi-hop retrieval for dynamic sub-query derivation
- Diversity penalties in skill selection for orthogonal candidate surfacing
- Curriculum learning in model training via adaptive SKiC example selection
- Cross-model transfer, automated skill mining, and meta-learning for template refinement
Ongoing research targets the generalization of SKiC to code synthesis, multi-modal reasoning, and greater integration with self-consistency and progressive-hint methodologies (Chen et al., 2023; Tang et al., 25 Jun 2025).
In summary, Skills-in-Context Prompting operationalizes prompt engineering as the dynamic assembly and orchestration of modular skills, guided by contextual relevance, telemetry feedback, and retrieval-grounded knowledge. This methodology enables both expressive compositional reasoning and robust domain adaptation, while its limitations inform the need for deeper model adaptations to unlock further generalization and reasoning power.