Few-Shot/Manual CoT Techniques

Updated 18 October 2025
  • Few-Shot/Manual CoT is a technique that employs a limited number of exemplars with explicit reasoning traces to guide model adaptation and problem-solving.
  • It involves careful prompt engineering and step-by-step demonstration to improve performance in tasks spanning text, vision, and multimodal applications.
  • Recent advances optimize exemplar selection, spatial attention, and negative-sample integration to enhance computational efficiency and adaptability in diverse domains.

Few-Shot/Manual-CoT (Chain-of-Thought) encompasses a family of techniques in which machine learning systems are guided to solve new tasks from limited exemplars, often through explicitly structured reasoning traces provided either as carefully engineered demonstrations (manual CoT) or principled few-shot prompting. These approaches operate across diverse modalities, including text, vision, and multimodal domains, and underlie the rapid adaptation and transfer of complex capabilities in large models. Empirical and methodological advances from recent research refine how few-shot/CoT paradigms are constructed, optimized, and interpreted, revealing intricate dependencies on model architecture, pretraining, retrieval strategy, and prompt design.

1. Core Principles and Definitions

Few-shot learning leverages a limited set of labeled exemplars (often only one or a handful per class or task) to guide models in generalizing to unseen categories or tasks. Manual CoT refers to the explicit inclusion of step-by-step intermediate reasoning—handcrafted or carefully prompted—within few-shot demonstrations. These traces, whether in language or other modalities, serve to instill or elicit multi-step reasoning even when the learner has not been meta-trained on similar tasks.

Formally, in LLMs, few-shot/CoT prompting may be represented by providing the model with a prompt of the form

$\begin{array}{l} \text{[Exemplar}_1\text{: (Input, Reasoning, Output)]} \\ \text{[Exemplar}_2\text{: (Input, Reasoning, Output)]} \\ \quad\vdots \\ \text{[Query: Input]} \end{array}$

where “Reasoning” constitutes the manual CoT. In vision and multimodal tasks, the analogues involve multi-stage module activation (Wei et al., 6 Oct 2024), attention mechanisms mimicking stepwise human processing (Sokhandan et al., 2020), or spatial/temporal rationales (Ji et al., 18 Apr 2025).
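
For concreteness, the snippet below assembles a prompt of this form; the exemplars and query are invented for illustration and would, in practice, be drawn from the target task.

```python
# A minimal sketch of assembling a few-shot manual-CoT prompt in the form above.
# The exemplars and query are invented for illustration.

exemplars = [
    {
        "input": "Q: A shelf holds 3 boxes with 4 books each. How many books are there?",
        "reasoning": "Each box holds 4 books and there are 3 boxes, so 3 * 4 = 12.",
        "output": "A: 12",
    },
    {
        "input": "Q: Tom had 15 apples and gave away 6. How many remain?",
        "reasoning": "Starting from 15 apples, removing 6 leaves 15 - 6 = 9.",
        "output": "A: 9",
    },
]

query = "Q: A train travels 60 km per hour for 2.5 hours. How far does it go?"

def build_cot_prompt(exemplars, query):
    """Concatenate (Input, Reasoning, Output) demonstrations followed by the query."""
    blocks = [f"{ex['input']}\n{ex['reasoning']}\n{ex['output']}" for ex in exemplars]
    blocks.append(query)  # the model continues with its own reasoning trace and answer
    return "\n\n".join(blocks)

print(build_cot_prompt(exemplars, query))
```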

2. Adaptation Strategies without Meta-Learning

Classic few-shot learning relied heavily on meta-learning, where a model is episodically trained to generalize across sampled tasks. However, recent frameworks demonstrate that adaptation can be achieved via direct fine-tuning or in-context prompt engineering, provided a sufficiently robust pre-trained feature space.

One key approach adapts a feature extractor pre-trained on a large, potentially out-of-domain dataset. The adaptation unfolds in two stages (a code sketch follows the list below):

  • Base adaptation: Fine-tune only the final layers of the pre-trained network on the base classes (with few samples per class), optimizing a cosine classifier:

$f_{\theta, W}(x) = \tau\,\big[\,s(\phi_\theta(x), w_j)\,\big]_{j=1..c}$

where $s(\cdot,\cdot)$ is cosine similarity and $\tau$ a scaling parameter (Lifchitz et al., 2020).

  • Novel-class adaptation: Obtain prototypes via averaging support set embeddings and assign queries based on similarity to each prototype.
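
A minimal PyTorch sketch of the two adaptation stages, written directly from the formulas above, might look as follows; tensor shapes, helper names, and the normalization choices are assumptions rather than the reference implementation of Lifchitz et al. (2020).

```python
import torch
import torch.nn.functional as F

def cosine_classifier(features, class_weights, tau=10.0):
    """Base adaptation head: f_{theta,W}(x) = tau * [cos(phi_theta(x), w_j)]_{j=1..c}."""
    features = F.normalize(features, dim=-1)             # phi_theta(x), L2-normalised, shape (n, d)
    class_weights = F.normalize(class_weights, dim=-1)   # one weight vector w_j per base class, (c, d)
    return tau * features @ class_weights.t()            # scaled cosine similarities, shape (n, c)

def novel_class_prototypes(support_embeddings, support_labels, num_classes):
    """Novel-class adaptation: prototype per class = mean of its support-set embeddings."""
    protos = torch.stack([
        support_embeddings[support_labels == c].mean(dim=0)
        for c in range(num_classes)
    ])
    return F.normalize(protos, dim=-1)

def classify_queries(query_embeddings, prototypes):
    """Assign each query to the most similar prototype (cosine similarity)."""
    sims = F.normalize(query_embeddings, dim=-1) @ prototypes.t()
    return sims.argmax(dim=-1)
```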

Spatial attention is introduced to compensate for the difficulty of learning where to focus when only a few labeled examples are available. The entropy of the pre-trained classifier's per-location class predictions modulates the pooling weights, calculated as

$w^{(q)}(x) = 1 - \frac{H\big(f_U^{(q)}(x)\big)}{\log c}$

thus emphasizing high-confidence regions. This two-stage, spatially weighted fine-tuning enables data-efficient adaptation to novel classes without meta-training (Lifchitz et al., 2020).
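
A hedged sketch of these entropy-derived weights is shown below; the tensor layout and the normalization used when pooling are illustrative assumptions.

```python
import torch

def spatial_attention_weights(class_probs, eps=1e-8):
    """w^{(q)}(x) = 1 - H(f_U^{(q)}(x)) / log c for each spatial location.

    class_probs: (H*W, c) per-location class probabilities from the pre-trained classifier.
    Confident (low-entropy) locations receive weights close to 1.
    """
    c = class_probs.shape[-1]
    entropy = -(class_probs * (class_probs + eps).log()).sum(dim=-1)  # H(f_U^{(q)}(x))
    return 1.0 - entropy / torch.log(torch.tensor(float(c)))          # values in [0, 1]

def weighted_pool(feature_map, class_probs):
    """Pool per-location features (H*W, D) with the entropy-derived weights (normalised here)."""
    w = spatial_attention_weights(class_probs)
    w = w / (w.sum() + 1e-8)
    return (w.unsqueeze(-1) * feature_map).sum(dim=0)  # pooled embedding of shape (D,)
```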

3. Chain-of-Thought Design and Reasoning Trace Construction

Manual CoT structures reasoning in demonstrations, often following formats such as “Let’s think step by step.” Plan-and-Solve (PS/PS+) prompting, for instance, decomposes each input into plan creation and systematic execution, reducing calculation and missing-step errors (Wang et al., 2023). PS+ adds metacognitive cues to directly address common LLM failure modes in math reasoning.
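
As an illustration, the snippet below builds PS/PS+-style prompts; the trigger phrases are paraphrases in the spirit of Wang et al. (2023), not verbatim quotes from the paper.

```python
# Paraphrased Plan-and-Solve trigger: ask the model to plan before executing.
PS_TRIGGER = (
    "Let's first understand the problem and devise a plan to solve it. "
    "Then, let's carry out the plan and solve the problem step by step."
)

# PS+ adds metacognitive cues targeting calculation and missing-step errors.
PS_PLUS_TRIGGER = (
    "Let's first understand the problem, extract relevant variables and their "
    "corresponding numerals, and devise a complete plan. Then, let's carry out the plan, "
    "calculate intermediate results (paying attention to correct calculation), and solve "
    "the problem step by step."
)

def plan_and_solve_prompt(question, plus=False):
    """Wrap a question with a PS or PS+ trigger phrase."""
    trigger = PS_PLUS_TRIGGER if plus else PS_TRIGGER
    return f"Q: {question}\nA: {trigger}"
```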

Chains of thought may be programmatic, as in dynamic program prompting for math word problems, where executable Python programs replace prose rationales, making solution verification concrete and stepwise (Jie et al., 2023). These trace formats can be automatically generated (AutoReason), bypassing the need for handcrafted exemplars by prompting a strong model to decompose implicit queries, and then using these generated traces as demonstrations for less capable models (Sevinc et al., 9 Dec 2024).
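
The snippet below shows what such a programmatic rationale might look like for an invented word problem; it is a stylistic illustration rather than output of the cited system.

```python
# Problem (invented): "A bakery bakes 7 trays of 24 muffins each; 13 muffins go unsold.
# How many muffins were sold?"

def solution():
    muffins_per_tray = 24
    trays = 7
    unsold = 13
    baked = muffins_per_tray * trays  # intermediate step made explicit and checkable
    sold = baked - unsold
    return sold

print(solution())  # 155 -- executing the program verifies the rationale stepwise
```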

For multimodal and vision tasks, sequential attention and modular decomposition provide analogues to manual CoT: object counting models process images stepwise through recurrent attention, extracting location and class for one entity at a time—a direct parallel to sequential thinking in CoT (Sokhandan et al., 2020).

4. Advances in Prompt Selection, Verbalization, and Few-Shot Optimization

Effective prompt construction is central to performance in few-shot/CoT paradigms. “Prompt Space” introduces a mathematical approach: representing all training questions as vectors, decomposing the embedding matrix (via SVD/PCA), and selecting the k most representative “basis” questions as demonstration exemplars (Shi et al., 2023). This systematic basis selection yields more robust improvements than random or manual exemplar selection.
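
A possible sketch of this basis-selection idea follows; the embedding function is left abstract and the exact selection rule of the original Prompt Space procedure may differ from this simplified version.

```python
import numpy as np

def select_basis_questions(embeddings, questions, k):
    """Pick k representative 'basis' questions, one per leading singular direction.

    embeddings: (n, d) array with one row per candidate question.
    """
    X = embeddings - embeddings.mean(axis=0, keepdims=True)  # centre the embedding matrix
    _, _, vt = np.linalg.svd(X, full_matrices=False)         # rows of vt span the question space
    chosen = []
    for direction in vt[:k]:
        scores = X @ direction                  # projection of every question onto this direction
        idx = int(np.argmax(np.abs(scores)))    # the question most aligned with the direction
        if idx not in chosen:
            chosen.append(idx)
    return [questions[i] for i in chosen]
```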

For classification tasks—especially with mask-based LMs—verbalizers map output tokens to class labels. Innovations include label-aware automatic verbalization, where templates (e.g., “[class] and [MASK]”) induce the LM to learn semantically richer, more discriminative tokens than simply using class names (Thaminkaew et al., 2023). MaVEN further enriches manual verbalizers by expanding each seed label with its nearest neighbors in embedding space, aggregating logit scores over this enriched set for more robust few-shot performance, especially in low-resource settings (Nguyen et al., 8 Oct 2024).
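
The sketch below illustrates nearest-neighbour verbalizer enrichment in the spirit of MaVEN; the aggregation rule (mean of [MASK] logits) and all function names are illustrative assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def expand_verbalizer(seed_token_ids, input_embeddings, n_neighbors=5):
    """For each class, return the seed label token plus its nearest neighbours in embedding space."""
    emb = F.normalize(input_embeddings, dim=-1)        # (vocab_size, d) LM input embeddings
    expanded = []
    for tok in seed_token_ids:
        sims = emb @ emb[tok]                          # cosine similarity of every token to the seed
        neighbors = sims.topk(n_neighbors + 1).indices # includes the seed token itself
        expanded.append(neighbors.tolist())
    return expanded                                    # one enriched token set per class

def class_scores(mask_logits, expanded_verbalizer):
    """Aggregate [MASK]-position logits over each class's enriched verbalizer set."""
    return torch.stack([mask_logits[token_ids].mean() for token_ids in expanded_verbalizer])
```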

Some approaches specifically integrate negative (incorrect) examples as anchors for retrieving or selecting better positive exemplars, enhancing the diversity and informativeness of few-shot prompts and addressing errors more explicitly (Liang et al., 31 Jul 2025).

5. Model and Task Dependency of Few-Shot/CoT Benefits

The effectiveness of few-shot/Manual-CoT techniques is deeply contingent on model architecture, pre-training, and the nature of the target task:

  • For older or weaker models, few-shot CoT and explicit demonstration of intermediate reasoning significantly improve performance, especially on math and symbolic reasoning (Wang et al., 2023, Wertheimer et al., 2022).
  • For modern, strong LLMs (e.g., Qwen2.5 series, GPT-4o-mini), adding CoT exemplars offers minimal performance increase over zero-shot CoT. These models often leverage the instruction to perform reasoning internally and primarily benefit from exemplars as output format guides rather than as cognitive guides (Cheng et al., 17 Jun 2025, Takayama et al., 9 Mar 2025).
  • In certain languages or domains, explicit CoT prompting can reduce performance, especially in advanced LLMs that may find added reasoning steps redundant or even detrimental (notably in GPT-4o-mini on Japanese and English tasks) (Takayama et al., 9 Mar 2025).
  • Self-consistency (generating and marginalizing over multiple reasoning paths) can regularize outputs and improve accuracy in entity-centric sentiment analysis (a minimal voting sketch follows this list), but the benefit of CoT rationales is inconsistent and sometimes task- and model-specific (Kuila et al., 5 Apr 2024).
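
A minimal self-consistency voting sketch, assuming a generic `generate` sampling function and a task-specific `extract_answer` parser (both placeholders):

```python
from collections import Counter

def self_consistent_answer(prompt, generate, extract_answer, n_samples=10):
    """Sample several reasoning paths and return the majority-vote answer."""
    answers = []
    for _ in range(n_samples):
        completion = generate(prompt, temperature=0.7)  # stochastic decoding yields diverse paths
        answers.append(extract_answer(completion))
    answer, _count = Counter(answers).most_common(1)[0]
    return answer
```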

The table below summarizes positive and negative effects of few-shot/CoT by model strength:

| Model Type | Few-Shot/CoT Benefit | Note/Observed Trend |
|---|---|---|
| Weaker LM | Strong positive | Improves math/symbolic reasoning; step-by-step demonstrations are essential |
| Strong LM (Qwen2.5, GPT-4o-mini) | Marginal to neutral/negative | CoT prompts serve mainly as format guides, not reasoning guides |
| Language-specific settings | Task- and domain-dependent | Elementary math still benefits; other tasks are variable |

6. Extensions to Multimodal, Temporal, and Low-resource Scenarios

Few-shot/CoT paradigms have been extended and tailored to non-textual domains:

  • Multimodal LLMs: In medical VQA, a modular collaborative CoT (MC-CoT) decomposes a question into specialized LLM-guided tasks (e.g., radiology, anatomy, pathology), each generating reasoning and extracting features from an MLLM (Wei et al., 6 Oct 2024).
  • Temporal action localization: Integrating hierarchical video analysis, semantic-aware text-video alignment, and CoT-like causal reasoning enables models to localize actions and anomalies with minimal labeled data. The CoT-like text captures temporal/causal dependencies not easily modeled by visual features alone, significantly improving localization in standard and anomaly detection settings (Ji et al., 18 Apr 2025).
  • Low-resource text classification: MaVEN and LAAV push the limits of few-shot performance by algorithmically expanding or refining manual seed sets, demonstrating cross-lingual and extreme-few-shot scalability (Nguyen et al., 8 Oct 2024, Thaminkaew et al., 2023).

7. Limitations, Controversies, and Future Directions

Persistent limitations of few-shot/Manual-CoT techniques include:

  • Diminishing returns in advanced LLMs, where explicit reasoning steps may be redundant or counterproductive. Several studies advocate reexamining the ICL+CoT paradigm, exploring dynamic or self-generated exemplars instead of static, handcrafted ones (Cheng et al., 17 Jun 2025).
  • Task- and language-specific variability, requiring customized prompt design and rigorous benchmark validation (Takayama et al., 9 Mar 2025, Kuila et al., 5 Apr 2024).
  • Computational considerations: While parameter-efficient fine-tuning and prompt engineering strategies exist (e.g., LoRA, prompt space decomposition), scaling to large, diverse prompt corpora and efficient retrieval remains an open challenge (Shi et al., 2023, Jie et al., 2023).

Future research is called to reexamine these trade-offs: the balance between explicit demonstration, automatic trace generation, data-driven prompt engineering, and model scaling will likely define the next advances in few-shot and manual chain-of-thought reasoning systems.
