Adaptive Verbalizers: Dynamic Class Mapping
- Adaptive verbalizers are dynamic mapping strategies that convert language model outputs into user-defined class labels using data-driven techniques.
- They employ methods such as prototype-based embeddings, soft label-embedding, evolutionary search, and manifold learning to enhance accuracy and scalability.
- By automatically aligning model representations with task requirements, adaptive verbalizers overcome manual design limitations in few-shot, zero-shot, and multilingual scenarios.
Adaptive verbalizers are mechanisms for dynamically learning or selecting the mapping between LLM outputs—typically masked token predictions or latent embeddings—and user-specified class labels, particularly in the context of prompt-based learning, few-shot learning, and task adaptation. Unlike static, hand-crafted verbalizers that rely on a fixed set of label words, adaptive verbalizers leverage data-driven, model-internal, or external knowledge sources to automatically construct, optimize, or refine the label-word or embedding-to-label mapping, resulting in improved accuracy and robustness, especially under data scarcity, domain shifts, or multilingual scenarios. Adaptive verbalizers include prototype-based mappings, candidate set expansion with model-inferred or knowledge-based tokens, meta-learned or instance-specific continuous embeddings, evolutionary search, and manifold-based re-embeddings. These techniques collectively address the limitations of manual verbalization—coverage, flexibility, bias, and scalability—by instantiating mappings that continuously and contextually align model semantics with target task requirements.
1. Concept and Motivation
Traditional prompt-based learning with pre-trained language models (PLMs) converts classification problems into cloze-style tasks, where the model predicts a token to fill a masked position, and a verbalizer maps this token (or set of tokens) to task labels. Manual verbalizers select discrete words or synonyms (e.g., "positive" → "great," "negative" → "bad") for each class, but this approach requires extensive domain knowledge and is brittle across domains, languages, or label sets. Key motivations driving adaptive verbalizer development include:
- Scaling to settings with numerous or previously unseen classes (e.g., large-scale entity typing, multilingual domains) where manual selection is impractical.
- Circumventing problems of limited supervision, where data scarcity undermines the informativeness of handcrafted label-word sets.
- Mitigating the bias and instability introduced by arbitrary or suboptimal label-word assignments.
- Providing robustness to label noise and semantic drift by dynamically aligning class prototypes with the latent geometry of LLM representations (Cui et al., 2022, Wang et al., 2023).
2. Methodological Approaches
Adaptive verbalizers are realized through several core strategies that move beyond fixed token lists:
2.1. Prototypical and Continuous Verbalizers
Prototypical verbalizers replace discrete label-word sets with class-specific prototype embeddings, learned by clustering model outputs through contrastive or metric learning objectives. At inference, prediction is performed by measuring embedding similarity (typically cosine) between a test example's masked-token representation and class prototypes:
$$
P(y = c \mid x) = \frac{\exp\!\big(\operatorname{sim}(\mathbf{h}_x, \mathbf{p}_c)\big)}{\sum_{c'} \exp\!\big(\operatorname{sim}(\mathbf{h}_x, \mathbf{p}_{c'})\big)},
$$
where $\operatorname{sim}(\cdot,\cdot)$ denotes cosine similarity, $\mathbf{h}_x$ is the instance embedding, and $\mathbf{p}_c$ is the prototype for class $c$ (Cui et al., 2022, Wei et al., 2022, Zhou et al., 2023).
Learning typically involves losses that cluster same-class instances and prototypes (instance–prototype and instance–instance contrastive terms). This formulation removes dependence on token-specific mappings and adapts naturally to new classes and domains (Cui et al., 2022).
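The following minimal PyTorch sketch illustrates the prototype-based scoring and the instance–prototype contrastive term described above. It assumes mask-position embeddings have already been extracted from the PLM; it is not the ProtoVerb training procedure itself (which adds further contrastive terms), and all function names are illustrative.

```python
import torch
import torch.nn.functional as F

def build_prototypes(support_emb, support_labels, num_classes):
    """Average the L2-normalized mask-position embeddings of each class."""
    protos = []
    for c in range(num_classes):
        class_emb = support_emb[support_labels == c]
        protos.append(F.normalize(class_emb, dim=-1).mean(dim=0))
    return F.normalize(torch.stack(protos), dim=-1)            # [C, H]

def classify(query_emb, prototypes, temperature=0.1):
    """Predict via cosine similarity between query embeddings and prototypes."""
    sims = F.normalize(query_emb, dim=-1) @ prototypes.T       # [B, C]
    return (sims / temperature).softmax(dim=-1)

def instance_prototype_loss(support_emb, support_labels, prototypes, temperature=0.1):
    """Contrastive term pulling each instance toward its own class prototype."""
    logits = F.normalize(support_emb, dim=-1) @ prototypes.T / temperature
    return F.cross_entropy(logits, support_labels)
```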
2.2. Soft and Label-Embedding Verbalizers
Soft verbalizers (termed "label-embedding verbalizers") represent each class by a trainable embedding or tensor, typically disconnected from the model's native token vocabulary. These label embeddings are optimized during few-shot or prompt-based fine-tuning. The model scores each class by computing a dot product between the learned label embedding and model outputs at the mask position, aggregated if multi-token (Mahabadi et al., 2022):
$$
s(c \mid x) = \mathbf{e}_c^{\top} \mathbf{h}_{[\mathrm{MASK}]},
$$
where $\mathbf{e}_c$ is the trainable embedding for class $c$ and $\mathbf{h}_{[\mathrm{MASK}]}$ is the model's hidden state at the mask position, with scores aggregated over positions when a class verbalization spans multiple tokens.
This design is more parameter-efficient, does not require autoregressive token generation, and outperforms both manual and previous automatic verbalizer schemes in sample efficiency (Mahabadi et al., 2022).
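A minimal sketch of such a label-embedding verbalizer, assuming single-token scoring against a precomputed [MASK] hidden state; multi-token aggregation and the exact training setup of Mahabadi et al. (2022) are omitted, and the class name is illustrative.

```python
import torch
import torch.nn as nn

class SoftVerbalizer(nn.Module):
    """One trainable embedding per class, scored against the [MASK] hidden state."""

    def __init__(self, num_classes, hidden_size):
        super().__init__()
        self.label_emb = nn.Parameter(torch.randn(num_classes, hidden_size) * 0.02)

    def forward(self, mask_hidden):             # mask_hidden: [batch, hidden]
        return mask_hidden @ self.label_emb.T   # class logits: [batch, num_classes]

# Usage: logits = SoftVerbalizer(4, 768)(h_mask); loss = F.cross_entropy(logits, y)
```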
2.3. Evolutionary and Search-Based Verbalizers
Evolutionary verbalizer search (EVS) and label-aware automatic verbalizers (LAAV) perform candidate discovery and optimization over large token sets using population-based search or data-driven scoring. EVS evolves pools of candidate tokens for each label using crossover, mutation, and selection, optimizing an explicit fitness function (validation accuracy) (Ling et al., 2023), while LAAV uses concatenated prompt templates ("label and [MASK]") to induce the model to surface optimal verbalizer words by aggregate likelihood. The selected verbalizer is then used in standard masked language model scoring at inference (Thaminkaew et al., 2023).
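A generic genetic-algorithm sketch of the EVS idea; the specific crossover, mutation, and selection operators in Ling et al. (2023) differ, and `candidates` (a list of plausible label words per class) and `eval_accuracy` (validation accuracy of a full verbalizer) are assumed helpers supplied by the caller.

```python
import random

def evolve_verbalizer(candidates, eval_accuracy, labels,
                      pop_size=20, tokens_per_label=3, generations=30, mutate_p=0.2):
    """Population-based search over label -> token-set assignments."""
    def random_individual():
        return {l: random.sample(candidates[l], tokens_per_label) for l in labels}

    def crossover(a, b):
        # For each label, inherit the token set from one parent (copied).
        return {l: (a if random.random() < 0.5 else b)[l][:] for l in labels}

    def mutate(ind):
        # Occasionally replace one token in a label's set with a fresh candidate.
        for l in labels:
            if random.random() < mutate_p:
                ind[l][random.randrange(tokens_per_label)] = random.choice(candidates[l])
        return ind

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=eval_accuracy, reverse=True)
        parents = scored[: pop_size // 2]                       # selection
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=eval_accuracy)
```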
2.4. Knowledge-Based and Scenario-Specific Verbalizers
Verbalizers can be expanded using external knowledge sources, such as Probase (Ma et al., 2024), or by harvesting task-specific candidate words directly from model predictions over test or unlabeled examples. AdaPrompt, for instance, leverages PLM completions filtered by entailment checks with an external NLI model to expand the set of label words beyond a seed set, resulting in more robust and contextually aligned class token sets (Chen et al., 2022). ISCV incorporates scenario-specific concept mining and cascade calibration for robust, abstract, and discriminative label-word selection (Ma et al., 2024).
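A simplified sketch of model-driven candidate expansion with an entailment-style filter in the spirit of AdaPrompt; the checkpoints, prompt phrasing, thresholds, and the use of a zero-shot NLI classifier (rather than the paper's exact entailment setup) are assumptions of this sketch, not the published configuration.

```python
from transformers import pipeline

# Illustrative checkpoints, not the ones used in the AdaPrompt paper.
fill_mask = pipeline("fill-mask", model="roberta-base")
nli = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def expand_label_words(template, seed_word, other_seeds, top_k=50, threshold=0.7):
    """Harvest PLM mask completions and keep those the NLI-based zero-shot
    classifier associates most strongly with the seed label word."""
    candidates = {p["token_str"].strip() for p in fill_mask(template, top_k=top_k)}
    kept = []
    for cand in candidates:
        result = nli(f"The text is about {cand}.",
                     candidate_labels=[seed_word] + list(other_seeds))
        if result["labels"][0] == seed_word and result["scores"][0] >= threshold:
            kept.append(cand)
    return kept

# e.g. expand_label_words("This article is about <mask>.", "sports",
#                         ["politics", "business"])
```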
2.5. Manifold and Metric Learning-Based Verbalizers
The manifold geometry of model output embeddings may not be adequately captured by Euclidean or cosine distance. Methods such as Locally Linear Embedding with Intra-class Neighborhood Constraint (LLE-INC) reconstruct a new embedding space that preserves local class-specific manifold structure, performing kNN classification in this re-embedded space. This approach is tuning-free and effective even for hyper-scale models (Wang et al., 2023).
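A tuning-free sketch using scikit-learn's standard Locally Linear Embedding followed by kNN classification in the re-embedded space; the intra-class neighborhood constraint that defines LLE-INC is not reproduced here, so this is a simplified stand-in rather than the method of Wang et al. (2023).

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.neighbors import KNeighborsClassifier

def lle_knn_verbalizer(support_emb, support_labels, query_emb,
                       n_neighbors=15, n_components=8):
    """Jointly re-embed support and query mask-position representations with
    standard LLE, then classify queries by kNN in the new space."""
    X = np.vstack([support_emb, query_emb])
    lle = LocallyLinearEmbedding(n_neighbors=n_neighbors, n_components=n_components)
    Z = lle.fit_transform(X)                                   # transductive re-embedding
    Z_support, Z_query = Z[: len(support_emb)], Z[len(support_emb):]
    knn = KNeighborsClassifier(n_neighbors=n_neighbors)
    knn.fit(Z_support, support_labels)
    return knn.predict(Z_query)
```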
3. Applications and Empirical Results
Adaptive verbalizers yield measurable improvements in a wide range of tasks and experimental regimes, notably:
- Few-shot text classification: Prototypical and meta-learned verbalizers consistently outperform manual and search-based verbalizers under 1–16 shot regimes on standard datasets (AG’s News, DBPedia, Yahoo, IMDB, etc.), especially when data is scarce or class counts are high (Cui et al., 2022, Jiang et al., 2023).
- Zero-shot and robust adaptation: Adaptive verbalizers demonstrate greater stability to changed templates, noise, and new class labels, and can support zero-shot predictions by leveraging knowledge elicitation or external conceptual graphs (Wei et al., 2022, Ma et al., 2024).
- Multilingual and cross-domain transfer: Approaches like LAAV and scenario-specific verbalizers adapt the verbalization process to different target languages and domains simply by recomputing candidate sets or prototypes from the respective model vocabulary and data (Thaminkaew et al., 2023, Chen et al., 2022).
- Dialog and speech systems: In spoken LLM settings, adaptive verbalization (as in Think-Verbalize-Speak and the ReVerT framework) decouples complex reasoning from speech-friendly output, mediating through an intermediate verbalization step for naturalness and latency reduction without sacrificing reasoning (Woo et al., 2025).
4. Key Properties, Design Trade-offs, and Theoretical Implications
Adaptive verbalizers present several distinct advantages over manual or static counterparts:
- Flexibility: They generalize to unseen classes, languages, or label distributions.
- Data efficiency and robustness: Prototype and contrastive learning objectives enable learning from limited annotated data and show reduced susceptibility to noise (Cui et al., 2022).
- Automated alignment: Model-driven candidate mining ensures verbalizers encode semantics captured by the target PLM, not just static human priors (Chen et al., 2022, Ma et al., 2024).
- Computational efficiency: Soft verbalizers and tuning-free techniques minimize parameter updates and memory footprint, supporting efficient adaptation even at scale (Mahabadi et al., 2022, Wang et al., 2023).
However, adaptive verbalizers also introduce challenges. Prototype learning can fail if class boundaries are poorly separated in the embedding space or if too few support instances are available to estimate reliable prototypes. Large candidate pools (knowledge-augmented or search-based) risk noisy class assignments without proper calibration or filtering (Ma et al., 2024). Furthermore, multi-label settings and the adaptation of verbalization to sequence-generation or compositional outputs remain active research topics (Thaminkaew et al., 2023).
5. Algorithmic Recipes and Implementation Patterns
Most adaptive verbalizer frameworks follow a two-stage pipeline (a minimal code sketch follows the list below):
- Discovery/Initialization: Candidates are discovered via model-internal probabilities, external knowledge graphs, or clustering of support set examples in the model's output space.
- Adaptation/Optimization: The mapping from candidate words or embeddings to class labels is optimized via contrastive, margin-based, or language-model-aligned objectives, often in conjunction with prompt learning and partial model fine-tuning.
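To make the recipe concrete, the following self-contained sketch runs both stages on pre-extracted [MASK]-position embeddings; the prototype-style initialization and cosine cross-entropy refinement are illustrative choices for each stage, not a specific published method.

```python
import torch
import torch.nn.functional as F

def two_stage_fit(support_emb, support_labels, num_classes, epochs=50, lr=1e-2):
    # Stage 1 -- discovery/initialization: per-class means of the [MASK]
    # embeddings serve as initial label embeddings (a clustering-style init).
    label_emb = torch.stack([support_emb[support_labels == c].mean(dim=0)
                             for c in range(num_classes)]).clone().requires_grad_(True)

    # Stage 2 -- adaptation/optimization: refine the mapping with a
    # temperature-scaled cosine-similarity cross-entropy objective.
    opt = torch.optim.Adam([label_emb], lr=lr)
    for _ in range(epochs):
        logits = F.normalize(support_emb, dim=-1) @ F.normalize(label_emb, dim=-1).T
        loss = F.cross_entropy(logits / 0.1, support_labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return label_emb.detach()

# Toy usage with random stand-ins for PLM mask-position embeddings:
# emb, y = torch.randn(64, 768), torch.randint(0, 4, (64,))
# verbalizer_emb = two_stage_fit(emb, y, num_classes=4)
```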
Representative patterns include:
| Method | Verbalizer Representation | Adaptation Mechanism | Reference |
|---|---|---|---|
| ProtoVerb | Continuous prototypes | Contrastive loss/clustering | (Cui et al., 2022) |
| RepVerb (MetaPrompter) | Instance mean embeddings | Meta-learning (MAML) | (Jiang et al., 2023) |
| AdaPrompt | Expanded candidate sets | PLM + NLI entailment | (Chen et al., 2022) |
| LAAV, EVS | Discrete token sets, evolved | Likelihood/candidate search | (Thaminkaew et al., 2023)/(Ling et al., 2023) |
| LLE-INC | Re-embedded kNN neighborhoods | Manifold learning, no tuning | (Wang et al., 2023) |
| ISCV | Scenario-mined concept words | Cascade calibration | (Ma et al., 2024) |
These methods are compatible with both fixed (frozen) and fine-tuned PLMs, handle both single- and multi-token verbalizations, and often synergize with advances in prompt engineering, meta-learning, and continual adaptation.
6. Research Frontiers and Directions
Open research areas in adaptive verbalizer design include:
- Joint prompt and verbalizer optimization: Co-adapting templates and verbalizers, possibly in continuous or structured spaces, to maximally align with downstream task semantics.
- Extension to sequence or multi-label outputs: Generalizing verbalization beyond categorical classification to sequence, multi-label, or structured output regimes.
- Efficient adaptation for extremely large models: Leveraging manifold learning, tuning-free adaptation, and retrieval-based methods to avoid costly parameter updates in billions-scale LLMs (Wang et al., 2023).
- Meta-learning and task transferability: Integrating task-conditioned prompt pools and plug-and-play verbalizers with efficient task transfer and generalization (Jiang et al., 2023).
- Hybrid discrete–continuous and knowledge-enriched designs: Combining the strengths of knowledge graphs, contextual calibration, and prototypical embedding spaces for improved coverage and semantic fidelity (Ma et al., 2024, Chen et al., 2022).
The field continues to expand rapidly, both in methodology (incorporating meta-learning, self-supervision, and manifold methods) and in application, including dialog systems, low-resource language adaptation, and semi-supervised and zero-shot learning (Woo et al., 2025).
Adaptive verbalizers represent a shift from static, human-designed class mappings to dynamic, data- and model-driven mapping strategies that fully exploit the latent capabilities of contemporary LLMs. By bridging the gap between internal representations and task labels via learned or optimized structures, they increase flexibility, accuracy, and robustness across a variety of resource-constrained and complex NLP settings.