Base-Aligned Model Collaboration (BACo)

Updated 14 November 2025
  • BACo is an inference-time framework for text generation that dynamically routes token-level decoding between a diverse unaligned base model and its high-quality aligned counterpart to optimize the trade-off between output diversity and quality.
  • It employs both logit-based and content-based routing strategies based on next-token uncertainty and semantic roles, enabling on-the-fly switching at word boundaries without additional training.
  • Empirical evaluations across open-ended tasks demonstrate that BACo achieves superior diversity-quality performance and enhanced creativity compared to single-model baselines.

Base-Aligned Model Collaboration (BACo) is an inference-time framework for text generation that enables LLMs to optimize the trade-off between output diversity and quality. BACo dynamically combines two related LLMs, an unaligned "base" model and its aligned counterpart, at the token level, selecting which model to decode from on a per-token basis. This approach leverages the high diversity of unaligned base models and the high quality of aligned models within a single generation pass, without additional training or costly decoding procedures. BACo employs routing strategies based on next-token prediction uncertainty and semantic role, and achieves strong results across a range of open-ended generation tasks.

1. Motivation and Principled Design

LLMs trained with human feedback ("aligned" models) produce high-quality outputs but show a marked reduction in generation diversity, yielding repetitive, highly similar responses across samples. Unaligned base models produce more varied outputs but often fall short on task performance, coherence, and factuality. The BACo framework addresses this diversity–quality trade-off by maintaining two next-token distributions:

  • $P_{\mathrm{base}}(y_t \mid c_t)$ from the unaligned base model,
  • $P_{\mathrm{aligned}}(y_t \mid c_t)$ from the aligned variant,

where $c_t = [x,\, y_{<t}]$ is the decoding context at time step $t$. A lightweight router $\mathcal{R}$ makes a binary decision at each position, signaling which model's probabilities to use for sampling the next token. The combined mixture distribution at each decoding position is

$$P_{\mathrm{BACo}}(y_t \mid c_t) = w_{\mathrm{base}}(c_t)\, P_{\mathrm{base}}(y_t \mid c_t) + \bigl(1 - w_{\mathrm{base}}(c_t)\bigr)\, P_{\mathrm{aligned}}(y_t \mid c_t),$$

where $w_{\mathrm{base}}(c_t) \in \{0, 1\}$ is the router's binary gating weight. Empirically, base and aligned models agree on a majority of tokens (the "superficial-alignment phenomenon"), so BACo switches models sparingly, preserving decoding efficiency.
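Because the gating weight is binary, the mixture in practice reduces to selecting one model's distribution at each step and sampling from it. A minimal sketch of this hard gate (the function names and toy distributions are illustrative, not from the paper):

```python
import numpy as np

def baco_next_token_dist(p_base: np.ndarray, p_aligned: np.ndarray, w_base: int) -> np.ndarray:
    """P_BACo(. | c_t) = w_base * P_base + (1 - w_base) * P_aligned, with w_base in {0, 1}."""
    return w_base * p_base + (1 - w_base) * p_aligned

def sample_token(p: np.ndarray, rng: np.random.Generator) -> int:
    return int(rng.choice(len(p), p=p))

rng = np.random.default_rng(0)
p_base = np.array([0.25, 0.25, 0.25, 0.25])     # toy 4-token vocabulary: flat, diverse
p_aligned = np.array([0.05, 0.05, 0.05, 0.85])  # peaked, high-confidence
print(sample_token(baco_next_token_dist(p_base, p_aligned, w_base=1), rng))
```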

2. Routing Strategies

BACo explores two orthogonal families of routing strategies for the router $\mathcal{R}$:

a) Logit-Based Routing:

These methods use statistics of the next-token probability distribution to detect uncertainty or promote diversity.

  • $p^{\mathrm{base}}_{\max}(c_t) = \max_{y_t} P_{\mathrm{base}}(y_t \mid c_t)$ (maximum probability),
  • $H_{\mathrm{base}}(c_t) = -\sum_{y_t} P_{\mathrm{base}}(y_t \mid c_t) \log P_{\mathrm{base}}(y_t \mid c_t)$ (entropy).

Threshold-based routers include:

  • BACo-P: Route to the base model if $p^{\mathrm{base}}_{\max}(c_t) < \gamma$.
  • BACo-H: Route to the base model if $H_{\mathrm{base}}(c_t) > \gamma$.

b) Content-Based Routing:

These methods inspect the semantic or syntactic role of the aligned model's predicted top-1 token $\hat{y}_{\mathrm{aligned}} = \arg\max_{y} P_{\mathrm{aligned}}(y \mid c_t)$.

  • BACo-Punc: Use the aligned model if $\hat{y}_{\mathrm{aligned}}$ is punctuation/formatting; otherwise, use the base model.
  • BACo-FC: Use the aligned model if $\hat{y}_{\mathrm{aligned}}$ is a function word; otherwise, use the base model.

Strategies may be cascaded (e.g., BACo-P-Punc applies BACo-Punc first, then BACo-P on non-punctuation tokens) and are parameterized by a continuous threshold $\gamma$ that gives fine-grained control over the diversity–quality spectrum.
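As a sketch of how these routers might be implemented: the thresholds, the punctuation test, and the truncated function-word list below are illustrative assumptions, not the paper's exact token categories.

```python
import math
import string

# Illustrative subset; the paper's function-word inventory is not reproduced here.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "and", "in", "that", "is", "for"}

def route_baco_p(p_base: dict[str, float], gamma: float) -> int:
    """BACo-P: w_base = 1 (use the base model) when the base model's top probability is below gamma."""
    return int(max(p_base.values()) < gamma)

def route_baco_h(p_base: dict[str, float], gamma: float) -> int:
    """BACo-H: use the base model when the base model's next-token entropy exceeds gamma."""
    entropy = -sum(p * math.log(p) for p in p_base.values() if p > 0)
    return int(entropy > gamma)

def route_baco_punc(top1_aligned: str) -> int:
    """BACo-Punc: use the aligned model (w_base = 0) when its top-1 token is punctuation/formatting."""
    s = top1_aligned.strip()
    is_punct_or_format = s == "" or all(ch in string.punctuation for ch in s)
    return int(not is_punct_or_format)

def route_baco_fc(top1_aligned: str) -> int:
    """BACo-FC: use the aligned model when its top-1 token is a function word."""
    return int(top1_aligned.strip().lower() not in FUNCTION_WORDS)

def route_baco_p_punc(p_base: dict[str, float], top1_aligned: str, gamma: float) -> int:
    """Cascade: defer to the aligned model on punctuation/formatting, otherwise apply BACo-P."""
    return 0 if route_baco_punc(top1_aligned) == 0 else route_baco_p(p_base, gamma)
```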

3. Inference-Time Decoding Workflow

The BACo decoding loop operates as follows:

  1. Initialize $c_0 = [\text{prompt } x]$ and set $t = 1$.
  2. While not at end-of-sequence:
    • Query both the base and aligned models for next-token logits to obtain $P_{\mathrm{base}}(y \mid c_t)$ and $P_{\mathrm{aligned}}(y \mid c_t)$.
    • Compute the routing decision $w_{\mathrm{base}}(c_t)$ using the chosen strategy.
    • Form the mixture $P_{\mathrm{BACo}}(y_t \mid c_t)$ and sample $y_t$.
    • Append $y_t$ to the sequence and update $c_{t+1} = c_t \oplus y_t$.
    • Increment $t$.

Token switches are restricted to complete word boundaries to prevent sub-token incoherence. Additional optimizations, such as caching and speculative decoding, can be layered on to recover the overhead of querying two models.
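A sketch of this loop with two Hugging Face causal LMs that share a tokenizer follows. The model names, the word-boundary heuristic, the BACo-P routing rule, and the absence of KV caching are simplifying assumptions for illustration, not the paper's reference implementation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Meta-Llama-3-8B"              # unaligned base model (assumed pair)
ALIGNED = "meta-llama/Meta-Llama-3-8B-Instruct"  # its aligned counterpart
device = "cuda"  # assumes both 8B models fit on one GPU in bf16

tok = AutoTokenizer.from_pretrained(ALIGNED)     # base/aligned variants share a tokenizer
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16).to(device)
aligned = AutoModelForCausalLM.from_pretrained(ALIGNED, torch_dtype=torch.bfloat16).to(device)

@torch.no_grad()
def baco_generate(prompt: str, gamma: float = 0.5, max_new_tokens: int = 128) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids.to(device)
    w_base = 0  # start with the aligned model
    for _ in range(max_new_tokens):
        # No KV caching here for clarity; see the caching/speculative-decoding note above.
        p_base = torch.softmax(base(ids).logits[0, -1].float(), dim=-1)
        p_aligned = torch.softmax(aligned(ids).logits[0, -1].float(), dim=-1)
        # Re-route only at word boundaries, approximated as: the aligned model's
        # top-1 token starts a new word (leading whitespace after decoding).
        top1_aligned = tok.decode(int(p_aligned.argmax()))
        if top1_aligned.startswith((" ", "\n")):
            w_base = int(p_base.max().item() < gamma)  # BACo-P: route to base when base is uncertain
        next_id = torch.multinomial(p_base if w_base else p_aligned, num_samples=1)
        if next_id.item() == tok.eos_token_id:
            break
        ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)  # c_{t+1} = c_t ⊕ y_t
    return tok.decode(ids[0], skip_special_tokens=True)

print(baco_generate("Write a short story about a lighthouse keeper."))
```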

4. Evaluation Methodology

BACo is evaluated on three open-ended generation tasks:

  • Instruction Following (NoveltyBench)
  • In-the-Wild Dialogue (WildChat)
  • Long-Form Creative Writing (Narrative-Discourse)

For each prompt $x$, $k = 10$ outputs $\mathcal{Y}(x) = \{y^{(1)}, \dots, y^{(k)}\}$ are sampled. Group-level diversity $D(\mathcal{Y})$ and quality $Q(\mathcal{Y}) = \frac{1}{k} \sum_i Q(y^{(i)} \mid x)$ are measured using an array of metrics.

Diversity Metrics:

| Category | Metric | Description |
| --- | --- | --- |
| Lexical | Distinct-$n$ | Fraction of unique $n$-grams |
| Lexical | EAD-$n$ | Expectation-adjusted Distinct-$n$ |
| Lexical | Self-BLEU, Self-ROUGE-L | Mean pairwise surface-form similarity |
| Semantic | Embedding dissimilarity | Mean pairwise cosine distance |
| Semantic | Vendi Score | Entropy of eigenvalues of an $n \times n$ similarity kernel |
| Semantic | NLI Diversity | Mean RoBERTa NLI-based entailment/contradiction probabilities |
| Semantic | Semantic Entropy | Rao's quadratic entropy over clusters |
| Semantic | Cluster Distinctiveness | Number of functional-equivalence clusters |
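For concreteness, a minimal sketch of two of these metrics, Distinct-$n$ and embedding dissimilarity, computed over a group of sampled outputs; the sentence-transformers encoder named below is an assumed stand-in, not necessarily the embedder used in the paper.

```python
from itertools import combinations
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

def distinct_n(outputs: list[str], n: int = 2) -> float:
    """Distinct-n: fraction of n-grams that are unique across the group."""
    ngrams = []
    for y in outputs:
        toks = y.split()
        ngrams.extend(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return len(set(ngrams)) / max(len(ngrams), 1)

def embedding_dissimilarity(outputs: list[str], encoder: str = "all-MiniLM-L6-v2") -> float:
    """Semantic diversity: mean pairwise cosine distance between output embeddings."""
    emb = SentenceTransformer(encoder).encode(outputs)
    sims = cosine_similarity(emb)
    pairs = list(combinations(range(len(outputs)), 2))
    return float(sum(1.0 - sims[i, j] for i, j in pairs) / len(pairs))

group = ["The keeper lit the lamp.", "A storm rolled in at dusk.", "The keeper lit the lamp again."]
print(distinct_n(group, n=2), embedding_dissimilarity(group))
```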

Quality Metrics:

  • Perplexity under the aligned model,
  • Reward scores from Skywork-Reward-Gemma.
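A minimal sketch of response perplexity under the aligned model is shown below, reusing `tok` and `aligned` from the decoding sketch above; the prompt/response boundary handling is a simplification.

```python
import math
import torch

@torch.no_grad()
def response_perplexity(prompt: str, response: str) -> float:
    """Perplexity of the response tokens conditioned on the prompt, under the aligned model."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids.to(aligned.device)
    full_ids = tok(prompt + response, return_tensors="pt").input_ids.to(aligned.device)
    logits = aligned(full_ids).logits[0, :-1]        # position t predicts token t+1
    targets = full_ids[0, 1:]
    log_probs = torch.log_softmax(logits.float(), dim=-1)
    token_lp = log_probs[torch.arange(targets.numel()), targets]
    resp_lp = token_lp[prompt_ids.shape[1] - 1:]     # keep only response-token log-probs (approximate split)
    return math.exp(-resp_lp.mean().item())
```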

For comparative evaluation, the control parameters of each method (including baseline, prompting-based, decoding-based, ensemble, and nudging approaches) are swept to form trade-off curves $\{(D_\tau, Q_\tau)\}$. Two aggregated indicators are used:

  • Coverage (Cov.): Hypervolume under the diversity–quality curve (unit-normalized).
  • Dominance (Dom.): C-metric indicating the contribution to the global Pareto frontier.
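Assuming each trade-off curve lies in a unit-normalized (diversity, quality) square, the two indicators can be sketched as below; the exact normalization and C-metric computation in the paper may differ.

```python
# Points are (diversity, quality) pairs, both assumed min-max normalized to [0, 1].
def coverage(curve: list[tuple[float, float]]) -> float:
    """Hypervolume (area) dominated by the trade-off points, with reference point (0, 0)."""
    pts = sorted(curve)                          # ascending diversity
    hv, prev_d = 0.0, 0.0
    for i, (d, _) in enumerate(pts):
        height = max(q for _, q in pts[i:])      # best quality at or beyond this diversity level
        hv += (d - prev_d) * height
        prev_d = d
    return hv

def dominance(method_pts: list[tuple[float, float]], all_pts: list[tuple[float, float]]) -> float:
    """Share of the global Pareto frontier (over all methods) contributed by one method."""
    def dominated(p):
        return any(o != p and o[0] >= p[0] and o[1] >= p[1] for o in all_pts)
    frontier = [p for p in all_pts if not dominated(p)]
    return sum(p in method_pts for p in frontier) / max(len(frontier), 1)

baco = [(0.2, 0.9), (0.5, 0.8), (0.8, 0.6)]
aligned_only = [(0.1, 0.9), (0.3, 0.7)]
print(coverage(baco), dominance(baco, baco + aligned_only))
```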

5. Quantitative Results

Across all tasks and 22 evaluation spaces, BACo consistently surpasses single-model and other inference-time baselines. For the Llama-3-8B family:

| Method | Coverage (Cov.) | Dominance (Dom.) |
| --- | --- | --- |
| Base | 0.098 | 14.3% |
| Aligned | 0.186 | 39.0% |
| Nudging | 0.261 | 9.6% |
| Prompting/decoding/ensemble (best) | | <3% |
| BACo (best router) | 0.403 | 32.7% |

On semantic diversity spaces (e.g., semantic entropy vs. reward), BACo attains Cov. = 0.360 and Dom. = 40.5%. BACo-Rand, a random router, yields a 19.0% joint gain above single-model baselines; BACo-P-Punc, a hybrid heuristic router, delivers a 21.3% improvement in combined Coverage + Dominance.

In the Narrative-Discourse task, BACo achieves the highest Coverage in measures of turning-point diversity and arousal-curve diversity, indicating richer narrative structure and affective variability, without increasing perplexity.

6. Human Evaluation

A three-phase human study on NoveltyBench and WildChat further validates BACo's automatic-metric results:

  • Phase I (Quality): Annotators rate each response (1–5 scale).
  • Phase II (Diversity): Pairwise group comparisons for (a) overall, (b) format, and (c) content diversity.
  • Phase III (Creativity): Select the most creative output among six candidates.

When automatic quality was matched, BACo’s outputs received substantially higher average quality scores (e.g., 4.04 vs. 2.83 on NoveltyBench) and were preferred for overall diversity 79.0% to 21.0% over the aligned model. For creativity, BACo outputs were chosen 79.6% of the time on NoveltyBench and 61.8% on WildChat, compared to 20.4% and 38.2% for the aligned model, respectively. These findings indicate that token-level collaboration can substantially improve both diversity and creativity without detriment to perceived quality.

7. Significance, Control, and Limitations

BACo demonstrates that dynamically combining base and aligned models at inference time can transcend existing diversity–quality trade-offs while requiring neither additional model training nor extra decoding passes. The single user-controllable threshold $\gamma$ offers flexible, fine-grained adjustment of output characteristics. BACo maintains computational efficiency comparable to single-model decoding because model switches are infrequent, and it can be further accelerated through caching and speculative execution. A plausible implication is that the approach generalizes to other model pairs or tasks with analogous diversity–quality conflicts, though this would require empirical validation. Its reliance on agreement between base and aligned models may limit performance where the superficial-alignment phenomenon does not hold, suggesting directions for further research.
