Base-Aligned Model Collaboration (BACo)
- BACo is an inference-time framework for text generation that dynamically routes token decoding between diverse unaligned base models and high-quality aligned models to optimize the diversity–quality trade-off.
- It employs both logit-based and content-based routing strategies based on next-token uncertainty and semantic roles, enabling on-the-fly switching at word boundaries without additional training.
- Empirical evaluations across open-ended tasks demonstrate that BACo achieves superior diversity-quality performance and enhanced creativity compared to single-model baselines.
Base-Aligned Model Collaboration (BACo) is an inference-time framework for text generation that enables LLMs to optimize the trade-off between output diversity and quality. BACo dynamically combines two related LLMs—an unaligned "base" model and its aligned counterpart—at the token level, selecting which model to decode from on a per-token basis. This approach leverages the high diversity of unaligned base models and the high quality of aligned models within a single generation pass, without additional training or costly extra decoding passes. BACo employs routing strategies based on next-token prediction uncertainty and semantic role, and establishes strong results across a range of open-ended generation tasks.
1. Motivation and Principled Design
LLMs trained with human feedback ("aligned" models) exhibit high-quality outputs but a marked reduction in generation diversity, producing repetitive and similar responses across generations. Unaligned base models yield more varied outputs but often lack desired task performance, coherence, and factuality. The BACo framework addresses this diversity–quality trade-off by maintaining two next-token distribution functions:
- $p_{\text{base}}(\cdot \mid x_{<t})$ from the unaligned base model,
- $p_{\text{align}}(\cdot \mid x_{<t})$ from the aligned variant,

where $x_{<t}$ is the decoding context at time step $t$. A lightweight router makes a binary decision $r_t \in \{0, 1\}$ at each position, signaling which model's probabilities to use for sampling the next token. The combined mixture distribution at each decoding position is

$$p_t(\cdot \mid x_{<t}) = r_t \, p_{\text{base}}(\cdot \mid x_{<t}) + (1 - r_t) \, p_{\text{align}}(\cdot \mid x_{<t}),$$

where $r_t$ is the router's binary gating weight. Empirically, base and aligned models agree on a majority of tokens (the "superficial-alignment phenomenon"), so BACo switches models sparingly, preserving decoding efficiency.
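Assuming vocabulary-sized probability vectors for both models, a single step of this gated mixture can be sketched in Python (an illustration of the equation above, not the authors' implementation):

```python
import numpy as np

def baco_step(p_base, p_align, route_to_base, rng):
    """One BACo decoding step: apply the router's binary gate r_t and
    sample the next token id from the selected distribution.

    p_base, p_align: next-token probability vectors over the vocabulary.
    route_to_base: the router's decision r_t (True -> decode from base).
    """
    # With r_t in {0, 1}, the mixture r_t * p_base + (1 - r_t) * p_align
    # degenerates to simply picking one of the two distributions.
    p_mix = p_base if route_to_base else p_align
    return int(rng.choice(len(p_mix), p=p_mix))

# Toy example with a 4-token vocabulary.
rng = np.random.default_rng(0)
p_base = np.array([0.25, 0.25, 0.25, 0.25])   # flatter: more diverse
p_align = np.array([0.85, 0.05, 0.05, 0.05])  # peaked: higher quality
token = baco_step(p_base, p_align, route_to_base=False, rng=rng)
```

Because the gate is binary, only one model's distribution is ever sampled from at a given position, which is what keeps per-token overhead low.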
2. Routing Strategies
BACo explores two orthogonal families of routing strategies for the router $r_t$:
a) Logit-Based Routing:
These methods use statistics of the next-token probability distribution to detect uncertainty or promote diversity.
- $P_{\max,t} = \max_{v} p(v \mid x_{<t})$ (maximum probability),
- $H_t = -\sum_{v} p(v \mid x_{<t}) \log p(v \mid x_{<t})$ (entropy).
Threshold-based routers include:
- BACo-P: Route to the base model if $P_{\max,t}$ falls below a threshold $\tau_P$ (high uncertainty).
- BACo-H: Route to the base model if $H_t$ exceeds a threshold $\tau_H$ (high entropy).
b) Content-Based Routing:
These methods inspect the semantic or syntactic role of the predicted top-1 token $\hat{v}_t$.
- BACo-Punc: Use the aligned model if $\hat{v}_t$ is punctuation/formatting; otherwise, use the base model.
- BACo-FC: Use the aligned model if $\hat{v}_t$ is a function word; otherwise, use the base model.
Strategies may be cascaded (e.g., BACo-P-Punc applies BACo-Punc first, then BACo-P on non-punctuation) and are parameterized by a continuous threshold $\tau$ for fine control over the diversity–quality spectrum.
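The routers above can be sketched as follows. The threshold values, the threshold directions, the use of a single next-token distribution for the statistics, and the punctuation test are all assumptions of this sketch, not details taken from the paper:

```python
import math
import string

def route_p(p_align, tau=0.7):
    """BACo-P-style logit router (sketch): send low-confidence
    positions to the base model. True -> decode from the base."""
    return max(p_align) < tau

def route_h(p_align, tau=2.0):
    """BACo-H-style entropy router: high next-token entropy -> base."""
    h = -sum(p * math.log(p) for p in p_align if p > 0)
    return h > tau

def route_punc(top1_token):
    """BACo-Punc-style content router: punctuation/formatting tokens
    go to the aligned model, everything else to the base model."""
    stripped = top1_token.strip()
    is_punc = stripped != "" and all(c in string.punctuation for c in stripped)
    return not is_punc  # True -> base model

def route_p_punc(top1_token, p_align, tau=0.7):
    """Cascaded router: punctuation rule first, then BACo-P on the rest."""
    if not route_punc(top1_token):
        return False  # aligned model handles punctuation
    return route_p(p_align, tau)
```

Raising or lowering `tau` in `route_p` shifts the generation toward the base model (more diversity) or the aligned model (more quality), which is the single control knob the framework exposes.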
3. Inference-Time Decoding Workflow
The BACo decoding loop operates as follows:
- Initialize the context $x_{<1}$ from the prompt and set $t = 1$.
- While not at end-of-sequence:
  - Query both base and aligned models for next-token logits to obtain $p_{\text{base}}(\cdot \mid x_{<t})$ and $p_{\text{align}}(\cdot \mid x_{<t})$.
  - Compute the routing decision $r_t$ using the chosen strategy.
  - Form the mixture $p_t$ and sample $x_t \sim p_t$.
  - Append $x_t$ to the sequence and update the context to $x_{<t+1}$.
  - Increment $t$.
Token switches are restricted to complete word boundaries to prevent sub-token incoherence. Additional optimizations, such as KV caching and speculative decoding, can be layered on to recover the overhead of querying two models.
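The loop above, with switching deferred to word boundaries, might be sketched over a toy word-level tokenization as follows (the boundary test and the mock-model interface are assumptions of this sketch):

```python
import numpy as np

def decode_baco(base_lm, aligned_lm, router, vocab, eos="<eos>",
                max_len=16, seed=0):
    """Illustrative BACo decoding loop (a sketch, not the paper's code).

    base_lm / aligned_lm: callables mapping the current token sequence
    to a probability vector over `vocab`.
    router: callable (p_base, p_align) -> True means decode from base.
    Routing decisions are only honored at word boundaries; in this toy
    tokenization a new word is a token starting with a space.
    """
    rng = np.random.default_rng(seed)
    seq, use_base = [], False
    while len(seq) < max_len:
        p_base, p_align = base_lm(seq), aligned_lm(seq)
        want_base = router(p_base, p_align)
        # Defer any model switch until the next word boundary.
        top = vocab[int(np.argmax(p_align))]
        if not seq or top.startswith(" "):
            use_base = want_base
        p = p_base if use_base else p_align
        tok = vocab[int(rng.choice(len(vocab), p=p))]
        if tok == eos:
            break
        seq.append(tok)
    return "".join(seq)
```

Restricting switches this way means a word begun by one model is finished by it, which avoids mixing sub-token continuations from two different distributions.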
4. Evaluation Methodology
BACo is evaluated on three open-ended generation tasks:
- Instruction Following (NoveltyBench)
- In-the-Wild Dialogue (WildChat)
- Long-Form Creative Writing (Narrative-Discourse)
For each prompt, $k$ outputs are sampled. Group-level diversity and quality are measured using an array of metrics.
Diversity Metrics:
| Category | Metric | Description |
|---|---|---|
| Lexical | Distinct-$n$ | Fraction of unique $n$-grams |
| Lexical | EAD-$n$ | Expectation-adjusted Distinct-$n$ |
| Lexical | Self-BLEU, Self-ROUGE-L | Mean pairwise surface-form similarity |
| Semantic | Embedding dissimilarity | Mean pairwise cosine distance |
| Semantic | Vendi Score | Entropy of the eigenvalues of a $k \times k$ similarity kernel |
| Semantic | NLI Diversity | Mean RoBERTa NLI-based entailment/contradiction probabilities |
| Semantic | Semantic Entropy | Rao's quadratic entropy over clusters |
| Semantic | Cluster Distinctiveness | Number of functional-equivalence clusters |
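As an illustration of the lexical family, a minimal Distinct-$n$ implementation (assuming whitespace tokenization, which real evaluations would replace with a proper tokenizer) might look like:

```python
def distinct_n(texts, n=2):
    """Distinct-n over a group of sampled outputs: the fraction of
    n-grams, pooled across all samples, that are unique."""
    ngrams = []
    for t in texts:
        toks = t.split()
        ngrams += [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0
```

A score of 1.0 means every $n$-gram in the sample group occurs exactly once; repetitive groups score lower.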
Quality Metrics:
- Perplexity under the aligned model,
- Reward scores from Skywork-Reward-Gemma.
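The perplexity criterion reduces to a one-liner over per-token log-probabilities; a minimal sketch (assuming natural-log probabilities from the aligned scoring model) is:

```python
import math

def perplexity(token_logprobs):
    """Perplexity of a sequence given its per-token log-probabilities
    (natural log) under the scoring model: exp of the mean NLL."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))
```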
For comparative evaluation, the control parameters of each method (including baseline, prompting-based, decoding-based, ensemble, and nudging approaches) are swept to form trade-off curves in the diversity–quality plane. Two aggregated indicators are used:
- Coverage (Cov.): Hypervolume under the diversity–quality curve (unit-normalized).
- Dominance (Dom.): C-metric indicating the contribution to the global Pareto frontier.
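These two indicators can be sketched for 2-D trade-off points as follows; the exact normalization and C-metric definition used in the paper may differ from this simplified version, which assumes points already scaled to the unit square with both axes maximized:

```python
def coverage(points):
    """Hypervolume (area) dominated by a trade-off curve relative to
    the origin. `points` are (diversity, quality) pairs in [0, 1]^2."""
    area, q_covered = 0.0, 0.0
    for d, q in sorted(points, reverse=True):  # decreasing diversity
        if q > q_covered:
            # Each new higher-quality point adds a strip of width d.
            area += d * (q - q_covered)
            q_covered = q
    return area

def dominance(method_points, all_points):
    """Share of the global Pareto frontier contributed by one method:
    the fraction of frontier points that belong to that method."""
    def dominated(p, pool):
        return any(a >= p[0] and b >= p[1] and (a, b) != p for a, b in pool)
    frontier = [p for p in all_points if not dominated(p, all_points)]
    return sum(p in method_points for p in frontier) / max(len(frontier), 1)
```

Coverage rewards a curve that pushes out in both directions at once, while Dominance credits a method only for points no competitor strictly beats.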
5. Quantitative Results
Across all tasks and 22 evaluation spaces, BACo consistently surpasses single-model and other inference-time baselines. For the Llama-3-8B family:
| Method | Coverage (Cov.) | Dominance (Dom.) |
|---|---|---|
| Base | 0.098 | 14.3% |
| Aligned | 0.186 | 39.0% |
| Nudging | 0.261 | 9.6% |
| Prompt/Dec/Ens. best | — | <3% |
| BACo (best router) | 0.403 | 32.7% |
On semantic diversity spaces (e.g., semantic entropy vs. reward), BACo attains Cov. = 0.360 and Dom. = 40.5%. BACo-Rand, a random router, yields a 19.0% joint gain above single-model baselines; BACo-P-Punc, a hybrid heuristic router, delivers a 21.3% improvement in combined Coverage + Dominance.
In the Narrative-Discourse task, BACo achieves the highest Coverage in measures of turning-point diversity and arousal-curve diversity, indicating richer narrative structure and affective variability, without increasing perplexity.
6. Human Evaluation
A three-phase human study on NoveltyBench and WildChat further corroborates BACo's automatic-metric results:
- Phase I (Quality): Annotators rate each response (1–5 scale).
- Phase II (Diversity): Pairwise group comparisons for (a) overall, (b) format, and (c) content diversity.
- Phase III (Creativity): Select the most creative output among six candidates.
When automatic quality was matched, BACo’s outputs received substantially higher average quality scores (e.g., 4.04 vs. 2.83 on NoveltyBench) and were preferred for overall diversity 79.0% to 21.0% over the aligned model. For creativity, BACo outputs were chosen 79.6% of the time on NoveltyBench and 61.8% on WildChat, compared to 20.4% and 38.2% for the aligned model, respectively. These findings indicate that token-level collaboration can substantially improve both diversity and creativity without detriment to perceived quality.
7. Significance, Control, and Limitations
BACo demonstrates that dynamically combining base and aligned models at inference time can transcend existing diversity–quality trade-offs, requiring neither additional model training nor extra decoding passes. The single user-controllable threshold parameter $\tau$ offers flexible, fine-grained adjustment of output characteristics. BACo maintains computational efficiency comparable to single-model decoding due to infrequent model switches and can be further accelerated through caching and speculative execution techniques. A plausible implication is that BACo's approach generalizes to other model pairs or tasks with analogous diversity–quality conflicts, though this would require empirical validation. The reliance on agreement between base and aligned models may affect performance where superficial alignment does not hold, suggesting areas for further research.