
AdaptFuse: Training-Free Sequential Preference Learning via Externalized Bayesian Inference

Published 5 Apr 2026 in cs.CL and cs.AI | (2604.03925v1)

Abstract: LLMs struggle to accumulate evidence across multiple rounds of user interaction, failing to update their beliefs in a manner consistent with Bayesian inference. Existing solutions require fine-tuning on sensitive user interaction data, limiting their applicability in privacy-conscious settings. We propose AdaptFuse, a training-free framework that externalizes probabilistic computation entirely from the LLM: a symbolic module maintains a Bayesian posterior over a discrete hypothesis set, while a frozen LLM contributes semantic reasoning via multi-sample Dirichlet aggregation. The two signals are combined through entropy-adaptive fusion, which automatically weights each source by its predictive confidence, shifting reliance from the LLM to the symbolic posterior as evidence accumulates. We evaluate across three domains (flight recommendation, hotel recommendation, and web shopping) on Gemma 2 9B, Llama 3 8B, and Qwen 2.5 7B. AdaptFuse consistently outperforms both prompting baselines and fine-tuned Bayesian Teaching models on all tasks, with accuracy improving monotonically over interaction rounds. These results demonstrate that principled inference-time algorithms can substitute for fine-tuning in personalized recommendation, without storing or training on sensitive user data. All the code and materials will be open-sourced.

Summary

  • The paper introduces AdaptFuse, a training-free method that leverages externalized Bayesian inference combined with frozen LLM semantic reasoning to learn sequential preferences.
  • It employs an entropy-adaptive fusion mechanism to dynamically weight the symbolic and LLM modules, resulting in monotonically increasing accuracy over interaction rounds.
  • Empirical results on flight, hotel, and web shopping recommendation show consistent gains over prompting and fine-tuned baselines and transfer across domains without retraining, all without training on user data.

Training-Free Sequential Preference Learning with AdaptFuse

Introduction and Motivation

The paper "AdaptFuse: Training-Free Sequential Preference Learning via Externalized Bayesian Inference" (2604.03925) addresses the challenge of personalizing AI agents for user-centric tasks, such as recommendation and dialog, without fine-tuning LLMs on sensitive user interaction data. Prior work, notably Bayesian Teaching, shows that fine-tuned LLMs can approximate optimal Bayesian belief updating in sequential preference inference, but this requires direct access to potentially privacy-sensitive logs and costly model retraining for deployment across domains. AdaptFuse proposes a fully training-free alternative: it externalizes all probabilistic belief computation to a symbolic module, uses frozen LLMs for semantic reasoning over item descriptions, and combines the two through an entropy-adaptive fusion mechanism.

Methodological Framework

Problem Formalization

The setting is sequential preference learning: over $T$ rounds, the agent presents $K$ candidate items in natural language, elicits the user’s choice, and aims to infer the user’s static latent utility function to generalize preferences to held-out items. Feature extraction is deterministic; items are converted from text to normalized $d$-dimensional vectors. Preference is modeled using a linear utility function, with Bayesian inference performed not over the full continuous space, but over a discretized hypothesis set $\mathcal{H}$ capturing all plausible preference vectors observed in the training data.
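One way to write this formalization down, as a hedged sketch: the symbols $\mathbf{x}_k$ (feature vector of the $k$-th presented item), $\mathbf{w}_h$ (preference vector of hypothesis $h$), $\mathcal{O}_t$ (the option set shown at round $t$) and $c_t$ (the user's choice) are our own notation, and writing the Luce choice model with exponentiated linear utilities (i.e. a softmax) is an assumption, since the summary does not reproduce the paper's equations.

```latex
% Linear utility of item k under hypothesis h (notation assumed)
u_h(\mathbf{x}_k) = \mathbf{w}_h^{\top}\mathbf{x}_k,
  \qquad \mathbf{x}_k \in \mathbb{R}^{d},\; h \in \mathcal{H}

% Luce choice likelihood over the K options shown at round t
P(c_t = k \mid h, \mathcal{O}_t)
  = \frac{\exp\big(u_h(\mathbf{x}_k)\big)}{\sum_{j=1}^{K}\exp\big(u_h(\mathbf{x}_j)\big)}

% Exact posterior after t rounds, starting from a uniform prior over \mathcal{H}
p_t(h) \propto \frac{1}{|\mathcal{H}|}\prod_{s=1}^{t} P(c_s \mid h, \mathcal{O}_s)

% Posterior predictive for the next round, marginalizing over hypotheses
P(c_{t+1} = k \mid \mathcal{O}_{t+1}) = \sum_{h \in \mathcal{H}} p_t(h)\, P(c_{t+1} = k \mid h, \mathcal{O}_{t+1})
```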

Symbolic Bayesian Module

The symbolic module initializes a uniform categorical prior over $\mathcal{H}$ and, after each round, updates the posterior exactly via Bayes’ rule, using the Luce choice model as the user likelihood. The resulting belief vector is used to marginalize the predictive distribution over the presented options. The module has no trainable parameters and requires no learning or gradient computation.
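A minimal NumPy sketch of such a symbolic module follows. It assumes hypotheses and items are given as arrays of $d$-dimensional vectors and that the Luce likelihood is a softmax over linear utilities with unit scale; the names `SymbolicBayes` and `luce_likelihood` are hypothetical and do not come from the paper's code.

```python
import numpy as np


def luce_likelihood(hypotheses: np.ndarray, options: np.ndarray) -> np.ndarray:
    """P(choice = k | h) for every hypothesis h and presented option k.

    hypotheses: (|H|, d) preference vectors; options: (K, d) item features.
    Returns an (|H|, K) array whose rows sum to 1 (softmax of linear utilities).
    """
    utilities = hypotheses @ options.T                    # (|H|, K) linear utilities
    utilities -= utilities.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(utilities)
    return weights / weights.sum(axis=1, keepdims=True)


class SymbolicBayes:
    """Exact categorical posterior over a fixed, discrete hypothesis set."""

    def __init__(self, hypotheses: np.ndarray):
        self.hypotheses = np.asarray(hypotheses, dtype=float)
        n = len(self.hypotheses)
        self.posterior = np.full(n, 1.0 / n)               # uniform prior

    def predict(self, options: np.ndarray) -> np.ndarray:
        """Posterior predictive over the K options, marginalizing over h."""
        return self.posterior @ luce_likelihood(self.hypotheses, options)

    def update(self, options: np.ndarray, chosen: int) -> None:
        """Bayes' rule: posterior is proportional to prior times P(chosen | h)."""
        lik = luce_likelihood(self.hypotheses, options)[:, chosen]
        unnormalized = self.posterior * lik
        self.posterior = unnormalized / unnormalized.sum()
```

Because the hypothesis set is finite, each update costs on the order of $|\mathcal{H}| \cdot K \cdot d$ operations, which is consistent with the symbolic computation being negligible next to the LLM calls.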

LLM-Based Semantic Module

A frozen LLM is sampled $N$ times per round, using prompt variants and randomized decoding temperatures to increase sample diversity. Each sample yields a predicted item and a parsed confidence score. Aggregation uses a Dirichlet-Multinomial model: for each option, pseudo-counts are incremented by the samples' confidence weights, and the Dirichlet posterior mean serves as the LLM’s predictive distribution. Exponential moving average smoothing further reduces variance across rounds.
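The Dirichlet aggregation step can be sketched as below, assuming each parsed sample is an `(option_index, confidence)` pair with confidence in $[0, 1]$ and a symmetric prior pseudo-count `alpha0` whose value is a hypothetical choice. The exponential moving average is left out because the summary does not say which quantity it smooths across rounds.

```python
import numpy as np


def dirichlet_aggregate(samples, num_options, alpha0=1.0):
    """Turn N parsed LLM samples into a predictive distribution over K options.

    samples: iterable of (option_index, confidence) pairs; option_index may be
    None when the model's answer could not be parsed. alpha0 is an assumed
    symmetric Dirichlet prior pseudo-count per option.
    """
    alpha = np.full(num_options, alpha0, dtype=float)
    for idx, conf in samples:
        if idx is None or not 0 <= idx < num_options:
            continue  # skip unparseable or out-of-range answers
        # Confidence-weighted pseudo-count increment, clipped to [0, 1].
        alpha[idx] += min(max(float(conf), 0.0), 1.0)
    # Posterior mean of Dirichlet(alpha): alpha_k / sum_j alpha_j.
    return alpha / alpha.sum()
```

With `alpha0=1.0` and the samples `[(0, 0.9), (0, 0.8), (1, 0.5)]` over three options, the pseudo-counts become `[2.7, 1.5, 1.0]`, so the mean is roughly `[0.52, 0.29, 0.19]`. The prior keeps unchosen options from receiving zero probability, which matters once this distribution is fused with the symbolic one.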

Entropy-Adaptive Fusion

Central to AdaptFuse is the automatic adaptation of the fusion weights between the symbolic and LLM modules. Each module’s predictive entropy is computed, and the module with lower entropy (higher predictive confidence) receives more weight in the final fused distribution. Early in an interaction, when the symbolic posterior is still diffuse, the fusion mechanism allocates more weight to the LLM; as evidence accumulates and the symbolic posterior sharpens, the LLM’s influence shrinks accordingly. The net effect is that reliance shifts from the LLM to the symbolic posterior over the course of the interaction.
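The summary does not give the exact weighting formula, so the sketch below assumes one simple rule: each module's confidence is one minus its entropy normalized by $\log K$, the two confidences are normalized into weights, and the fused prediction is the corresponding linear mixture. Other monotone maps from entropy to weight would fit the description equally well.

```python
import numpy as np


def entropy(p) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-(nz * np.log(nz)).sum())


def entropy_adaptive_fuse(p_sym, p_llm, eps=1e-12):
    """Fuse symbolic and LLM predictive distributions over the same K >= 2 options.

    Assumed rule: confidence = 1 - H(p) / log K (0 for uniform, 1 for one-hot),
    weights = normalized confidences, fused = linear mixture.
    Returns (fused distribution, weight on the symbolic module).
    """
    p_sym = np.asarray(p_sym, dtype=float)
    p_llm = np.asarray(p_llm, dtype=float)
    max_entropy = np.log(len(p_sym))
    c_sym = max(0.0, 1.0 - entropy(p_sym) / max_entropy)
    c_llm = max(0.0, 1.0 - entropy(p_llm) / max_entropy)
    total = c_sym + c_llm
    # If both sources are (near-)uniform, fall back to equal weights.
    w_sym = 0.5 if total < eps else c_sym / total
    return w_sym * p_sym + (1.0 - w_sym) * p_llm, w_sym
```

Under this rule a uniform symbolic posterior (entropy $\log K$) receives weight zero, so the first round relies entirely on the LLM; once the symbolic predictive distribution has lower entropy than the LLM's, its weight exceeds one half.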

Empirical Evaluation

Experimental Setup and Baselines

Experiments are conducted on three sequential recommendation domains: flights, hotels, and web shopping, using three open-source LLMs (Gemma 2 9B, Llama 3 8B, Qwen 2.5 7B) without weight updates. Comparisons are made to five baselines: standard prompting, Chain-of-Thought (CoT), self-consistency, Oracle Learning (fine-tuned on ground-truth user choices), and Bayesian Teaching (fine-tuned on Bayesian agent demonstrations).

Main Results: Flight Recommendation

AdaptFuse outperforms all baselines across all three model architectures. On the flight recommendation task with Gemma 2 9B, it attains 76.2% final-round accuracy, compared with 73.4% for Bayesian Teaching and substantially lower accuracy for the prompting and self-consistency strategies (Figure 1).

Figure 1: Accuracy over interaction rounds on the flight task, demonstrating AdaptFuse’s monotonic improvements, in contrast with the baselines’ plateauing.

A key result is that AdaptFuse’s accuracy increases monotonically as the number of interaction rounds grows, a property absent from the prompting-only baselines, which plateau after the initial rounds. Relative to fine-tuned Bayesian Teaching, AdaptFuse keeps improving in late rounds while requiring neither fine-tuning nor access to sensitive training data.

Task Complexity and Generalization

Performance degrades gracefully as the number of item attributes $d$ (and hence the complexity of the hypothesis space) increases, with AdaptFuse matching or outperforming the fine-tuned alternatives at every complexity level (Figure 2).

Figure 2: Final-round accuracy as item attribute dimensionality $d$ increases, illustrating robust AdaptFuse performance even in higher-dimensional spaces.

AdaptFuse maintains its advantage in out-of-domain generalization to hotel recommendation and web shopping, indicating that the fusion framework is domain-agnostic and does not rely on domain-specific hyperparameters or user logs (Figure 3).

Figure 3: Final-round generalization accuracy for (a) hotel recommendation and (b) web shopping, with AdaptFuse consistently achieving the best results.

Ablation Studies

Ablating individual components reveals their contributions. Replacing entropy-based weighting with fixed fusion weights lowers final accuracy by 3.5 points, and removing either confidence-weighted Dirichlet aggregation or temporal smoothing also degrades performance, supporting each design choice.

Inference Latency

Runtime analysis shows that AdaptFuse adds little overhead: per-round cost is dominated by the LLM queries, which can run in parallel, while the symbolic Bayesian computation is negligible. Unlike fine-tuned approaches, it incurs no training cost, and privacy-preserving, on-premise deployment is straightforward.
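Because the samples drawn in one round are independent, they can be issued concurrently, so per-round wall-clock latency stays close to that of a single query, assuming the serving backend handles concurrent requests. The sketch below shows this pattern with Python's standard `concurrent.futures`; `query_llm` is a hypothetical stub standing in for a call to a locally hosted frozen model, and the prompt-variant templates and temperature range are illustrative values, not the paper's settings.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def query_llm(prompt: str, temperature: float) -> tuple[int, float]:
    """Hypothetical stand-in for one call to a frozen, locally served LLM.

    A real implementation would send the prompt to the model and parse the
    chosen option index and a confidence score from its reply; this toy
    version derives both deterministically from its inputs.
    """
    return len(prompt) % 3, min(1.0, 0.5 + temperature / 4)


def sample_llm_parallel(task, variants, n_samples, seed=0,
                        temp_range=(0.3, 1.0), max_workers=8):
    """Issue n_samples independent queries concurrently.

    Each query cycles through the prompt-variant templates (which must contain
    "{task}") and draws a random decoding temperature from temp_range.
    """
    rng = random.Random(seed)
    prompts = [variants[i % len(variants)].format(task=task)
               for i in range(n_samples)]
    temps = [rng.uniform(*temp_range) for _ in range(n_samples)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results align with prompts.
        return list(pool.map(query_llm, prompts, temps))
```

Threads suit this workload because each call mostly waits on I/O to the model server; the returned `(index, confidence)` pairs are in the form a Dirichlet aggregation step would consume.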

Detailed Preference Tracking Dynamics

Held-out accuracy measured after each interaction round further reflects the concentration of AdaptFuse’s Bayesian posterior, with the LLM’s influence adaptively reduced once the symbolic module’s predictive confidence surpasses it (Figure 4).

Figure 4: Interactive flight recommendation accuracy per round for different methods, including Bayesian-trained and oracle-finetuned assistants.

Theoretical and Practical Implications

AdaptFuse’s results provide strong evidence that training-free, externalized Bayesian inference suffices for sequential preference learning tasks. The approach separates semantic (LLM) and probabilistic (symbolic) roles, sidestepping the core bottleneck of in-context LLMs failing to accumulate evidence across rounds. Beyond its empirical performance, this separation lends itself to privacy preservation (since no user data is retained or transmitted), flexible on-premise deployment, and domain transfer without retraining.

Theoretically, the results suggest that LLM-driven agents benefit from explicit, externalized Bayesian reasoning rather than from having such reasoning encoded implicitly through gradient-based fine-tuning or in-context demonstrations. The modular, analyzable inference process may also serve as a template for future agent architectures that separate probabilistic reasoning from semantic world modeling.

Conclusion

AdaptFuse achieves the strongest results among the evaluated methods for sequential preference learning while remaining training-free, outperforming both direct-prompting and fine-tuned Bayesian LLM baselines. By combining exact Bayesian posterior updates with LLM-driven semantic aggregation and fusing the resulting signals via entropy-based weighting, AdaptFuse achieves sample-efficient, privacy-preserving, and deployable personalization. The results suggest that future preference-learning and recommendation systems may adopt explicit, externalized inference mechanisms, using LLMs as semantic modules while delegating personalization to exact, external probabilistic computation. This opens new research avenues in hybrid symbolic-neural systems for aligned and trustworthy recommendation.
