LLM-Informed Priors

Updated 6 May 2026

LLM-Informed Priors are structured inductive biases derived from LLM outputs that embed domain expertise and semantic structure into machine learning models.
They are extracted using prompting and natural-language querying, then mapped into regularizers, Bayesian priors, or graph structures to enhance model performance.
Empirical studies show these priors improve sample efficiency, adaptability, and decision quality in tasks ranging from multi-agent RL to Bayesian predictive modeling.

LLM-Informed Priors are structured inductive biases or prior distributions derived from the outputs or knowledge of LLMs and integrated into machine learning systems. They encode domain expertise, semantic structure, or coordination strategies inferred by LLMs—often via natural-language querying and prompt engineering—enabling models to incorporate the rich, context-sensitive knowledge stored in LLMs into statistical, reinforcement learning, or ensemble frameworks. LLM-informed priors have demonstrated benefits in sample efficiency, adaptability, robustness, interpretability, and downstream decision quality across domains including tabular learning, multi-agent coordination, Bayesian statistics, psychometrics, and interpretability modeling.

1. Fundamentals of LLM-Informed Priors

LLM-informed priors are mechanisms for extracting, structuring, and injecting the probabilistic or semantic knowledge of an LLM into a downstream learner as an explicit prior. This process typically involves the following steps:

Prompting and output extraction: Human- or program-generated prompts, often based on metadata, observations, or candidate features, elicit from the LLM numeric scores, distributions, rankings, graph structures, or concept definitions.
Mapping to model space: The LLM’s outputs are mapped to priors on model parameters, graph adjacency matrices, penalty weights, or other inductive biases relevant for the downstream task.
Integration into learning objective: The downstream model’s loss is augmented or shaped by the LLM-informed prior, such as via regularization (cross-entropy, KL, or $\ell_2$ penalties), prior distributions in Bayesian inference, or fixed graph structures for GNNs.
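The three steps above can be sketched on a toy regression problem. This is a minimal illustration, not the procedure of any cited framework: the relevance scores are hard-coded stand-ins for parsed LLM output, and the `1 / (eps + score)` mapping and the function names `scores_to_penalties` and `weighted_ridge` are assumptions made for the example.

```python
import numpy as np

def scores_to_penalties(scores, eps=0.1):
    """Map relevance scores in [0, 1] to per-feature penalty weights.

    Higher relevance gives a smaller penalty. The 1 / (eps + score) form is
    an illustrative assumption, not a mapping taken from the cited papers.
    """
    scores = np.clip(np.asarray(scores, dtype=float), 0.0, 1.0)
    penalties = 1.0 / (eps + scores)
    return penalties / penalties.mean()  # normalize so the mean penalty is 1

def weighted_ridge(X, y, penalties, lam=1.0):
    """Closed-form ridge with per-feature penalties: (X'X + lam*diag(p))^-1 X'y."""
    A = X.T @ X + lam * np.diag(penalties)
    return np.linalg.solve(A, X.T @ y)

# Step 1 (prompting and extraction): hypothetical scores standing in for LLM output.
llm_scores = [0.9, 0.8, 0.1, 0.05]

# Step 2 (mapping to model space): scores become penalty weights.
penalties = scores_to_penalties(llm_scores)

# Step 3 (integration): the penalties shape a regularized least-squares fit.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 4))
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=12)
theta = weighted_ridge(X, y, penalties, lam=5.0)
```

Features the LLM scored as relevant are shrunk less than those it scored as irrelevant, which is one simple way a semantic judgment can become an inductive bias.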

Early formalizations define the objective as:

$\theta^* = \arg\min_\theta \left[ \mathcal{L}_\text{task}(\theta; D) + \lambda\, \Omega_\text{LLM-prior}(\theta; M) \right],$

where $\theta$ denotes the model parameters, $D$ the training data, $\lambda$ the strength of the prior, and $M$ the prompt metadata; $\Omega_\text{LLM-prior}$ penalizes divergence from the LLM’s judgments (Choi et al., 2022).
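As a hedged illustration of this objective, the sketch below takes $\mathcal{L}_\text{task}$ to be the mean logistic loss and $\Omega_\text{LLM-prior}$ to be the squared distance between $\theta$ and a prior mean vector. Both choices are assumptions made for the example, and `mu_prior` is a hypothetical stand-in for weights elicited from an LLM; it is chosen to disagree with the data so that the effect of $\lambda$ is visible.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def objective(theta, X, y, mu_prior, lam):
    """Mean logistic loss plus lam times the squared distance to the prior mean."""
    p = sigmoid(X @ theta)
    eps = 1e-12  # guards against log(0)
    task = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    omega = np.sum((theta - mu_prior) ** 2)
    return task + lam * omega

def fit(X, y, mu_prior, lam, lr=0.1, steps=500):
    """Plain gradient descent on the objective, started at the prior mean."""
    theta = mu_prior.copy()
    for _ in range(steps):
        grad_task = X.T @ (sigmoid(X @ theta) - y) / len(y)
        grad_prior = 2.0 * lam * (theta - mu_prior)
        theta = theta - lr * (grad_task + grad_prior)
    return theta

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (X @ np.array([-2.0, 2.0, 1.0]) > 0).astype(float)
mu_prior = np.array([1.0, -1.0, 0.0])  # hypothetical LLM-elicited weights

theta_weak = fit(X, y, mu_prior, lam=0.01)   # data dominates
theta_strong = fit(X, y, mu_prior, lam=2.0)  # prior dominates
```

A larger $\lambda$ keeps the fitted weights closer to the prior, so $\lambda$ controls how much the learner trusts the LLM relative to the data.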

This approach differs from naively transferring LLM predictions (e.g., via in-context learning) in that it emphasizes structured, interpretable priors and their regularizing or guiding effect on the model’s inductive biases.

2. Methods for Extracting and Structuring LLM Priors

Several methodologies for constructing LLM-informed priors have emerged:

Parameter Elicitation for Bayesian Models: Frameworks such as AutoElicit extract Gaussian prior means and variances for linear model weights by prompting the LLM with multiple paraphrased queries. The resulting prior is a Dirichlet-mixed Gaussian over the weight vector, used in Bayesian inference engines (Capstick et al., 2024).
Task-Specific Prior Structuring: In "LMPriors," variable- or feature-level metadata are mapped to few-shot prompts. The LLM’s predicted likelihoods of binary events (e.g., feature relevance, action safety) define probability vectors that regularize downstream predictions through a cross-entropy penalty, acting as task-specific priors (Choi et al., 2022).
Graph Priors for Coordination: In multi-agent systems, natural-language summaries of agent states are composed into prompts. The LLM generates an $n \times n$ affinity matrix representing a weighted, undirected coordination graph, symmetrized and processed for use as a fixed aggregation prior in a GNN-based MARL pipeline (Gupta et al., 19 Apr 2026).
Feature Weighting for Supervised Learning: LLM-assigned semantic relevance scores are injected into ensemble learners via penalty reweighting, feature scaling, or instance weighting. Adaptive calibration is achieved via cross-validated stacking, as in the Statsformer framework (Zhang et al., 29 Jan 2026).
Concept Discovery and Selection: In Bayesian Concept Bottleneck Models (BC-LLM), candidate concept definitions are proposed by the LLM and weighted to serve as a prior over a (potentially infinite) concept set; stochastic sampling ensures coverage and posterior consistency (Feng et al., 2024).
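To make the parameter-elicitation idea concrete, the sketch below evaluates the log-density of a mixture of diagonal Gaussians built from several elicited (mean, standard deviation) pairs, one per paraphrased prompt. The elicited values are invented for illustration, and the uniform mixture weights are an assumption; how a given framework sets the weights is not specified here.

```python
import numpy as np

def mixture_log_prior(theta, mus, sigmas, weights):
    """Log-density of a mixture of diagonal Gaussians evaluated at theta.

    mus, sigmas: arrays of shape (K, d), one row per elicited response.
    weights: mixture weights of shape (K,), summing to 1.
    """
    theta = np.asarray(theta, dtype=float)
    # Per-component log-density of a diagonal Gaussian.
    log_comp = -0.5 * np.sum(
        ((theta - mus) / sigmas) ** 2 + np.log(2.0 * np.pi * sigmas ** 2),
        axis=1,
    )
    # Numerically stable log-sum-exp over components.
    a = np.log(weights) + log_comp
    m = a.max()
    return m + np.log(np.sum(np.exp(a - m)))

# Hypothetical responses to K = 3 paraphrased prompts for d = 2 weights.
mus = np.array([[0.5, -0.2], [0.7, -0.1], [0.4, -0.3]])
sigmas = np.array([[0.3, 0.2], [0.5, 0.2], [0.3, 0.4]])
weights = np.full(3, 1.0 / 3.0)  # uniform weights, an assumption

log_p = mixture_log_prior([0.6, -0.2], mus, sigmas, weights)
```

A function of this shape can be added to a log-likelihood to form an unnormalized log-posterior for a sampler.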

A summary of major classes of methods is shown below:

Method	LLM Output	Prior Type
AutoElicit	Gaussian $(\mu,\sigma)$ per feature	Mixture of Gaussians
LMPriors	Next-token/class probabilities	Cross-entropy penalty
Coordination Graphs	Affinity matrix (JSON, $[0,1]^{n\times n}$)	Fixed graph prior
Statsformer	Feature relevance scores	Penalty/feature/instance weighting
BC-LLM	Concept definitions/scores	Prior over concept space
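For the coordination-graph row, the following sketch shows one way a JSON affinity matrix could be parsed, symmetrized, and used as a fixed aggregation operator. The reply string, the added self-loops, and the row-normalized mean aggregation are assumptions for illustration and are not taken from the cited pipeline.

```python
import json
import numpy as np

def graph_prior_from_json(text, n):
    """Parse an n x n affinity matrix from JSON and make it a symmetric prior."""
    A = np.array(json.loads(text), dtype=float)
    if A.shape != (n, n):
        raise ValueError(f"expected a {n}x{n} matrix, got shape {A.shape}")
    A = np.clip(A, 0.0, 1.0)   # keep entries in [0, 1]
    A = 0.5 * (A + A.T)        # symmetrize to obtain an undirected graph
    np.fill_diagonal(A, 1.0)   # self-loops (an assumption) keep own features
    return A

def aggregate(A, H):
    """One round of mean-style aggregation with a fixed, row-normalized graph."""
    P = A / A.sum(axis=1, keepdims=True)
    return P @ H

# Hypothetical LLM reply for n = 3 agents.
llm_reply = "[[0.0, 0.9, 0.2], [0.7, 0.0, 0.1], [0.2, 0.3, 0.0]]"
A = graph_prior_from_json(llm_reply, n=3)

H = np.arange(12, dtype=float).reshape(3, 4)  # toy per-agent feature vectors
H_next = aggregate(A, H)
```

Because `A` is never updated by gradient descent, it acts as a fixed structural prior on which agents exchange information.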

3. Application Domains and Empirical Impact

LLM-informed priors have been applied across a wide range of domains, yielding improvements primarily in low-data, safety-critical, or semantically structured tasks:

Tabular and Low-Resource Supervised Learning: Priors on feature relevance, categorical variable orderings, and correlation signs improve sample efficiency and interpretability in logistic regression and ensemble models (Zhu et al., 2023, Zhang et al., 29 Jan 2026). For example, BiasedLR yields relative AUC lifts of up to 30% in the 4-shot regime compared to data-only baselines (Zhu et al., 2023).
Bayesian Predictive Modeling: In small- to medium-sample settings, LLM-informed Gaussian or exponential priors for weights or hyperparameters substantially reduce the sample complexity of Bayesian inference. In UTI prediction, a 55% reduction in labeling effort was reported relative to flat priors (Capstick et al., 2024). In clinical-trial adverse event (AE) modeling, LLM priors yielded a 20% reduction in required patient samples for matched predictive accuracy (Arai et al., 4 Sep 2025).
Multi-Agent Reinforcement Learning: LLM-derived coordination graph priors enable GNN-based MARL algorithms to outperform both hand-crafted graph and fully learned structure methods, especially in asymmetric or adversarial tasks (+24 points in Adversary), with compact 1.5B-parameter LLMs sufficient for strong empirical gains (Gupta et al., 19 Apr 2026).
Psychometric Cognitive Diagnosis: Text-embedding-informed priors on latent item-attribute matrices stabilize high-dimensional inference and ensure accurate mastery profile recovery in LLM capability evaluation, yielding entrywise $Q$-matrix recovery exceeding 0.95 in simulation (Liu et al., 16 Mar 2026).
Concept Bottleneck Models: LLM-based concept priors drive rapid recovery of the “true” concept sets and robustness to out-of-distribution data, outperforming boosting and distillation baselines in AUC and uncertainty quantification (Feng et al., 2024).
Conversational and Sequential Decision Policies: LLMs’ conversational priors (i.e., default dialog policies) can be detected as miscalibrated for multi-turn objectives and efficiently re-calibrated by prompt-based or meta-policy selection, enabling improved information-gathering and user satisfaction in recommendation (Herlihy et al., 2024).

4. Integration Architectures and Algorithms

LLM-informed priors can be consumed by downstream systems in several architectures:

Regularization in ERM or GLM: Cross-entropy or $\ell_2$ regularizer with LLM-derived targets penalizes deviation from the prior within ERM or GLM objectives (Choi et al., 2022, Zhu et al., 2023).
Fixed Structural Priors: LLM-generated graphs, affinity matrices, or bottleneck concept sets are fed directly into neural or GNN pipelines without further trainable adaptation; e.g., the adjacency matrix $A_\text{prior}$ completely specifies agent communication in MARL (Gupta et al., 19 Apr 2026).
Priors in Probabilistic/Bayesian Models: Gaussian, Dirichlet, or Exponential priors on parameters are set by aggregating LLM responses, then used in standard Bayesian inference engines (e.g., NUTS, HMC) (Capstick et al., 2024, Arai et al., 4 Sep 2025).
Ensemble Aggregation with Guardrails: Priors are injected into diverse base learners. Model selection and meta-learning (e.g., out-of-fold stacking) calibrate the strength of the prior, with oracle-style guarantees that prevent degradation under poor or adversarial priors (Zhang et al., 29 Jan 2026).
Stochastic Search and Sampling: For complex variable-selection (concept bottleneck) problems, Metropolis-within-Gibbs or multiple-try sampling uses LLM outputs as proposal distributions over concept space, with strong posterior consistency guarantees under mild regularity (Feng et al., 2024).
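The guardrail idea can be illustrated with a much simpler stand-in for out-of-fold stacking: choosing the prior strength by cross-validation over a grid that includes zero. This sketch assumes a linear model pulled toward a prior mean vector `mu`; it does not reproduce the cited framework or its oracle-style guarantees, and all names and values are hypothetical.

```python
import numpy as np

def fit_toward_prior(X, y, mu, lam):
    """Minimize ||y - X theta||^2 + lam * ||theta - mu||^2 in closed form."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * mu)

def cv_error(X, y, mu, lam, k=5):
    """Mean held-out squared error over k contiguous folds."""
    idx = np.arange(len(y))
    errs = []
    for held in np.array_split(idx, k):
        train = np.setdiff1d(idx, held)
        theta = fit_toward_prior(X[train], y[train], mu, lam)
        errs.append(np.mean((X[held] @ theta - y[held]) ** 2))
    return float(np.mean(errs))

def select_lambda(X, y, mu, grid=(0.0, 0.1, 1.0, 10.0)):
    """Pick the prior strength with the lowest cross-validated error.

    Including lam = 0 lets the procedure ignore the prior entirely.
    """
    scores = {lam: cv_error(X, y, mu, lam) for lam in grid}
    best = min(scores, key=scores.get)
    return best, scores

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -0.5, 0.0]) + 0.3 * rng.normal(size=30)
mu = np.array([0.8, -0.4, 0.1])  # hypothetical LLM-elicited prior mean

best, scores = select_lambda(X, y, mu)
```

By construction, the selected setting has a cross-validated error no higher than the prior-free fit, which is a weak, estimate-level analogue of guarding against a poor prior.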

The LLM prior acts purely as an inductive bias: no gradient or other training signal is propagated to the LLM itself during learning, except in meta-policy tuning settings. In graph-based or Bayesian cases, the prior is not adapted by learning; it is queried once per episode or trial and integrated directly.

5. Empirical Results and Quantitative Evaluation

Consistent empirical findings demonstrate that LLM-informed priors:

Enhance low-data performance: Across regression, classification, and MARL settings, LLM priors yield sizable lifts in target metrics when sample size is small, with gains diminishing as the sample size $N$ increases or as structure is easily discovered from data (Choi et al., 2022, Capstick et al., 2024, Zhu et al., 2023, Gupta et al., 19 Apr 2026).
Accelerate learning and improve sample efficiency: Bayesian models using LLM-informed priors reach target MSE or AUC with less labeled data; e.g., a 20% reduction in required patients for adverse event (AE) modeling (Arai et al., 4 Sep 2025).
Permit robust task-specific adaptation: LLM priors are especially helpful when semantic or role structure is misaligned with simple heuristics, as under asymmetric or adversarial conditions in MARL (Gupta et al., 19 Apr 2026).
Display ablation sensitivity: Model selection and ablation studies confirm that instruction tuning and LLM quality strongly influence prior informativeness, but even small open-source models (e.g., at the 1.5B-parameter scale) are effective in many tasks (Gupta et al., 19 Apr 2026, Zhang et al., 29 Jan 2026).

A sample of reported results:

Task / Metric	Baseline	LLM-informed Prior	Gain
MARL Adversary (mean return)	DCG: +38.8	LLM-prior: +62.9 ± 8.6	+24 points
UTI Prediction (accuracy)	Uninformative	AutoElicit Prior	55% fewer labels for parity
AE Modeling (patient count)	Meta-analytic	LLM Prior	20% reduction (≈66 patients)
Tabular AUC (N=4 shots)	TabPFN: ~0.70	BiasedLR: ~0.80	+10 points at N=4
Causal Inference (accuracy)	Data only: 58.7%	LMPrior: 83.5%	+24.8 points

6. Limitations, Biases, and Future Directions

Despite empirical successes, several limitations and open challenges remain:

Biases and Prompt Sensitivity: LLM-informed priors inherit and sometimes amplify the biases, calibration issues, or hallucinations of the underlying LLM. This risk motivates careful prompt design, cross-validation, and ensemble “guardrails” (Choi et al., 2022, Zhang et al., 29 Jan 2026).
Dependence on Hand-Crafted Mapping: Many frameworks rely on hand-written observation-to-text functions or prompt templates. Generalizing to pixel or high-dimensional environments requires learning or adapting these mappings (Gupta et al., 19 Apr 2026).
Limited Prior Adaptivity: Priors are often fixed at inference time, potentially suboptimal in highly dynamic or online contexts. Strategies for prior adaptation—e.g., re-prompting on significant state changes—remain areas for further exploration (Gupta et al., 19 Apr 2026).
Scalability and Cost: Querying large LLMs over many features or subdomains can be computationally intensive. Use of compact open-source models is effective in many applications, but more data-intensive domains may challenge this (Capstick et al., 2024).
Evaluation Boundaries: Most successful deployments occur in low-data, semantically rich, or safety-critical regimes. When data is abundant and simple heuristics already match semantic structure, the utility of LLM priors is reduced (Zhu et al., 2023, Gupta et al., 19 Apr 2026).
Meta-policy Control: For sequential or conversational agents, “conversational priors” may misalign with global task objectives, requiring re-calibration via prompt engineering or learning lightweight meta-policies from logs (Herlihy et al., 2024).

Anticipated future developments include automated observation-to-language mapping, adaptive and hierarchical prompting schemes, distillation of LLM priors into smaller models, and rigorous robustness testing in safety-critical or regulatory settings (Gupta et al., 19 Apr 2026).

7. Connections to Related Research Areas

LLM-informed priors intersect with several threads in machine learning:

Bayesian Prior Elicitation: LLMs function as “virtual experts,” automating and accelerating the process of expert prior elicitation in Bayesian modeling, especially in the absence of abundant domain-expert effort (Capstick et al., 2024, Arai et al., 4 Sep 2025).
Human-in-the-Loop and Interpretable AI: Their use increases transparency, as priors can be refined via natural language and interpreted post hoc. This feature is exemplified in interpretability-focused applications such as concept bottleneck models and feature selection (Feng et al., 2024, Choi et al., 2022).
Knowledge Distillation and Transfer: Priors distill domain and procedural knowledge embedded in LLMs, without dependence on end-to-end fine-tuning or accessing internal representations (Choi et al., 2022).
Safe Reinforcement Learning and Causal Inference: Task-specific LLM priors provide principled inductive biases for safety-oriented RL and causal discovery, outperforming both data-only and generic regularization or reward-shaping methods (Choi et al., 2022).
Meta-Learning and Adaptive Ensemble Methods: Adaptive frameworks calibrate or guard against prior misspecification, ensuring “no worse than null” guarantees even in the face of potential LLM hallucination (Zhang et al., 29 Jan 2026).

LLM-informed priors thus offer a formal, scalable pathway for integrating high-level semantic and domain expertise into diverse machine learning workflows, with strong empirical support and principled mechanisms for mitigating the risks of transfer.