LLM-Informed Prior Distributions

Updated 5 September 2025
  • LLM-informed prior distributions are Bayesian priors derived from large language model outputs, integrating expert and contextual knowledge into statistical modeling.
  • They utilize methodologies like parametric elicitation, hyperprior construction, and mixture modeling to effectively incorporate linguistic insights into Bayesian frameworks.
  • Empirical evaluations reveal enhanced sample efficiency, predictive accuracy, and robust inference across applications such as clinical trials and zero-shot learning.

LLM-Informed Prior Distributions are Bayesian priors whose specification or parameters are directly or indirectly derived from the outputs, domain knowledge, or metadata produced by LLMs. This paradigm enables incorporation of linguistic, expert, and contextual information into statistical models, especially in settings where classic expert elicitation is infeasible, where observational data alone are insufficiently informative, or where domain expertise is latent in large-scale textual corpora. LLM-informed priors can be constructed by querying LLMs with natural language prompts, by extracting predictive summaries, or by conditioning model parameters on contextual signals, thereby enabling inductive bias and enhancing sample efficiency in both classical and modern machine learning regimes.

1. Principles and Mathematical Formulation

The construction of LLM-informed prior distributions begins with explicitly formalizing how the LLM output relates to prior specification. In the general setting, suppose $\theta$ denotes the vector of model parameters and $y$ denotes observed or target data. An LLM can be used to generate relevant prior information in several modes:

  • Parametric elicitation: Query the LLM (via prompt engineering) for point estimates or distributional summaries (mean, standard deviation, quantiles) for $\theta$ or for hyperparameters. These values are then used to define a prior, e.g., $\theta \sim \mathcal{N}(\mu_{LLM}, \sigma_{LLM}^2)$.
  • Prior for hyperparameters: For hierarchical models, LLM-derived summaries specify hyperpriors. For instance, $\alpha \sim \text{Exponential}(\lambda_{LLM})$, where $\lambda_{LLM}$ is output by the LLM, typically via a prompt for expert-informed rate parameters (Arai et al., 4 Sep 2025).
  • Mixture-of-experts or mixture priors: Multiple queries to the LLM (using paraphrased prompts) are used; the resulting estimates $\{(\mu_k, \sigma_k)\}_{k=1}^K$ are aggregated into a mixture prior:

$$p_{LLM}(\theta) = \sum_{k=1}^K \pi_k \, \mathcal{N}(\theta \mid \mu_k, \sigma_k^2)$$

where $\pi_k$ are mixture weights, frequently Dirichlet distributed (Capstick et al., 26 Nov 2024).
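As a minimal sketch, the mixture prior above can be evaluated directly with NumPy. The component means, standard deviations, and uniform weights below are hypothetical stand-ins for values elicited from paraphrased LLM prompts:

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Univariate Gaussian density, written out to avoid external dependencies."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture_prior_pdf(theta, mus, sigmas, weights=None):
    """Evaluate p_LLM(theta) = sum_k pi_k N(theta | mu_k, sigma_k^2)."""
    mus = np.asarray(mus, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    if weights is None:
        # Uniform pi_k when no Dirichlet-distributed weights are supplied
        weights = np.full(len(mus), 1.0 / len(mus))
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * normal_pdf(theta, mus, sigmas)))

# Three hypothetical (mu_k, sigma_k) pairs, one per paraphrased prompt
density = mixture_prior_pdf(0.0, mus=[-0.2, 0.1, 0.3], sigmas=[0.5, 0.4, 0.6])
```

Because each component is a proper density and the weights sum to one, the mixture integrates to one regardless of how many prompt variants contribute components.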

In certain frameworks, LLM-informed priors are further contextualized into probabilistic models via transformative procedures (e.g., prior normalization (Cui et al., 2022), translation of predictive distributions (Manderson et al., 2023)) or by direct linkage between language metadata and learning components (feature selection, reward shaping) (Choi et al., 2022).

2. Elicitation Procedures and Prompt Design

LLM-informed prior construction critically depends on prompt engineering and interaction design:

  • Direct numerical elicitation: Prompts are crafted in natural language to ask the LLM for domain-relevant parameter values, ranges, or distributions. For hierarchical Bayesian models, LLMs are queried to provide rate parameters for hyperpriors (e.g., $\alpha$ and $\beta$ in a Gamma prior for Poisson or Negative Binomial modeling) (Arai et al., 4 Sep 2025). Disease-informed or context-free prompts are possible, with outputs aggregated over different temperatures and prompt variants to ensure robustness.
  • Contextual prompting for structured domains: Compositional zero-shot learning frameworks prompt LLMs to generate sentence-level contextual descriptions for each class or attribute, which are then encoded (often as embeddings via CLIP text encoders) and used to induce a Gaussian class prior (Bao et al., 2023).
  • Multiple prompt aggregation: AutoElicit submits many paraphrased versions of a task description to the LLM, each yielding distinct prior estimates, which are pooled into a flexible mixture model for predictive modeling (Capstick et al., 26 Nov 2024).
  • Expert knowledge via language metadata: LMPriors proposes the use of task metadata (variable names, contextual sentences) in prompts, with the LLM’s output interpreted as soft probabilities for inclusion or penalization, which are then used in downstream learning decisions (Choi et al., 2022).

Prompt sensitivity, selection, and calibration are non-trivial; empirical studies indicate that temperature averaging, disease annotation, and paraphrasing can all impact the informativeness and variability of the resulting prior (Arai et al., 4 Sep 2025, Capstick et al., 26 Nov 2024).
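The parsing step between free-text LLM replies and numeric prior components can be sketched as follows. This is a hypothetical illustration, not the pipeline of any cited paper: it assumes the prompt asked the model to report a `mean` and `sd`, and the example replies are invented:

```python
import re

def parse_estimate(reply):
    """Pull a (mean, sd) pair out of a free-text LLM reply.

    Hypothetical parser: assumes the prompt instructed the model to report
    'mean' and 'sd' values; returns None when either is missing, so
    malformed replies are dropped rather than contaminating the prior."""
    pairs = re.findall(r"(mean|sd)\s*[:=]\s*(-?\d+(?:\.\d+)?)", reply, re.IGNORECASE)
    found = {key.lower(): float(val) for key, val in pairs}
    if "mean" in found and "sd" in found:
        return (found["mean"], found["sd"])
    return None

# Invented replies to three paraphrases of the same elicitation prompt
replies = [
    "Based on the literature, mean: 0.30, sd: 0.10",
    "I would expect mean = 0.25 and sd = 0.15",
    "A reasonable guess is mean: 0.35, sd: 0.12",
]
components = [c for c in map(parse_estimate, replies) if c is not None]
```

The surviving `components` can then serve as the $(\mu_k, \sigma_k)$ pairs of a mixture prior, with the spread across paraphrases acting as a rough proxy for elicitation uncertainty.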

3. Integration into Bayesian Modeling Frameworks

LLM-informed prior distributions are deployed in a variety of Bayesian inferential architectures:

  • Hierarchical Bayesian models: Hyperpriors derived from LLM outputs are used for site-level parameters (e.g., in adverse event modeling for clinical trials: $y_{ij} \sim \mathrm{Poisson}(\lambda_j)$, $\lambda_j \sim \mathrm{Gamma}(\alpha, \beta)$, $\alpha \sim \mathrm{Exp}(\lambda_\alpha^{LLM})$) (Arai et al., 4 Sep 2025).
  • Mixture models and clustering: An asymmetric Dirichlet prior is tailored for mixture weights in finite mixture models, enabling direct control over the number of clusters via LLM-informed hyperparameters; this elicits user-chosen or LLM-suggested values for the expected number of occupied clusters $K^+$ (Page et al., 2023).
  • Regression and predictive modeling: AutoElicit constructs priors for interpretable linear model coefficients using a mixture of LLM-sourced Gaussians, allowing rapid learning with limited labeled data and providing sample-efficient error reduction (Capstick et al., 26 Nov 2024).
  • Compositional and multi-modal learning: In compositional zero-shot learning (CZSL), class prior distributions are formed by encoding LLM-generated sentences, fusing language-informative features with visual embeddings, and leveraging attention-based modules for primitive decomposition (Bao et al., 2023).
  • Model space specification: LLM-informed priors impact not only parameter distributions but also model selection; the adjustment to the prior on model space (e.g., $f(m) \propto p(m) c_m^{d_m}$) neutralizes the over-penalization from diffuse parameter priors and can be generalized to contexts where LLMs specify more or less informative priors (Dellaportas et al., 2012).
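For the hierarchical Poisson-Gamma setting above, the site-level update is conjugate once the hyperparameters are fixed. The sketch below holds $(\alpha, \beta)$ at hypothetical LLM-elicited values rather than drawing them from the Exponential hyperprior the cited framework uses, which keeps the example to a closed-form update:

```python
import numpy as np

# Hypothetical LLM-elicited Gamma hyperparameters for site-level event rates
alpha_llm, beta_llm = 2.0, 4.0

def site_posterior(y_site, alpha=alpha_llm, beta=beta_llm):
    """Conjugate update: lambda_j | y ~ Gamma(alpha + sum(y), beta + n_j).

    Sketch only: the full model places an Exponential hyperprior on alpha
    and would sample it (e.g., via MCMC) instead of fixing it here."""
    y_site = np.asarray(y_site)
    return alpha + y_site.sum(), beta + len(y_site)

# Illustrative adverse-event counts for one trial site
a_post, b_post = site_posterior([0, 1, 0, 2, 1])
posterior_mean = a_post / b_post  # shrinks the raw rate toward the prior mean
```

With only five observations, the posterior mean sits between the raw event rate (4/5) and the prior mean (2/4), showing how an informative LLM-derived prior stabilizes small-sample site estimates.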

These frameworks frequently exhibit enhancements in sample efficiency, inference stability, and predictive validity, contingent on the informativeness of the LLM-derived prior and its calibration to the data.

4. Theoretical Properties, Robustness, and Calibration

Theoretical analysis of LLM-informed priors centers around the interplay between prior informativeness, model calibration, and robustness to prior-data conflict:

  • Data-dependent mixture weighting: In mixture data-dependent priors, the relative weight of the LLM-driven versus baseline prior is guided by a resampling procedure that computes distances (e.g., Hellinger) between likelihoods; when there is prior-data conflict, the mixture shifts toward the noninformative baseline to safeguard against overdominance (Egidi et al., 2017).
  • Adjustment for Lindley’s paradox: Joint specification of parameter and model space priors, with the model prior boosted by the dispersion term $c_m^{d_m}$, renders model selection robust to arbitrary choices of prior scale, a result generalizable to highly informative or diffuse LLM-sourced priors (Dellaportas et al., 2012).
  • Translation of predictive distributions: When only predictive or observable priors are available—either from expert elicitation or LLM predictions—global Bayesian optimization is used to find hyperparameter settings in a flexible parametric family, minimizing discrepancy from the LLM-elicited predictive target and balancing faithfulness with prior spread (Manderson et al., 2023).
  • Robustness and transfer mechanisms: In ensemble judgment or sparse data regimes, mixture models (Beta-Binomial) and prior transfer via embedding similarity allow LLM-informed priors learned from one dataset to efficiently inform uncertainty quantification in another, minimizing required labeled samples and providing theoretical bounds on estimation error (Qu et al., 17 Apr 2025).
  • Information-geometric interpretation: In power prior frameworks, the influence of LLM-driven historical data can be adaptively calibrated by weighting and by generalizing beyond KL divergence to Amari’s $\alpha$-divergence, resulting in a posterior that lives along a geodesic in the statistical manifold, offering robustification and geometric insight (Kimura et al., 22 May 2025).
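The conflict-driven weighting idea can be illustrated in closed form for univariate Gaussians. Note the assumptions: the cited mixture-prior approach computes distances between likelihoods by resampling, whereas this sketch uses the exact Hellinger distance between two Gaussians, and the `informative_weight` rule is an invented monotone map, not the paper's weighting scheme:

```python
import math

def hellinger_gauss(mu1, s1, mu2, s2):
    """Closed-form Hellinger distance between N(mu1, s1^2) and N(mu2, s2^2)."""
    h2 = 1.0 - math.sqrt(2.0 * s1 * s2 / (s1**2 + s2**2)) * math.exp(
        -((mu1 - mu2) ** 2) / (4.0 * (s1**2 + s2**2))
    )
    return math.sqrt(h2)

def informative_weight(h):
    """Illustrative rule: downweight the LLM prior as prior-data conflict grows."""
    return 1.0 - h

# Agreement: identical distributions -> full weight on the LLM-informed prior
w_same = informative_weight(hellinger_gauss(0.0, 1.0, 0.0, 1.0))
# Conflict: distant means -> weight shifts toward the noninformative baseline
w_far = informative_weight(hellinger_gauss(0.0, 1.0, 5.0, 1.0))
```

Because the Hellinger distance is bounded in $[0, 1]$, any monotone map of it yields a mixture weight that degrades gracefully from full reliance on the LLM prior to near-total reliance on the baseline.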

Empirical analyses validate these theoretical guarantees, showing that proper mixing, calibration, and adjustment render LLM-informed priors robust to both misspecification and underspecification.

5. Empirical Performance and Practical Applications

The deployment of LLM-informed prior distributions leads to quantifiable improvements in performance, efficiency, and applicability across domains:

  • Sample efficiency and predictive accuracy: In clinical trial safety modeling, LLM-informed hyperpriors enabled substantial reduction in required patient numbers for equivalent statistical power, providing more ethical and cost-effective trial design (Arai et al., 4 Sep 2025).
  • Feature selection and causal inference: LLM-driven priors in high-dimensional datasets facilitated relevant variable selection and improved causal direction inference, supplementing or surpassing data-driven approaches in accuracy and interpretability (Choi et al., 2022).
  • Compositional zero-shot generalization: Models using LLM-supported class distributions demonstrated superior harmonic mean accuracies and generalization to unseen compositions compared to traditional prompt and distribution-based methods (Bao et al., 2023).
  • Clinical predictive modeling: Mixture-of-Gaussians priors constructed by AutoElicit led to rapid learning curves and significant reductions in annotation requirements in healthcare applications (e.g., urinary tract infection detection) (Capstick et al., 26 Nov 2024).
  • Causal structure learning: Mitigation strategies for LLM-derived prior errors in Bayesian networks—based on detection of quasi-circles—enhanced resilience against order-reversed prior mistakes while preserving most correct edge priors (Chen et al., 2023).
  • Sparse and structured inference: Prior normalization enabled the use of heavy-tailed or sparsity-inducing priors (potentially informed by LLMs) within efficient MCMC frameworks for inverse problems, ensuring both computational tractability and theoretical convergence (Cui et al., 2022).

A table summarizing selected frameworks and their empirical impact:

| Framework | Empirical Outcome | Domain |
| --- | --- | --- |
| Hierarchical LLM priors | 20% sample reduction, better LPD | Clinical |
| AutoElicit | 55% reduction in labels, early accuracy | Healthcare |
| LMPriors | Feature selection accuracy gains | ML |
| PLID (CZSL) | Highest mean/AUC on CZSL benchmarks | Vision |
| Data-dependent mixture | MSE reduction, robust small-sample behavior | Bayesian |

6. Limitations, Challenges, and Future Directions

LLM-informed prior distributions, while versatile, require careful handling:

  • Prompt sensitivity and bias propagation: Outputs are highly sensitive to prompt design, temperature setting, and context; LLMs can propagate dataset, cultural, or user biases into priors, necessitating external calibration or mitigation strategies (Choi et al., 2022, Arai et al., 4 Sep 2025).
  • Non-uniqueness and replicability: Translation from predictive to joint priors may yield multiple hyperparameter sets that provide practically indistinguishable predictive behavior, complicating reproducibility and interpretation (Manderson et al., 2023).
  • Computational scaling: Certain procedures, such as prior optimization or mixture prior construction, incur substantial computational costs relative to classical inference; scalability remains an open challenge (Manderson et al., 2023, Cui et al., 2022).
  • Complexity of error mitigation: For causal structure learning, robust frameworks are needed to distinguish order-preserving from order-reversing prior errors and to analytically correct for LLM misdirection (Chen et al., 2023).
  • Extension to new domains and modalities: While vision, language, and tabular ML tasks have benefited, extensions to multivariate time series, spatial models, or reinforcement learning regimes require further methodological adaptation (Bao et al., 2023).

Continued research into more trustworthy, transparent, and context-sensitive mechanisms for prior elicitation—as well as calibration with domain experts, prior transferability, and integration with hierarchical Bayesian architectures—is warranted.

7. Significance and Outlook

LLM-informed prior distributions fundamentally expand the toolbox of Bayesian modeling by enabling the injection of rich, context-dependent, expert-level domain knowledge using large-scale pretrained LLMs. This capability is especially salient in low-data environments, complex hierarchical models, and compositional tasks. By marrying linguistic context with numerical inference, these priors offer quantifiable improvements in efficiency, accuracy, and interpretability without sacrificing theoretical rigor. Ongoing developments in transfer learning, prompt calibration, mixture modeling, and robust geometric inference continue to shape the practical landscape of LLM-informed Bayesian inference across scientific domains.