
Likelihood Factorisation Strategy

Updated 24 April 2026
  • Likelihood Factorisation (LF) is a method that decomposes high-dimensional joint likelihoods by exploiting conditional independence for scalable computation across statistical models.
  • LF enables accurate surrogate training, principled marginal likelihood comparisons, and fast simulation-based inference in applications like epidemiology, high-energy physics, and CFD.
  • LF strategies employ algorithmic optimisations such as tokenised flow matching and group vertex tessellation to reduce simulation costs and achieve significant computational speedups.

The likelihood factorisation (LF) strategy is a mathematical and algorithmic approach for exploiting conditional independence or structural decomposability in statistical models, enabling the joint likelihood to be written as a (typically product) factorisation over subsets of data or variables. This framework appears in hierarchical simulation-based inference, discrete variable modeling, saddlepoint-based approximations, structured missing data, and fast event-rate computation, among other contexts. LF underpins efficient surrogate training, principled marginal likelihood comparison, scalable maximum-likelihood estimation, and fast Monte Carlo reweighting, as evidenced by recent developments across multiple research domains (Charles et al., 22 Apr 2026, LaTorre, 2021, Goodman, 2020, César et al., 2024, Vinci, 2024).

1. Mathematical Foundations of Likelihood Factorisation

The essential principle of LF is to write an otherwise high-dimensional, computationally intractable joint likelihood as a product of lower-dimensional conditionals exploiting conditional independence. Formally, for structured models:

  • In hierarchical models with global and site-wise parameters, the joint likelihood factorises as

p(y \mid \theta_g, \eta) = \prod_{s=1}^{n_s} p(y_s \mid \theta_g, \eta_s),

where $y = (y_1, \ldots, y_{n_s})$ is the site-partitioned data and $\eta = (\eta_1, \ldots, \eta_{n_s})$ are the local (site-wise) parameters (Charles et al., 22 Apr 2026).

  • In categorical data, for a partition $M$ of the $N$ variables, the LF writes

P(D \mid p, M) = \frac{N_s!}{\prod_i n_i!} \prod_{g \in M} \prod_{j=1}^{\eta_g} p_{g,j}^{n_{g,j}},

where $g$ spans the blocks of the partition and $n_{g,j}$ are the block-wise counts (LaTorre, 2021).

  • In binned-likelihood settings, the expected event rate in bin $i$ is written

\lambda^i(\theta) = \sum_{C} W_{i,C} \prod_{\alpha \in C} h^\alpha(\theta_\alpha),

where the sum runs over unique event configurations $C$ that share identical dependence on the parameter subset indexed by $\alpha \in C$, and $W_{i,C}$ is the cached weight of configuration $C$ in bin $i$ (César et al., 2024).

LF frequently enables analytical integration (e.g., Dirichlet-multinomial for categorical models), surrogate training at reduced simulation cost, and the construction of tractable surrogates for otherwise expensive computations.
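The hierarchical factorisation above can be illustrated with a minimal sketch, assuming a toy model in which each site's observations are Gaussian around a global mean plus a site-level offset; the factorised log-likelihood is simply a sum over sites:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy hierarchical model (assumed for illustration): each site s has
# observations y_s ~ Normal(theta_g + eta_s, 1), with a global parameter
# theta_g and site-wise local parameters eta_s.
n_sites, n_obs = 5, 20
theta_g = 1.0
eta = rng.normal(0.0, 0.5, size=n_sites)
y = theta_g + eta[:, None] + rng.normal(size=(n_sites, n_obs))

def site_loglik(y_s, theta_g, eta_s):
    # log p(y_s | theta_g, eta_s) for one site (unit-variance Gaussian)
    return -0.5 * np.sum((y_s - theta_g - eta_s) ** 2 + np.log(2 * np.pi))

# Factorised joint log-likelihood: a sum over sites, mirroring
# log p(y | theta_g, eta) = sum_s log p(y_s | theta_g, eta_s).
joint = sum(site_loglik(y[s], theta_g, eta[s]) for s in range(n_sites))

# Equivalent monolithic evaluation, for comparison.
mono = -0.5 * np.sum((y - theta_g - eta[:, None]) ** 2 + np.log(2 * np.pi))
assert np.isclose(joint, mono)
```

The equality of the two evaluations is the whole point of LF: each term touches only one site's data and parameters, so the terms can be computed, approximated, or surrogate-modelled independently.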

2. Implementation in Hierarchical Simulation and Surrogate Modeling

Within simulation-based inference for hierarchical models, LF is leveraged to enable scalable likelihood-free Bayesian inference:

  • Single-site data generation and per-site neural surrogate training are performed using flow-matching networks, training a separate conditional surrogate $q_s(y_s \mid \theta_g, \eta_s)$ per site. The approximate full joint likelihood is then constructed as

\hat{p}(y \mid \theta_g, \eta) = \prod_{s=1}^{n_s} q_s(y_s \mid \theta_g, \eta_s).

  • Once the per-site surrogates are trained, synthetic multi-site datasets can be assembled for downstream, amortised inference over the joint posterior with no further multi-site simulator calls.
  • Tokenised flow matching architectures (TFMPE) generalise this approach to vector/function-valued data by embedding joint variables and site structures as token sequences, facilitating end-to-end training of continuous normalising flows for the full hierarchical model (Charles et al., 22 Apr 2026).

LF provides substantial reductions in simulation cost and computational complexity, especially where the number of sites (or data subsets) is large; speedups exceeding three orders of magnitude are empirically demonstrated for realistic scientific models.
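The two-phase workflow can be sketched as follows. This is a deliberately simplified stand-in: a Gaussian fit plays the role of the flow-matching surrogate, and `simulator` is a hypothetical per-site simulator, so only the structure (per-site training, then surrogate-only multi-site assembly) mirrors the method:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulator(theta_g, eta_s, n):
    # Hypothetical expensive per-site simulator (stand-in: Gaussian draws).
    return rng.normal(theta_g + eta_s, 1.0, size=n)

n_sites = 4
theta_g = 0.5
eta = rng.normal(0.0, 1.0, size=n_sites)

# Phase 1: single-site simulation and surrogate fitting, one site at a time.
# Here the "surrogate" is just a fitted (mu, sigma) pair per site.
surrogates = []
for s in range(n_sites):
    draws = simulator(theta_g, eta[s], 1000)
    surrogates.append((draws.mean(), draws.std()))

# Phase 2: assemble a synthetic multi-site dataset with no further
# multi-site simulator calls -- sample every site from its surrogate.
synthetic = np.stack([rng.normal(mu, sd, size=50) for mu, sd in surrogates])
print(synthetic.shape)  # (4, 50)
```

The key property is that phase 2 never calls the simulator: once the per-site surrogates exist, arbitrarily many multi-site datasets can be assembled for amortised downstream inference.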

3. Model Selection and Marginal Likelihoods in Multivariate Discrete Models

In categorical and discrete data analysis, LF enables rigorous model (partition) selection and predictive inference:

  • Given $N$ discrete variables, the set of all factorisations into independent blocks is enumerated.
  • The marginal likelihood for a partition $M$ is analytically computed via closed-form Dirichlet-multinomial integrals,

P(D \mid M) = \frac{N_s!}{\prod_i n_i!} \prod_{g \in M} \frac{B(\mathbf{n}_g + \boldsymbol{\alpha}_g)}{B(\boldsymbol{\alpha}_g)},

where $B(\cdot)$ is the multivariate Beta function, $\boldsymbol{\alpha}_g$ are the Dirichlet prior parameters, and $\mathbf{n}_g = (n_{g,1}, \ldots, n_{g,\eta_g})$ are the block counts (LaTorre, 2021).

  • The factorisation with the highest marginal likelihood is selected, providing an optimal trade-off between block-wise dependence and independent structure.

The resulting Bayes classifier for a new data point factors groupwise predictive probabilities according to the chosen decomposition, outperforming both the Naive Bayes (fully factored) and single-block (fully dependent) models on data where the true dependency structure is intermediate.
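A minimal sketch of the partition-selection step, assuming two binary variables with strongly dependent outcomes and a symmetric Dirichlet prior; the Dirichlet-multinomial marginal of each block is computed in closed form via log-Gamma functions, and the marginal of a partition is the product (log-sum) over its blocks:

```python
import numpy as np
from scipy.special import gammaln

def block_log_marginal(counts, alpha=1.0):
    # Closed-form Dirichlet-multinomial marginal likelihood of one block:
    # the block's category probabilities are integrated out under a
    # symmetric Dirichlet(alpha) prior.
    counts = np.asarray(counts, dtype=float)
    k, n = counts.size, counts.sum()
    return (gammaln(k * alpha) - gammaln(k * alpha + n)
            + np.sum(gammaln(counts + alpha)) - k * gammaln(alpha))

# Toy data: N = 2 binary variables, joint counts over the four outcomes
# (00, 01, 10, 11).  Strong dependence: mass concentrates on 00 and 11.
joint_counts = np.array([40, 5, 5, 50])
x_counts = np.array([45, 55])   # marginal counts of variable 1
y_counts = np.array([45, 55])   # marginal counts of variable 2

# Partition {{1,2}}: a single dependent block over all four outcomes.
lm_oneblock = block_log_marginal(joint_counts)
# Partition {{1},{2}}: the marginal likelihood factorises over blocks.
lm_indep = block_log_marginal(x_counts) + block_log_marginal(y_counts)

# With this dependent data the single-block partition should win.
print(lm_oneblock > lm_indep)  # True
```

Multinomial coefficients are omitted here since they are identical across partitions of the same data and cancel in the comparison.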

4. Computational and Algorithmic Optimisations

LF enables substantial acceleration via algorithmic optimisation and event grouping:

  • In large-scale binned-likelihood inference (e.g., high-energy physics), LF identifies unique event configurations—groups of simulation events that share identical dependence on parameter subsets—enabling calculation of expected rates with dramatically reduced redundant computation (César et al., 2024).
  • Efficient pseudocode proceeds in two phases: (1) offline grouping and caching of configurations, (2) online per-theta computation requiring only one evaluation per unique configuration.
  • In simulation-based inference, LF supports synthetic assembly of multi-site data from pre-trained surrogates, turning an $O(n_s T_{\mathrm{sim}})$ operation (where $T_{\mathrm{sim}}$ is the simulator time per site) into $O(n_s T_{\mathrm{surr}})$, where the surrogate sampling time $T_{\mathrm{surr}}$ is much smaller than $T_{\mathrm{sim}}$.
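The two-phase grouping scheme can be sketched as below. The event records, response functions, and parameter names are hypothetical; the point is the structure: offline, events sharing a configuration are collapsed into a single cached weight $W_{i,C}$, so that online each unique configuration is evaluated only once per $\theta$:

```python
import numpy as np
from collections import defaultdict

# Hypothetical simulation events: each carries a bin, a weight, and a
# "configuration" -- the subset of parameters its response depends on.
events = [
    {"bin": 0, "weight": 1.2, "config": ("mu",)},
    {"bin": 0, "weight": 0.8, "config": ("mu",)},
    {"bin": 1, "weight": 2.0, "config": ("mu", "sigma")},
    {"bin": 1, "weight": 0.5, "config": ("mu", "sigma")},
]

# Phase 1 (offline): group events by configuration and cache the summed
# weights W_{i,C} per (bin, configuration) pair.
W = defaultdict(float)
for ev in events:
    W[(ev["bin"], ev["config"])] += ev["weight"]

# Per-parameter response functions h^alpha(theta_alpha) -- assumed linear
# here purely for illustration.
h = {"mu": lambda t: 1.0 + 0.1 * t, "sigma": lambda t: 1.0 + 0.05 * t}

def expected_rates(theta, n_bins=2):
    # Phase 2 (online): one product evaluation per unique configuration,
    # lambda^i(theta) = sum_C W_{i,C} * prod_{alpha in C} h^alpha(theta_alpha).
    lam = np.zeros(n_bins)
    for (i, config), w in W.items():
        factor = 1.0
        for alpha in config:
            factor *= h[alpha](theta[alpha])
        lam[i] += w * factor
    return lam

rates = expected_rates({"mu": 2.0, "sigma": 1.0})
print(rates)  # [2.4, 3.15]
```

Here four events collapse to two unique configurations, so the per-$\theta$ cost is halved; in realistic HEP analyses the compression ratio is far larger.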

LF-based methods are crucial for permitting analyses and inference with data or models otherwise infeasible due to cost or memory constraints.

5. Linked Likelihoods in Structured Missing Data and Factor Models

LF generalises to settings with incomplete or structured missing data:

  • In the context of Gaussian factor models with deterministic missingness, the likelihood for all observations decomposes into a product of partial likelihoods, each conditioned on the available subsets,

p(y \mid \theta) = \prod_{k} p\!\left(y^{(k)}_{\mathrm{obs}} \mid \theta\right),

where $\theta$ encodes the shared model parameters and $y^{(k)}_{\mathrm{obs}}$ denotes the observations available under the $k$-th missingness pattern (Vinci, 2024).

  • Maximum likelihood estimation is performed by a blockwise EM where overlapping parameters across partial models are coupled, and blockwise updates are accelerated using the group vertex tessellation (GVT) algorithm.
  • LF thus enables unified, efficient covariance completion, dimension reduction, and data imputation in high-dimensional incomplete data settings with theoretical convergence guarantees.
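The linked-likelihood idea can be sketched with a toy Gaussian model and two deterministic missingness patterns (e.g., two recording sessions, each observing a different variable pair); every partial likelihood uses only the observed coordinates but refers back to the same shared $(\mu, \Sigma)$:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)

# Shared model parameters (assumed for illustration): a 3-variable Gaussian.
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[1.0, 0.3, 0.1],
                  [0.3, 1.0, 0.2],
                  [0.1, 0.2, 1.0]])

# Two deterministic missingness patterns: session A observes variables
# (0, 1), session B observes variables (1, 2); no session sees all three,
# yet every entry of Sigma enters some partial likelihood.
patterns = {"A": [0, 1], "B": [1, 2]}
data = {k: rng.multivariate_normal(mu[idx], Sigma[np.ix_(idx, idx)], size=30)
        for k, idx in patterns.items()}

# Linked log-likelihood: a sum of partial log-likelihoods over patterns,
# all coupled through the same underlying (mu, Sigma).
loglik = 0.0
for k, idx in patterns.items():
    partial = multivariate_normal(mu[idx], Sigma[np.ix_(idx, idx)])
    loglik += partial.logpdf(data[k]).sum()
print(loglik)
```

Maximising this coupled objective (e.g., by the blockwise EM mentioned above) is what ties the overlapping parameters together across patterns; the sketch only evaluates the factorised objective at fixed parameters.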

This approach is applicable in neuroscience and multi-session studies where not all variable pairs are co-observed, ensuring all entries of the covariance matrix are estimated despite incomplete overlap.

6. Asymptotic Error Control and Saddlepoint Approximations

LF arises in the analysis of the saddlepoint approximation for maximum likelihood estimation:

  • The saddlepoint likelihood is decomposed as

\hat{L}(\theta) = \exp\{K(\hat{s}; \theta) - \hat{s}\, x\} \cdot \left(2\pi K''(\hat{s}; \theta)\right)^{-1/2},

where $\exp\{K(\hat{s};\theta) - \hat{s}x\}$ is the exponential tilt term and $(2\pi K''(\hat{s};\theta))^{-1/2}$ is a Gaussian correction factor, with $K$ the cumulant generating function and $\hat{s}$ the saddlepoint solving $K'(\hat{s};\theta) = x$ (Goodman, 2020).

  • This factorisation enables precise asymptotic analysis, with the error introduced in the MLE being asymptotically negligible compared to the statistical uncertainty in fully identifiable models.
  • Lower-order (zeroth-order) LF approximations drop the Gaussian correction factor and remain accurate to leading asymptotic order.
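The tilt-times-correction factorisation can be made concrete for a case with a known answer: the density of a sum of $n$ unit exponentials (a Gamma$(n,1)$), whose CGF $K(s) = -n\log(1-s)$ gives a closed-form saddlepoint. This is a standard textbook example, not the specific model of the cited work:

```python
import numpy as np
from scipy.stats import gamma

def saddlepoint_gamma(x, n):
    # Saddlepoint density approximation for a sum of n unit exponentials,
    # written explicitly as (exponential tilt) * (Gaussian correction).
    K = lambda s: -n * np.log(1.0 - s)          # CGF of the sum
    s_hat = 1.0 - n / x                         # solves K'(s_hat) = x
    K2 = n / (1.0 - s_hat) ** 2                 # K''(s_hat)
    tilt = np.exp(K(s_hat) - s_hat * x)         # exponential tilt term
    correction = 1.0 / np.sqrt(2 * np.pi * K2)  # Gaussian correction factor
    return tilt * correction

x, n = 12.0, 10
approx = saddlepoint_gamma(x, n)
exact = gamma(a=n).pdf(x)
print(approx / exact)  # close to 1 for moderate n
```

For this family the approximation error reduces to the Stirling-series error in $\Gamma(n)$, illustrating why the correction-factor structure lends itself to sharp asymptotic error control.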

The practical implication is that carefully designed LF-based approximations can yield computationally efficient and statistically precise MLEs under regularity conditions.

7. Applications, Limitations, and Best-Case Scenarios

LF strategies are applicable across diverse domains:

  • Hierarchical simulation-based inference for epidemiology and CFD, multidimensional discrete classifiers, factor models for high-dimensional incomplete data, and Poisson-binned event-rate estimation in HEP (Charles et al., 22 Apr 2026, LaTorre, 2021, Vinci, 2024, César et al., 2024).
  • Key limitations include scalability (e.g., exponential number of partitions in discrete LF), necessity of block-independence or groupwise structure, and approximation error where surrogates are substituted for true simulation.
  • LF delivers optimal results when conditional independence or grouping is strong, the model structure induces tractable factorisations, and moderate to large sample sizes or numbers of sites are present so that computational gains from grouping are most pronounced.

LF underlies scalable modern Bayesian, frequentist, and surrogate-based inference in structured probabilistic models, offering both principled theoretical foundations and substantial empirical speedup.
