Additive Mixtures of Markov Kernels

Updated 4 July 2026

Additive Mixtures of Markov Kernels are transition kernels defined as convex or affine combinations that preserve stationarity and often reversibility under shared invariant laws.
They interpolate between local exploration and averaging components (e.g., Gibbs, Metropolis, or Barker kernels) to balance sampling efficiency and overcome bottlenecks.
Optimization over mixing parameters and partitions uses metrics like Frobenius norm and KL divergence to rigorously trade off local dynamics and global regularization.

Additive mixtures of Markov kernels are transition kernels formed by convex or affine combination of other kernels, typically to combine complementary dynamical mechanisms such as local exploration, orbit-wise averaging, restart regularization, or symmetry transforms. In the finite-state setting studied most directly in "On additive averaging kernels for finite Markov chains" (Lim et al., 14 Apr 2026), the canonical form is

$A_\alpha=\alpha P+(1-\alpha)G,\qquad \alpha\in[0,1],$

where $P$ is a baseline sampler and $G$ is a Gibbs kernel induced by a partition of the state space. Across the literature, closely related constructions include convex mixtures of reversible kernels with common invariant distribution (Lee et al., 2012), state-dependent local aggregation $\sum_i \varpi_i(x)P_i(x,\cdot)$ together with an acceptance correction restoring $\pi$-reversibility (Maire et al., 2018), restart-anchored mixtures $(1-\varepsilon)K+\varepsilon \nu$ used as invertible coordinates for learning transition kernels (Xu, 1 Jun 2026), orbit-averaged hybrid kernels $\alpha P+(1-\alpha)Q$ with $Q\in\{G,M,B\}$ (Choi et al., 15 Dec 2025), and group averages of the form $\mathbb E_{(g,h)\sim \nu}(U_gPU_h)$ (Choi et al., 3 Sep 2025). The common theme is that additive mixing preserves kernel validity under mild compatibility conditions, while the effect on convergence, variance, and optimization depends sharply on the structure of the components.

1. Canonical constructions and formal definitions

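In the finite-state framework of (Lim et al., 14 Apr 2026), the state space is $\mathcal X=\llbracket n\rrbracket$, the stationary distribution $\pi$ is strictly positive, and the additive kernel is built from a baseline $\pi$-stationary kernel $P$ and a partition

$\mathcal X=\bigsqcup_{i=1}^{k}\mathcal O_i.$

The associated Gibbs kernel is

$G(x,y)=\frac{\pi(y)}{\pi(\mathcal O(x))}\,\mathbf 1\{y\in\mathcal O(x)\},$

where $\mathcal O(x)$ denotes the block containing $x$, so one $G$-step exactly resamples from $\pi$ restricted to the current block. The additive mixture

$A_\alpha=\alpha P+(1-\alpha)G,\qquad \alpha\in[0,1],$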

therefore interpolates between pure within-block averaging and the original sampler (Lim et al., 14 Apr 2026).
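The construction can be checked numerically on a toy chain. The following is a minimal sketch, not code from the paper: the four-state target `pi`, the two-block partition, and the nearest-neighbour Metropolis baseline are all illustrative assumptions, and the helper names are hypothetical.

```python
def gibbs_kernel(pi, blocks):
    """G(x, y) = pi(y) / pi(block of x) if y lies in x's block, else 0."""
    n = len(pi)
    G = [[0.0] * n for _ in range(n)]
    for block in blocks:
        mass = sum(pi[y] for y in block)
        for x in block:
            for y in block:
                G[x][y] = pi[y] / mass
    return G


def metropolis_walk(pi):
    """Nearest-neighbour Metropolis chain on a path (assumed baseline P)."""
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in (x - 1, x + 1):
            if 0 <= y < n:
                P[x][y] = 0.5 * min(1.0, pi[y] / pi[x])
        P[x][x] = 1.0 - sum(P[x])  # rejected or out-of-range proposals stay put
    return P


def additive_mixture(P, G, alpha):
    """A_alpha = alpha * P + (1 - alpha) * G, computed entrywise."""
    return [[alpha * p + (1 - alpha) * g for p, g in zip(rp, rg)]
            for rp, rg in zip(P, G)]


def vecmat(v, A):
    """Row vector times matrix."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]


pi = [0.1, 0.2, 0.3, 0.4]          # assumed target
blocks = [[0, 1], [2, 3]]          # assumed partition
P = metropolis_walk(pi)
G = gibbs_kernel(pi, blocks)
A = additive_mixture(P, G, alpha=0.7)

print([round(sum(row), 12) for row in A])   # each row sums to 1
print(vecmat(pi, A))                        # equals pi up to rounding
```

Because both components are `pi`-stationary, the stationarity check passes for every `alpha` in `[0, 1]`, not only for the value used here.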

A distinct but structurally related class appears in the group-averaging literature. In "Group-averaged Markov chains II: tuning of group action in finite state space" (Choi et al., 15 Dec 2025), additive orbit-averaged kernels are written as

$\alpha P+(1-\alpha)Q,\qquad Q\in\{G,M,B\},$

with $G$ the Gibbs orbit kernel and $M$, $B$ the Metropolis–Hastings and Barker orbit kernels. The equal-weight cases

$\tfrac12(P+G),\qquad \tfrac12(P+M),\qquad \tfrac12(P+B)$

are the paper’s basic additive hybrids.

The most general convex-mixture theorem in the supplied corpus is the reversible-kernel result of (Lee et al., 2012). There, one considers

$\sum_i a_iP_i,\qquad a_i\ge 0,\quad \sum_i a_i=1,$

where each $P_i$ is reversible with invariant distribution $\pi$. The binary special case is

$aP_1+(1-a)P_2,\qquad a\in[0,1].$

This establishes additive mixtures as a native object of reversible Markov-chain theory rather than a construction tied to a particular sampler family (Lee et al., 2012).

A different extension replaces constant weights by state-dependent weights. In (Maire et al., 2018), the naive local mixture is

$\sum_i \varpi_i(x)P_i(x,\cdot),$

where $\varpi_i(x)\ge 0$ and $\sum_i \varpi_i(x)=1$ for every $x$. Because this generally fails to preserve $\pi$-invariance, the corrected kernel is

$G$ 9

with

$\sum_i \varpi_i(x)P_i(x,\cdot)$ 0

This construction is still additive in the kernel index, but the weights and correction now depend on the current state (Maire et al., 2018).

Another structured additive mixture is the restart anchor of (Xu, 1 Jun 2026): $(1-\varepsilon)K+\varepsilon \nu$ with $\nu$ a state-independent restart law and $\varepsilon\in(0,1)$. This mixture is used simultaneously as a valid Markov kernel, a Doeblin-minorized version of $K$, and an invertible coordinate chart for learning transition laws (Xu, 1 Jun 2026).
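The three roles of the restart anchor can be illustrated on a small finite kernel. This is a hedged sketch: the matrix `K`, the uniform restart law `nu`, the value of `eps`, and the function names `anchor` and `de_anchor` are assumptions made for illustration and are not taken from the paper.

```python
def anchor(K, nu, eps):
    """K_eps(x, y) = (1 - eps) * K(x, y) + eps * nu(y)."""
    return [[(1 - eps) * k + eps * v for k, v in zip(row, nu)] for row in K]


def de_anchor(K_eps, nu, eps):
    """Invert the anchor: K = (K_eps - eps * nu) / (1 - eps), valid for eps < 1."""
    return [[(k - eps * v) / (1 - eps) for k, v in zip(row, nu)] for row in K_eps]


K = [[0.9, 0.1, 0.0],      # assumed kernel with some zero entries
     [0.2, 0.6, 0.2],
     [0.0, 0.3, 0.7]]
nu = [1 / 3, 1 / 3, 1 / 3]  # assumed state-independent restart law
eps = 0.2

K_eps = anchor(K, nu, eps)
K_back = de_anchor(K_eps, nu, eps)

# Valid kernel: nonnegative rows summing to one.
print([round(sum(row), 12) for row in K_eps])
# Doeblin-type minorization: every entry dominates eps * nu(y).
print(all(k >= eps * v for row in K_eps for k, v in zip(row, nu)))
# Invertible chart: de-anchoring recovers K up to rounding.
print(max(abs(a - b) for ra, rb in zip(K_back, K) for a, b in zip(ra, rb)))
```

The minorization holds even where `K` itself has zero entries, which is what makes the anchored kernel uniformly bounded below by `eps * nu`.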

2. Stationarity, reversibility, and lifted or projected interpretations

Additive mixtures preserve stationarity whenever the components share the same invariant law. In (Lim et al., 14 Apr 2026), both $P$ and the Gibbs kernel $G$ are $\pi$-stationary, so $A_\alpha$ is again $\pi$-stationary; if both are reversible, the mixture remains reversible. This reversibility-preserving feature distinguishes additive averaging from many lifting constructions, even though the same paper gives a lifted interpretation of $A_\alpha$ on

$\pi$ 1

The lifted kernel $\pi$ 2 first samples a sign with

$\pi$ 3

then applies $P$ or $G$ accordingly. Projecting onto the first coordinate recovers exactly the additive mixture $A_\alpha$ (Lim et al., 14 Apr 2026). This identifies additive averaging as the observable component of a lifted chain with a hidden switch between local and averaging moves.
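The projection statement can be verified by direct enumeration. The sketch below makes several assumptions that are not fixed by the text: the hidden coordinate is a sign in `{+1, -1}`, the sign `+1` (drawn with probability `alpha`) triggers a `P` move and `-1` triggers a `G` move, and the toy kernels `P` and `G` are chosen only for illustration.

```python
def lifted_kernel(P, G, alpha):
    """Kernel on pairs (x, s): draw a fresh sign t, then move with P if t = +1, else G."""
    n = len(P)
    states = [(x, s) for x in range(n) for s in (+1, -1)]
    L = {}
    for src in states:
        x = src[0]
        for (y, t) in states:
            L[src, (y, t)] = alpha * P[x][y] if t == +1 else (1 - alpha) * G[x][y]
    return states, L


def project_first_coordinate(L, n, src):
    """Marginalize the new sign out of the lifted transition row started at src."""
    return [sum(L[src, (y, t)] for t in (+1, -1)) for y in range(n)]


# Assumed toy components: lazy walk on a 4-cycle and a two-block Gibbs kernel
# for the uniform target.
P = [[0.50, 0.25, 0.00, 0.25],
     [0.25, 0.50, 0.25, 0.00],
     [0.00, 0.25, 0.50, 0.25],
     [0.25, 0.00, 0.25, 0.50]]
G = [[0.5, 0.5, 0.0, 0.0],
     [0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5],
     [0.0, 0.0, 0.5, 0.5]]
alpha = 0.3

states, L = lifted_kernel(P, G, alpha)
A = [[alpha * P[x][y] + (1 - alpha) * G[x][y] for y in range(4)] for x in range(4)]

# The projected row does not depend on the current sign and matches A_alpha.
max_gap = max(abs(p - a)
              for (x, s) in states
              for p, a in zip(project_first_coordinate(L, 4, (x, s)), A[x]))
print(max_gap)
```

In this sketch the sign is redrawn independently at every step, so the first coordinate is itself a Markov chain with kernel `A`.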

The inheritance theorem of (Lee et al., 2012) shows that under reversibility, a positive-weight “good” component can rescue a mixture. If $\pi$ 7 has unique invariant distribution $\pi$ 8 and $\pi$ 9, then

$(1-\varepsilon)K+\varepsilon \nu$ 0

where $(1-\varepsilon)K+\varepsilon \nu$ 1 denotes variance bounding and $(1-\varepsilon)K+\varepsilon \nu$ 2 geometric ergodicity. The proof is conductance-based and gives the explicit lower bounds

$(1-\varepsilon)K+\varepsilon \nu$ 3

The result is one-way: it does not assert that all components must themselves be variance bounding or geometrically ergodic (Lee et al., 2012).

Group-averaging theory provides another invariant-structure interpretation. In (Choi et al., 3 Sep 2025), a group $(1-\varepsilon)K+\varepsilon \nu$ 4 acts on the state space and one averages transformed kernels: $(1-\varepsilon)K+\varepsilon \nu$ 5 When $(1-\varepsilon)K+\varepsilon \nu$ 6 is $(1-\varepsilon)K+\varepsilon \nu$ 7-invariant, these averages preserve $(1-\varepsilon)K+\varepsilon \nu$ 8-stationarity; under a symmetry condition on $(1-\varepsilon)K+\varepsilon \nu$ 9, they also preserve reversibility. Special cases include the left-average $\alpha P+(1-\alpha)Q$ 0, right-average $\alpha P+(1-\alpha)Q$ 1, orbit average $\alpha P+(1-\alpha)Q$ 2, and independent double average $\alpha P+(1-\alpha)Q$ 3 (Choi et al., 3 Sep 2025). This places additive mixtures inside a broader invariant-kernel geometry based on symmetry transforms rather than on partitions alone.
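A finite-state illustration of the stationarity claim follows. It is a sketch under stated assumptions: the group is the two-element reflection group acting on four states, $U_g$ is represented as the permutation matrix with `U_g[x][y] = 1` when `y = g(x)`, the law $\nu$ is uniform over pairs $(g,h)$, and the baseline is a Metropolis walk. None of these choices comes from the paper.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def vecmat(v, A):
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]


def perm_matrix(g):
    """Assumed convention: U_g(x, y) = 1 if y = g[x], else 0."""
    n = len(g)
    return [[1.0 if y == g[x] else 0.0 for y in range(n)] for x in range(n)]


def metropolis_walk(pi):
    """Nearest-neighbour Metropolis chain on a path (assumed baseline P)."""
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in (x - 1, x + 1):
            if 0 <= y < n:
                P[x][y] = 0.5 * min(1.0, pi[y] / pi[x])
        P[x][x] = 1.0 - sum(P[x])
    return P


pi = [0.1, 0.4, 0.4, 0.1]                  # invariant under the reflection x -> 3 - x
group = [[0, 1, 2, 3], [3, 2, 1, 0]]       # identity and reflection
P = metropolis_walk(pi)

# Independent double average with nu uniform on group x group.
n = len(pi)
avg = [[0.0] * n for _ in range(n)]
weight = 1.0 / (len(group) ** 2)
for g in group:
    for h in group:
        term = matmul(matmul(perm_matrix(g), P), perm_matrix(h))
        for x in range(n):
            for y in range(n):
                avg[x][y] += weight * term[x][y]

print(vecmat(pi, avg))   # equals pi up to rounding
```

The check relies only on `pi` being invariant under each permutation, so the same loop works for any finite permutation group preserving the target.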

A common misconception is that any state-dependent convex combination of reversible kernels remains invariant. The finite-state counterexamples in (Maire et al., 2018) show this is false: the naive kernel $\alpha P+(1-\alpha)Q$ 4 may fail to be $\alpha P+(1-\alpha)Q$ 5-invariant, and the paper states that one can even construct cases where it is transient. The correction factor $\alpha P+(1-\alpha)Q$ 6 is therefore not optional but structural (Maire et al., 2018).
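A two-state example makes the failure concrete. In the sketch below, the kernels, the weights, and the uniform target are illustrative assumptions. The correction shown accepts a move proposed by kernel $i$ with probability $\min\{1,\varpi_i(y)/\varpi_i(x)\}$; this is one standard way to restore reversibility and is assumed here for illustration, so it may differ from the exact correction used in the paper.

```python
def vecmat(v, A):
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]


def naive_mixture(kernels, weights):
    """K(x, y) = sum_i w_i(x) * P_i(x, y), with no correction."""
    n = len(kernels[0])
    return [[sum(w[x] * Pi[x][y] for Pi, w in zip(kernels, weights))
             for y in range(n)] for x in range(n)]


def corrected_mixture(kernels, weights):
    """Assumed correction: accept a P_i move x -> y with prob min(1, w_i(y) / w_i(x)).

    Off-diagonal mass is then sum_i P_i(x, y) * min(w_i(x), w_i(y)); rejected
    mass is placed on the diagonal.
    """
    n = len(kernels[0])
    K = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if y != x:
                K[x][y] = sum(Pi[x][y] * min(w[x], w[y])
                              for Pi, w in zip(kernels, weights))
        K[x][x] = 1.0 - sum(K[x])
    return K


pi = [0.5, 0.5]
P1 = [[1.0, 0.0], [0.0, 1.0]]      # stay put (reversible for pi)
P2 = [[0.0, 1.0], [1.0, 0.0]]      # swap (reversible for pi)
kernels = [P1, P2]
weights = [[0.9, 0.1],             # weights[i][x]: prob. of picking kernel i at x
           [0.1, 0.9]]

naive = naive_mixture(kernels, weights)
fixed = corrected_mixture(kernels, weights)

print(vecmat(pi, naive))   # differs from pi: the naive mixture is not pi-invariant
print(vecmat(pi, fixed))   # equals pi
```

Here both components are individually reversible for the uniform law, yet the naive mixture pushes mass toward state 0 because the swap kernel is favoured only at state 1.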

3. Objective functions for additive-kernel design

The most detailed optimization theory in the supplied material concerns one-step distance to stationarity for $A_\alpha$ (Lim et al., 14 Apr 2026). Two objectives are analyzed: a weighted squared Frobenius discrepancy to the stationary limit kernel $\Pi$, whose rows all equal $\pi$, and a $\pi$-weighted row-wise Kullback–Leibler divergence.

For the Frobenius criterion, the basic quantity is

$Q\in\{G,M,B\}$ 0

If $Q\in\{G,M,B\}$ 1 and $Q\in\{G,M,B\}$ 2 is the Gibbs kernel induced by the partition, then the interaction trace compresses to the projection chain $Q\in\{G,M,B\}$ 3: $Q\in\{G,M,B\}$ 4 Under reversibility,

$Q\in\{G,M,B\}$ 5

For $Q\in\{G,M,B\}$ 6,

$Q\in\{G,M,B\}$ 7

Thus, once $Q\in\{G,M,B\}$ 8, $Q\in\{G,M,B\}$ 9, and the number of blocks are fixed, all partition dependence enters through $\mathbb E_{(g,h)\sim \nu}(U_gPU_h)$ 0 (Lim et al., 14 Apr 2026).

For a two-block partition $\mathbb E_{(g,h)\sim \nu}(U_gPU_h)$ 1, the same paper derives

$\mathbb E_{(g,h)\sim \nu}(U_gPU_h)$ 2

so minimizing the Frobenius objective is equivalent to maximizing the Cheeger-type functional

$\mathbb E_{(g,h)\sim \nu}(U_gPU_h)$ 3

This reverses the role of the classical bottleneck-seeking Cheeger problem: Frobenius-optimal additive partitions cut through regions already well connected under $P$, while bad partitions align with bottlenecks (Lim et al., 14 Apr 2026).

The KL objective uses

$\mathbb E_{(g,h)\sim \nu}(U_gPU_h)$ 5

For the lifted chain,

$\mathbb E_{(g,h)\sim \nu}(U_gPU_h)$ 6

an exact convex decomposition. For the additive chain itself,

$\mathbb E_{(g,h)\sim \nu}(U_gPU_h)$ 7

by convexity of $\mathbb E_{(g,h)\sim \nu}(U_gPU_h)$ 8. The Gibbs term is explicit: $\mathbb E_{(g,h)\sim \nu}(U_gPU_h)$ 9 where $\mathcal X=\llbracket n\rrbracket$ 0 and

$\mathcal X=\llbracket n\rrbracket$ 1

Hence, under the KL bound, partition selection reduces entirely to block-mass entropy, not to the transition geometry of $P$ (Lim et al., 14 Apr 2026).
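Both facts can be checked numerically. The sketch assumes the objective is $\sum_x \pi(x)\,\mathrm{KL}(K(x,\cdot)\,\|\,\pi)$, that is, divergence from each row of the kernel to the stationary law, weighted by $\pi$; the toy target, partition, and Metropolis baseline are illustrative choices, and the function names are hypothetical.

```python
import math


def gibbs_kernel(pi, blocks):
    n = len(pi)
    G = [[0.0] * n for _ in range(n)]
    for block in blocks:
        mass = sum(pi[y] for y in block)
        for x in block:
            for y in block:
                G[x][y] = pi[y] / mass
    return G


def metropolis_walk(pi):
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in (x - 1, x + 1):
            if 0 <= y < n:
                P[x][y] = 0.5 * min(1.0, pi[y] / pi[x])
        P[x][x] = 1.0 - sum(P[x])
    return P


def weighted_kl(pi, K):
    """sum_x pi(x) sum_y K(x,y) log(K(x,y) / pi(y)), with 0 log 0 = 0."""
    n = len(pi)
    return sum(pi[x] * K[x][y] * math.log(K[x][y] / pi[y])
               for x in range(n) for y in range(n) if K[x][y] > 0.0)


def block_mass_entropy(pi, blocks):
    """Shannon entropy of the block masses pi(O_1), ..., pi(O_k)."""
    masses = [sum(pi[y] for y in block) for block in blocks]
    return -sum(m * math.log(m) for m in masses)


pi = [0.1, 0.2, 0.3, 0.4]
blocks = [[0, 1], [2, 3]]
alpha = 0.7
P = metropolis_walk(pi)
G = gibbs_kernel(pi, blocks)
A = [[alpha * P[x][y] + (1 - alpha) * G[x][y] for y in range(4)] for x in range(4)]

kl_P, kl_G, kl_A = weighted_kl(pi, P), weighted_kl(pi, G), weighted_kl(pi, A)
print(kl_G, block_mass_entropy(pi, blocks))      # the two values agree
print(kl_A, alpha * kl_P + (1 - alpha) * kl_G)   # first value <= second value
```

The Gibbs term depends on the partition only through the block masses, which is why changing `P` leaves `kl_G` untouched.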

A related but distinct information-geometric picture appears in the group-averaging papers. In (Choi et al., 15 Dec 2025), the Gibbs sandwich $\mathcal X=\llbracket n\rrbracket$ 3, not the additive mixture $\mathcal X=\llbracket n\rrbracket$ 4, satisfies the exact Pythagorean identity

$\mathcal X=\llbracket n\rrbracket$ 5

for $\mathcal X=\llbracket n\rrbracket$ 6 in the $\mathcal X=\llbracket n\rrbracket$ 7-invariant class, making $\mathcal X=\llbracket n\rrbracket$ 8 the unique KL projection of $\mathcal X=\llbracket n\rrbracket$ 9 onto that class. Likewise, in the general-state-space group-average framework of (Choi et al., 3 Sep 2025), the averaged kernel $P$ 00 is the unique KL projection of $P$ 01 onto the fixed-point set $P$ 02, with an exact Pythagorean identity. These results concern additive averages of transformed kernels, but they do not imply the same KL-projection property for arbitrary binary mixtures such as $P$ 03.

4. Spectral, conductance, and asymptotic-variance behavior

The spectral behavior of additive mixtures is subtle and depends on the metric under study. In (Lim et al., 14 Apr 2026), assuming $P$ 04, the second-largest eigenvalue modulus is

$P$ 05

with absolute spectral gap

$P$ 06

The additive mixture satisfies

$P$ 07

and therefore

$P$ 08

This gives geometric contraction in the weighted Frobenius norm, with larger $\alpha$ strengthening the contraction bound because only $P$ contributes cross-block irreducibility (Lim et al., 14 Apr 2026).
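One way to see why a bound of this kind is natural is the following hedged sketch. It assumes that $P$ is $\pi$-reversible and that the partition has at least two blocks; the symbols $\Pi$ (the kernel whose rows all equal $\pi$) and $\lambda(\cdot)$ (second-largest eigenvalue modulus) are notation chosen here, and the paper's own statement may take a different form. Writing $\Pi=\alpha\Pi+(1-\alpha)\Pi$ gives

$A_\alpha-\Pi=\alpha\,(P-\Pi)+(1-\alpha)\,(G-\Pi).$

On $\ell^2(\pi)$ the operator $G$ is the orthogonal projection onto block-constant functions and $\Pi$ is the projection onto constants, so $G-\Pi$ is again an orthogonal projection and has operator norm $1$. Under reversibility $\lambda(K)=\|K-\Pi\|$ for each of the kernels involved, and the triangle inequality yields

$\lambda(A_\alpha)\le \alpha\,\lambda(P)+(1-\alpha),\qquad 1-\lambda(A_\alpha)\ge \alpha\,\bigl(1-\lambda(P)\bigr).$

The lower bound on the gap is increasing in $\alpha$ and vanishes at $\alpha=0$, which matches the observation that the Gibbs component alone cannot move mass between blocks.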

By contrast, the finite-state group-averaging paper (Choi et al., 15 Dec 2025) proves that for additive orbit mixtures

$P$ 11

the orbit-space projection chain is

$P$ 12

the orbit restriction chains are

$P$ 13

and the cross-orbit escape parameter scales as

$P$ 14

Moreover,

$P$ 15

These formulas show that additive orbit averaging improves within-orbit relaxation while lazifying the projection chain on orbit space. The paper then states explicitly that there is no uniform ordering of the right spectral gap of $P$ 16 and that of $P$ 17 (Choi et al., 15 Dec 2025). This is an important counterweight to the intuition that “adding averaging always improves mixing.”

The same contrast arises with asymptotic variance. The strongest monotonicity theorems in (Choi et al., 15 Dec 2025) apply to multiplicative orbit sandwiches $QPQ$, especially $GPG$, not to additive mixtures. For additive hybrids $\alpha P+(1-\alpha)Q$, the paper proves structural decompositions but no general Peskun-type or asymptotic-variance dominance theorem. A plausible implication is that additive mixtures are easier to analyze componentwise than to order globally.

The reversible-mixture theorem of (Lee et al., 2012) addresses a different spectral regime. Rather than improving a gap quantitatively, it proves qualitative inheritance: a positive-weight variance-bounding or geometrically ergodic component suffices to make the whole reversible mixture variance bounding or geometrically ergodic. The mechanism is conductance domination rather than eigenvalue interlacing (Lee et al., 2012).

In the general-state-space symmetry-averaging framework (Choi et al., 3 Sep 2025), additive averages of transformed kernels enjoy a stronger monotonicity result for the multiplicative spectral gap: $P$ 21 The proof relies on operator-norm convexity: $P$ 22 Among such double averages, $P$ 23 is optimal in this multiplicative-gap sense (Choi et al., 3 Sep 2025). This result is about additive averages over symmetry transforms, not about arbitrary binary kernel mixtures, but it shows that additional group structure can restore a clean monotonicity theorem.

5. Optimization over partitions, weights, and state-dependent rules

Partition selection for $A_\alpha$ is combinatorial, and (Lim et al., 14 Apr 2026) develops several approximation principles. For two-block cuts, the authors define

$P$ 25

and show that on $P$ 26,

$P$ 27

$P$ 28

then $P$ 29 maximizes $P$ 30 over $P$ 31, yielding the additive guarantee

$P$ 32

Since $P$ 33, this gives a uniform singleton approximation bound (Lim et al., 14 Apr 2026).

The same paper then rewrites the exact objective as a difference of supermodular functions, equivalently a difference of submodular functions after sign changes. This makes the partition problem amenable to majorization–minimization. Using convexity of $P$ 34, the objective is upper-bounded by a supermodular majorizer $P$ 35, and the MM update

$P$ 36

guarantees monotonic descent of the Frobenius objective (Lim et al., 14 Apr 2026).

Optimization over the mixing parameter $P$ 37 is less explicit in general, but in the special case $P$ 38 one has

$P$ 39

with unique minimizer

$P$ 40

This exhibits an interior optimum and formalizes the trade-off between local exploration and averaging (Lim et al., 14 Apr 2026).

State-dependent mixtures require a different design problem. In (Maire et al., 2018), the selection rule $P$ 41 is intended to privilege the kernel aligned with local target geometry. The paper proposes heuristics such as

$P$ 42

with the preferred choice $P$ 43, and in random-walk settings estimates these weights by particle probing: $P$ 44 To prevent metastability from overconcentrated local weights, the modified rule

$P$ 45

imposes a lower bound on all kernel-selection probabilities (Maire et al., 2018). This is optimization by local relevance rather than by a global one-step discrepancy objective.

Restart-anchored mixtures introduce yet another tuning parameter, $\varepsilon$. The anchor map

$P$ 47

improves conditioning because

$P$ 48

but inversion amplifies estimation error by $P$ 49: $P$ 50 Thus larger anchor strength yields a stronger lower envelope and better contrastive conditioning, while smaller anchor strength makes de-anchoring more stable (Xu, 1 Jun 2026).
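The amplification can be seen by perturbing an anchored kernel and inverting. The sketch assumes the de-anchoring map $K=(K_\varepsilon-\varepsilon\nu)/(1-\varepsilon)$, which is the algebraic inverse of the restart mixture; the kernel `K`, the restart law `nu`, and the perturbation `E` are invented for illustration.

```python
def anchor(K, nu, eps):
    return [[(1 - eps) * k + eps * v for k, v in zip(row, nu)] for row in K]


def de_anchor(K_eps, nu, eps):
    return [[(k - eps * v) / (1 - eps) for k, v in zip(row, nu)] for row in K_eps]


def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


K = [[0.9, 0.1, 0.0],
     [0.2, 0.6, 0.2],
     [0.0, 0.3, 0.7]]
nu = [1 / 3, 1 / 3, 1 / 3]
# Assumed estimation error on the anchored kernel; rows sum to zero.
E = [[0.01, -0.01, 0.0],
     [0.0, 0.01, -0.01],
     [-0.01, 0.0, 0.01]]

ratios = {}
for eps in (0.1, 0.5, 0.9):
    K_eps = anchor(K, nu, eps)
    K_eps_hat = [[k + e for k, e in zip(rk, re)] for rk, re in zip(K_eps, E)]
    K_hat = de_anchor(K_eps_hat, nu, eps)
    # Error after inversion divided by error before inversion.
    ratios[eps] = max_abs_diff(K_hat, K) / max_abs_diff(K_eps_hat, K_eps)

for eps, r in ratios.items():
    print(eps, r, 1 / (1 - eps))   # the last two columns agree
```

Since the inverse map is affine, the ratio equals `1 / (1 - eps)` for any perturbation, so the same error in the anchored coordinates is ten times larger after de-anchoring at `eps = 0.9`.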

6. Applications, empirical behavior, and broader variants

The empirical study in (Lim et al., 14 Apr 2026) uses the Curie–Weiss model on

$P$ 51

with Glauber dynamics as baseline $P$. For the magnetization-sign partition

$P$ 53

the paper compares $P$ 54, $P$ 55, $P$ 56, and

$P$ 57

Across the reported regimes, the total-variation ranking is

$P$ 58

from fastest to slowest mixing. Additive mixtures therefore improve over the baseline but are weaker than multiplicative group averaging per kernel application. The paper also observes that the Frobenius-optimal cuts for additive mixtures are more balanced than those for $P$ 59 or $P$ 60, especially at high temperature (Lim et al., 14 Apr 2026).

Varying $\alpha$ for the same fixed partition yields a U-shaped empirical dependence of fixed-time total variation distance on $\alpha$. The extremes fail for opposite reasons: $A_0=G$ and $A_1=P$. At $\alpha=0$, the Gibbs kernel is reducible unless the partition is trivial; at $\alpha=1$, the baseline local chain may mix slowly. Intermediate values such as $P$ 66 and $P$ 67 perform best in the experiments, usually at or above $P$ 68 (Lim et al., 14 Apr 2026).
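A small experiment of the same type can be run on a toy chain. This is not the Curie–Weiss setup of the paper: the six-state bimodal target, the two-block partition, the Metropolis baseline, the horizon `t`, and the grid of `alpha` values are all assumptions made for illustration. Whether an interior value wins depends on the chain and the horizon, so the sketch only prints the profile.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def matpow(A, t):
    n = len(A)
    out = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        out = matmul(out, A)
    return out


def gibbs_kernel(pi, blocks):
    n = len(pi)
    G = [[0.0] * n for _ in range(n)]
    for block in blocks:
        mass = sum(pi[y] for y in block)
        for x in block:
            for y in block:
                G[x][y] = pi[y] / mass
    return G


def metropolis_walk(pi):
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in (x - 1, x + 1):
            if 0 <= y < n:
                P[x][y] = 0.5 * min(1.0, pi[y] / pi[x])
        P[x][x] = 1.0 - sum(P[x])
    return P


def worst_case_tv(K, pi):
    """Maximum over starting states of the total variation distance to pi."""
    return max(0.5 * sum(abs(k - p) for k, p in zip(row, pi)) for row in K)


pi = [0.3, 0.15, 0.05, 0.05, 0.15, 0.3]   # assumed bimodal target
blocks = [[0, 1, 2], [3, 4, 5]]           # assumed partition, one block per mode
P = metropolis_walk(pi)
G = gibbs_kernel(pi, blocks)
t = 20                                    # assumed fixed horizon

tv_by_alpha = {}
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    A = [[alpha * P[x][y] + (1 - alpha) * G[x][y] for y in range(6)] for x in range(6)]
    tv_by_alpha[alpha] = worst_case_tv(matpow(A, t), pi)

for alpha, tv in tv_by_alpha.items():
    print(alpha, tv)
```

At `alpha = 0.0` the distance is stuck at one minus the mass of the starting block, here `0.5`, for every horizon, because the Gibbs kernel never leaves a block.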

State-dependent local aggregation in (Maire et al., 2018) targets noise-vanishing and filamentary distributions. In the discrete hypercube filament example, the corrected locally weighted mixture improves coupling and spectral quantities relative to the best constant-weight random scan: $P$ 69 and

$P$ 70

The paper simultaneously emphasizes a caveat: for $P$ 71, overly aggressive local weights can worsen asymptotic mixing because the chain escapes high-mass filaments too rarely (Maire et al., 2018).

In transition-kernel learning, the restart mixture of (Xu, 1 Jun 2026) is used not as a faster sampler but as a coordinate chart. The anchored density

$P$ 72

is identifiable by a contrastive risk, after which de-anchoring

$P$ 73

may produce a signed or unnormalized object. The Markovization operator

$P$ 74

restores kernel validity and satisfies

$P$ 75

This suggests a broader role for additive mixtures as statistical regularizers rather than solely as sampling accelerators (Xu, 1 Jun 2026).

Group-averaging on general state spaces extends the notion of additive mixtures from binary combinations to averages over transformed kernels. In (Choi et al., 3 Sep 2025), the family

$P$ 76

includes group-orbit averages, left/right averages, and independent double averages. The paper shows that these kernels often improve spectral gap and asymptotic variance, and it recasts algorithms such as HMC, PDMPs, Swendsen–Wang, and parallel tempering inside this framework. This suggests that additive mixtures become especially powerful when they are organized by symmetry and admit projection identities (Choi et al., 3 Sep 2025).

A separate, measure-theoretic use of additive decomposition appears in finitely additive Markov chains (Zhdanok et al., 2021). On discrete spaces with full power-set sigma-algebra, every finitely additive kernel decomposes uniquely as

$P$ 77

with countably additive and purely finitely additive components. In the “combined” class with fixed masses

$P$ 78

this becomes a convex mixture after normalization,

$P$ 79

The asymptotic and invariant-measure behavior is then governed by the interaction between the countably additive and purely finitely additive sectors (Zhdanok et al., 2021). This is a different use of additive mixing than in MCMC, but it broadens the concept beyond standard stochastic kernels.

Additive mixtures of Markov kernels therefore span several technically distinct regimes: reversible convex combinations with conductance inheritance (Lee et al., 2012), partition-based hybrids balancing exploration and averaging (Lim et al., 14 Apr 2026), state-dependent locally weighted mixtures requiring correction for invariance (Maire et al., 2018), restart mixtures enabling contrastive learning and Doeblin minorization (Xu, 1 Jun 2026), and symmetry-organized averages with projection and gap-improvement structure (Choi et al., 3 Sep 2025). A recurrent lesson is that additive combination by itself guarantees relatively little beyond validity and stationarity; the stronger conclusions arise from additional structure such as reversibility, common invariant law, orbit geometry, or group symmetry.