Von Mises-Fisher Sampling
- Von Mises-Fisher sampling refers to efficient methods for generating random samples from the vMF distribution on the unit sphere, fundamental for directional statistics and various applications.
- Algorithms utilize shifted gamma proposals, rejection sampling, and asymptotic approximations to achieve high throughput and precision even in high-dimensional settings.
- vMF sampling underpins practical applications in machine learning, signal processing, molecular modeling, and reinforcement learning, with extensions in mixture modeling and quantum algorithms.
Von Mises-Fisher sampling refers to a family of methods for efficiently generating random samples from the von Mises-Fisher (vMF) distribution, a central model for directional data defined on the surface of the unit sphere or hypersphere in ℝ^d. This distribution is fundamental in directional statistics, machine learning, signal processing, molecular modeling, quantum information science, and reinforcement learning, wherever probabilistic modeling of directions or orientations is required.
1. The von Mises-Fisher Distribution and Its Role in Sampling
The vMF distribution is the canonical exponential-family distribution for data on the unit sphere S^{d−1} ⊂ ℝ^d. Its probability density function has the form

f(x; μ, κ) = C_d(κ) exp(κ μᵀx),

where ‖μ‖ = 1 is the mean direction, κ ≥ 0 is the concentration parameter, and

C_d(κ) = κ^{d/2−1} / ((2π)^{d/2} I_{d/2−1}(κ))

is the normalization constant involving the modified Bessel function of the first kind I_{d/2−1}. As κ → 0, the vMF approaches the uniform distribution on the sphere; as κ → ∞, it concentrates around the mean direction μ.
Sampling from the vMF distribution is required both in direct simulation (e.g., for generative models, simulation studies, or diffusion calculations) and in inference schemes such as Bayesian posterior sampling, mixture model EM, and variational inference.
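As a concrete reference point for the density above, a numerically stable log-density evaluation can be written with SciPy's exponentially scaled Bessel function (a minimal sketch; `vmf_logpdf` is an illustrative name, not a library routine):

```python
import numpy as np
from scipy.special import ive  # exponentially scaled I_v: ive(v, k) = I_v(k) * exp(-k)

def vmf_logpdf(x, mu, kappa):
    """log f(x; mu, kappa) = log C_d(kappa) + kappa * mu·x on S^{d-1},
    with C_d(k) = k^{d/2-1} / ((2*pi)^{d/2} * I_{d/2-1}(k))."""
    d = mu.size
    nu = d / 2 - 1
    # log I_nu(k) = log ive(nu, k) + k, which avoids overflow for large kappa
    log_C = nu * np.log(kappa) - (d / 2) * np.log(2 * np.pi) \
            - (np.log(ive(nu, kappa)) + kappa)
    return log_C + kappa * np.dot(mu, x)
```

For d = 3 this reduces to the classical closed form C_3(κ) = κ / (4π sinh κ), which makes a convenient sanity check.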
2. Exact and Efficient Sampling Algorithms
2.1. Sampling in the Canonical (Circular) Case: Posterior for the Concentration Parameter
For the von Mises distribution on the circle, the posterior for the concentration parameter κ, given a conjugate prior and data, takes the form

p(κ | data) ∝ e^{β₀κ} / I₀(κ)^η,  κ ≥ 0,

where η and β₀ are constants determined by the data and the prior. Fast, exact sampling from this so-called "Bessel exponential" density was long problematic due to the intractable normalizing constant and the numerically challenging behavior of Bessel functions at small argument values.
The method described in "A Fast Algorithm for Sampling from the Posterior of a von Mises Distribution" (Forbes et al., 2014) provides a highly efficient rejection sampler based on:
- Gamma-based proposal: Utilizes the large-κ asymptotic I₀(κ) ≈ e^κ/√(2πκ) to approximate the target by a shifted gamma density.
- Shifted gamma envelope: Proposals have the form κ = X − ε, with X ∼ Gamma(α, λ) truncated to X ≥ ε, so that the proposal density on κ ≥ 0 is proportional to (κ + ε)^{α−1} e^{−λκ}.
- Enveloping and acceptance: The envelope is constructed so that the target-to-envelope ratio is bounded above by a constant that keeps the acceptance probability high, with optimized envelope parameters computed analytically from the problem constants η and β₀ (e.g., using Winitzki's approximation to the Lambert W function to obtain the shift ε), avoiding otherwise costly numerical optimization steps.
- Algorithmic summary:
1. Compute the envelope parameters (α, λ, ε) as analytic functions of η and β₀.
2. Draw X ∼ Gamma(α, λ), truncated to X ≥ ε.
3. Set κ = X − ε.
4. Accept κ with probability proportional to the ratio of the target to the envelope (specifics in the full formula).
Acceptance rates are consistently high (≥70%), with computational throughput in compiled code exceeding one million samples per second (Forbes et al., 2014).
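A simplified sketch of such a shifted-gamma rejection sampler is shown below. It follows the η, β₀, ε notation above, but determines the envelope bound numerically on a grid rather than via the paper's analytic Lambert-W optimization, so it should be read as an illustration rather than the published algorithm:

```python
import numpy as np
from scipy.special import i0e  # exponentially scaled Bessel: i0e(k) = I_0(k) * exp(-k)

def log_target(kappa, eta, beta0):
    # Unnormalized log density of the "Bessel exponential" posterior:
    #   f(k) ∝ exp(beta0 * k) / I_0(k)^eta,  k >= 0.
    # Using log I_0(k) = log i0e(k) + k avoids overflow for large k.
    return (beta0 - eta) * kappa - eta * np.log(i0e(kappa))

def sample_posterior_kappa(eta, beta0, n, eps=0.159, seed=None):
    """Illustrative shifted-gamma rejection sampler. The envelope bound is
    found numerically here; Forbes et al. derive it, and the optimal shift
    eps, analytically via the Lambert W function."""
    assert beta0 < eta, "integrability requires beta0 < eta"
    rng = np.random.default_rng(seed)
    alpha, lam = eta / 2 + 1, eta - beta0  # matches the large-kappa gamma kernel

    def log_ratio(k):
        # log of target / proposal-kernel, where the proposal kernel on k >= 0
        # is (k + eps)^(alpha - 1) * exp(-lam * k); bounded for any eps > 0
        return -eta * np.log(i0e(k)) - (eta / 2) * np.log(k + eps)

    grid = np.concatenate([[0.0], np.geomspace(1e-4, 100 / lam + 100, 2000)])
    log_M = log_ratio(grid).max() + 1e-9   # numerical envelope bound (sketch only)

    out = []
    while len(out) < n:
        x = rng.gamma(alpha, 1.0 / lam, size=2 * n)
        k = x[x >= eps] - eps              # kappa = X - eps with X >= eps
        accept = np.log(rng.random(k.size)) < log_ratio(k) - log_M
        out.extend(k[accept])
    return np.asarray(out[:n])
```

With the analytically optimized parameters of the published algorithm, the acceptance rate is far higher than this crude grid-bounded variant achieves.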
2.2. The General (High-Dimensional) Case
For general d-variate vMF sampling, the prominent method is Wood's algorithm (as applied in Kasarapu et al., 2015):
- Sample the "height" (the projection along μ) from the correct marginal density using either direct inversion or rejection sampling.
- Sample a uniform vector on the orthogonal sphere, then concatenate and rotate to align with the intended mean direction.
- This works efficiently even in high dimensions due to the availability of reliable routines and well-behaved numerical properties of the vMF marginals.
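The marginal-plus-rotation scheme above can be sketched as follows, using Wood's rejection step for the height w = μᵀx, a uniform tangent direction, and a Householder reflection onto μ (a compact illustration assuming d ≥ 2; `sample_vmf` is an illustrative name):

```python
import numpy as np

def sample_vmf(mu, kappa, n, seed=None):
    """vMF sampling on S^{d-1} via Wood's (1994) rejection scheme."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, float)
    mu /= np.linalg.norm(mu)
    d = mu.size
    b = (-2 * kappa + np.hypot(2 * kappa, d - 1)) / (d - 1)
    x0 = (1 - b) / (1 + b)
    c = kappa * x0 + (d - 1) * np.log(1 - x0 ** 2)
    out = np.empty((n, d))
    for i in range(n):
        while True:  # rejection loop for the height w = mu·x
            z = rng.beta((d - 1) / 2, (d - 1) / 2)
            w = (1 - (1 + b) * z) / (1 - (1 - b) * z)
            if kappa * w + (d - 1) * np.log(1 - x0 * w) - c >= np.log(rng.random()):
                break
        v = rng.standard_normal(d - 1)
        v /= np.linalg.norm(v)              # uniform direction on S^{d-2}
        x = np.concatenate(([w], np.sqrt(1 - w ** 2) * v))
        # Householder reflection sending the canonical axis e1 to mu
        e1 = np.zeros(d); e1[0] = 1.0
        u = e1 - mu
        nu2 = u @ u
        out[i] = x if nu2 < 1e-24 else x - 2 * np.dot(u, x) / nu2 * u
    return out
```

The Householder step is a cheap, numerically stable alternative to constructing a full rotation matrix aligning e1 with μ.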
For high-concentration or high-dimensional cases, asymptotic approximations to the normalization constant can be leveraged to further accelerate sampling (Wei et al., 2015). For large κ, the Bessel factor can be replaced by its leading-order expansion I_ν(κ) ≈ e^κ/√(2πκ), whereby computationally expensive Bessel evaluations are avoided.
2.3. Mixtures of vMF: EM and SGD Methods
- In clustering and mixture modeling (e.g., for text vectors, i-vectors in speaker diarization, or protein-backbone orientations), it is common to use Expectation-Maximization or SGD to fit mixtures of vMF distributions (Dubey et al., 2018, Kim, 2021).
- Efficient sampling is needed for both E-step simulation (assignment of cluster responsibility/weighting) and M-step estimation, typically involving hard or soft assignment, prototype ("mean direction") calculation, and concentration parameter updates via fixed-point iterations.
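For the M-step concentration update, a widely used closed-form approximation (the Banerjee-style estimator κ̂ ≈ r̄(d − r̄²)/(1 − r̄²), given here as a generic sketch rather than the exact update of the cited works) replaces the fixed-point iteration when speed matters:

```python
import numpy as np

def vmf_mstep(X, weights=None):
    """Weighted M-step for one vMF component: mean direction from the
    (responsibility-weighted) resultant vector, and concentration via the
    common approximation kappa ≈ r*(d - r^2)/(1 - r^2)."""
    X = np.asarray(X, float)
    w = np.ones(len(X)) if weights is None else np.asarray(weights, float)
    d = X.shape[1]
    mean = (w[:, None] * X).sum(axis=0) / w.sum()   # weighted mean vector
    r = np.linalg.norm(mean)                        # mean resultant length
    mu = mean / r
    kappa = r * (d - r ** 2) / (1 - r ** 2)
    return mu, kappa
```

A fixed-point refinement of κ̂ (solving A_d(κ) = r̄ with Newton or Halley steps) can be layered on top when higher accuracy is required.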
3. Specialized and Emerging vMF Sampling Applications
3.1. Bayesian and MML Estimation
Bayesian inference frequently requires sampling from posterior distributions of vMF parameters or model components. Minimum Message Length (MML) estimators for vMF parameters use iterative techniques (Newton or Halley methods) based on sample resultant lengths, and coupling these with efficient vMF samplers is essential for mixture model selection and model averaging (Kasarapu et al., 2015).
3.2. Graph Neural Networks and Molecular Conformation
Machine learning for molecular conformation generation has adopted variational mixture models of the von Mises distribution for modeling torsion angle marginals, allowing extremely efficient generation of physically accurate samples with explicit mixture-based vMF sampling (Swanson et al., 2023).
3.3. Reinforcement Learning and Hyperspherical Exploration
In RL with large action sets and hyperspherical embeddings, vMF sampling has been used for scalable Boltzmann-like exploration: a vMF sample is drawn from a state embedding and the nearest neighbor action is selected, yielding efficient, theoretically grounded, and scalable exploration behavior (Bendada et al., 1 Jul 2025).
3.4. Policy Improvement via Gradient Directional Uncertainty
Recent developments measure gradient disagreement among ensemble critics in actor–critic RL by fitting a vMF to the set of normalized gradients, using the resultant length (or its associated concentration parameter) as a measure of uncertainty and basing resampling priorities accordingly (the vMFER method) (Zhu et al., 14 May 2024).
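The core agreement statistic can be computed in a few lines (a sketch of the resultant-length measure only; the cited method builds prioritized resampling on top of it):

```python
import numpy as np

def gradient_agreement(grads):
    """Resultant length of the normalized gradient vectors: a value near 1
    means the ensemble's gradients point in nearly the same direction
    (low uncertainty); a value near 0 means strong disagreement."""
    g = np.asarray(grads, float)
    unit = g / np.linalg.norm(g, axis=1, keepdims=True)  # project onto the sphere
    return float(np.linalg.norm(unit.mean(axis=0)))
```

The associated vMF concentration estimate grows as this resultant length approaches 1, which is what makes it usable as an uncertainty signal.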
3.5. Quantum Algorithms and Bayesian Inference
In the design of Bayesian variational quantum algorithms, the vMF distribution provides a natural prior/posterior over normalized quantum state vectors. Closed-form updates (for the mean and concentration parameters) are available using moment formulas derived from vMF properties, yielding efficient sampling and update schemes on spherical domains (Huynh et al., 4 Oct 2024).
4. Algorithmic, Numerical, and Implementation Considerations
Efficient vMF sampling requires careful management of the normalization constant and the Bessel function evaluations. Established routines:
- Use high-precision libraries (e.g., mpmath for Bessel evaluations (Kim, 2021)).
- Employ approximations or Taylor expansions to avoid numerical collapse for extreme parameter values (Wei et al., 2015).
- Leverage fast envelope/acceptance-rejection schemes (truncated gamma or mixture proposals) to achieve high sampling throughput (Forbes et al., 2014).
In deep learning and variational inference, the lack of an analytic inverse CDF and the reliance on rejection sampling make the vMF distribution difficult to use with the reparameterization trick. Alternative distributions such as the Power Spherical, which has a closed-form sampling path via a transformed Beta marginal, provide a robust replacement (no rejection, stable gradients) (Cao et al., 2020).
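A rejection-free sampler in the Power Spherical style can be sketched as follows (following the Beta-marginal construction described by Cao et al.; assumes d ≥ 2, and `sample_power_spherical` is an illustrative name):

```python
import numpy as np

def sample_power_spherical(mu, kappa, n, seed=None):
    """Rejection-free sampling: draw the height t = 2z - 1 with
    z ~ Beta((d-1)/2 + kappa, (d-1)/2), attach a uniform tangent direction,
    then Householder-reflect the canonical axis onto mu."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, float)
    mu /= np.linalg.norm(mu)
    d = mu.size
    z = rng.beta((d - 1) / 2 + kappa, (d - 1) / 2, size=n)
    t = 2 * z - 1                                  # height along mu, in (-1, 1)
    v = rng.standard_normal((n, d - 1))
    v /= np.linalg.norm(v, axis=1, keepdims=True)  # uniform on S^{d-2}
    y = np.concatenate([t[:, None], np.sqrt(1 - t[:, None] ** 2) * v], axis=1)
    e1 = np.zeros(d); e1[0] = 1.0
    u = e1 - mu
    nu2 = u @ u
    if nu2 < 1e-24:
        return y
    return y - 2 * (y @ u)[:, None] / nu2 * u      # Householder reflection onto mu
```

Because every step is a differentiable transformation of standard draws, this construction admits reparameterized gradients without any rejection loop.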
5. Limitations, Extensions, and Related Distributions
Although advanced vMF sampling algorithms provide high efficiency, key limitations arise for extreme parameter regimes (ultra-high concentration or dimension), where Bessel function evaluations may become unstable or mixture modeling is required to accurately capture multi-modality (e.g., in ring systems or with torsion angle mixtures (Swanson et al., 2023)).
Extensions such as the spherical Laplace (You et al., 2022) and generalized vMF (Leonenko et al., 2020) distributions have been proposed to provide enhanced robustness (heavy tails, median-based central tendency), particularly in the presence of noise or outlier contamination.
For some applied contexts (e.g., wireless and radar), closed-form sampling and correlation results tailored to vMF scattering environments have been derived, directly connecting angular concentration to empirical phenomena such as Doppler broadening or spatial/temporal coherence (Turbic et al., 3 Sep 2024).
6. Summary Table: Core Aspects of von Mises-Fisher Sampling
| Aspect | Key Method / Formula | Citation(s) |
|---|---|---|
| 1D Posterior | Shifted gamma rejection w/ Lambert W | (Forbes et al., 2014) |
| High-D Sampling | Wood's algorithm, asymptotics, mixture EM | (Kasarapu et al., 2015; Wei et al., 2015) |
| Reinforcement Learning | vMF-based exploration and resampling | (Bendada et al., 1 Jul 2025; Zhu et al., 14 May 2024) |
| Deep Learning | Mixtures, PyTorch routines, MML estimation | (Kim, 2021; Kasarapu et al., 2015) |
| Molecular Modeling | Mixture vMF for torsion/bond angles | (Swanson et al., 2023) |
In summary, von Mises-Fisher sampling is central to modern probabilistic modeling on the sphere. Algorithmic advances—encompassing envelope-based acceptance-rejection, mixture modeling, and high-dimensional approximations—have enabled precise, stable, and scalable sampling, allowing vMF models to serve as practical tools for high-dimensional statistics, machine learning, physical sciences, quantum computing, and sequential decision-making across diverse scientific domains.