Stationary Density Ratio

Updated 1 January 2026

The stationary density ratio is a quantity that compares steady-state distributions, enabling reweighting and correction across various physical and statistical models.
It underpins applications in off-policy reinforcement learning, multiphase flow simulation, mass transport modeling, and impurity diagnostics in plasma systems.
In reinforcement learning, off-policy algorithms use density ratio estimation to reduce bias and improve convergence through techniques such as marginalized importance sampling and variance reduction.

A stationary density ratio quantifies the relative prevalence of a physical or statistical quantity between two reference distributions, regimes, or boundary locations under time-invariant (steady-state) conditions. Its concrete definitions and operational roles differ substantially depending on the context: reinforcement learning (density ratio correction), multiphase flow models (liquid-vapor coexistence), mass transport in inhomogeneous media, and stationary profile diagnostics in plasma turbulence. Across domains, stationary density ratio constructs serve to reweight, normalize, or contrast density values associated with steady-state distributions.

1. Stationary Density Ratio in Off-Policy Reinforcement Learning

In infinite-horizon discounted Markov decision processes (MDPs), the stationary density ratio $w^\pi(s,a)$ is the ratio of the normalized discounted occupancy of the state-action pair $(s,a)$ under a target policy $\pi$ to its probability under the behavior distribution $\mu$ from which samples are drawn. Specifically,

$w^\pi(s,a) := \frac{d^\pi(s,a)}{\mu(s,a)}$

where $d^\pi(s,a) = (1-\gamma) \mathbb{E}_{\tau \sim \pi, s_0 \sim \nu_0}\left[ \sum_{t=0}^\infty \gamma^t \mathbb{I}\{s_t = s, a_t = a\} \right]$ is the normalized discounted occupancy, $\nu_0$ is the initial-state distribution, and $\sum_{s,a} d^\pi(s,a) = 1$ (Huang et al., 2021). The ratio is well defined wherever $\mu(s,a) > 0$. Here, $w^\pi$ enables sample-based off-policy optimization via "marginalized importance sampling": reweighting samples drawn from $\mu$ by $w^\pi$ corrects for the mismatch between the data distribution and the occupancy of the policy being evaluated.
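For a small tabular MDP the occupancy and the ratio can be computed exactly, which may help make the definition concrete. The following is a minimal sketch under stated assumptions: the function name `discounted_occupancy`, the random MDP, and the behavior distribution `mu` are all hypothetical and chosen only for illustration. It uses the standard identity that the state occupancy solves a linear system in $(I - \gamma P_\pi)$.

```python
import numpy as np

def discounted_occupancy(P, pi, nu0, gamma):
    """Normalized discounted state-action occupancy d^pi of a tabular MDP.

    P   : (S, A, S) array, P[s, a, t] = Pr(next state t | state s, action a)
    pi  : (S, A) array, pi[s, a] = Pr(action a | state s) under the target policy
    nu0 : (S,) initial-state distribution
    """
    S = P.shape[0]
    # State-to-state kernel under pi: P_pi[s, t] = sum_a pi[s, a] * P[s, a, t]
    P_pi = np.einsum("sa,sat->st", pi, P)
    # State occupancy solves (I - gamma * P_pi)^T d = (1 - gamma) * nu0
    d_state = (1.0 - gamma) * np.linalg.solve((np.eye(S) - gamma * P_pi).T, nu0)
    # Attach the policy to obtain the state-action occupancy
    return d_state[:, None] * pi

# Hypothetical toy problem (all values are illustrative assumptions)
rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))         # shape (S, A, S)
pi = rng.dirichlet(np.ones(A), size=S)             # shape (S, A)
nu0 = np.full(S, 1.0 / S)
mu = rng.dirichlet(np.ones(S * A)).reshape(S, A)   # behavior distribution, full support

d_pi = discounted_occupancy(P, pi, nu0, gamma)
w_pi = d_pi / mu                                   # stationary density ratio

print("sum of d_pi:", d_pi.sum())
print("E_mu[w_pi] :", (mu * w_pi).sum())
```

Both printed quantities should equal one up to rounding, since $d^\pi$ is normalized and $\mathbb{E}_\mu[w^\pi] = \sum_{s,a} d^\pi(s,a)$.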

In practical algorithms, function classes $w_\zeta(s,a)$ parameterize approximations to $w^\pi(s,a)$ with regularization to control bias and stabilize optimization. The density ratio enters the off-policy loss:

$L^D(\pi_\theta, \zeta, \xi) = (1-\gamma) \mathbb{E}_{s_0 \sim \nu_D}[Q_\xi(s_0, \pi_\theta)] + \mathbb{E}_{(s,a,r,s') \sim D}[w_\zeta(s,a) (r + \gamma Q_\xi(s',\pi_\theta) - Q_\xi(s,a))] + \cdots$

Weighting the Bellman residual by $w_\zeta$ evaluates it under the stationary distribution of $\pi$ rather than under the data distribution, which supports stable and convergent off-policy policy optimization (Huang et al., 2021).
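The role of the density ratio in this loss can be checked numerically in the tabular case. The sketch below evaluates only the two terms written out above, with exact expectations in place of samples and with the initial-state distribution of the data assumed equal to $\nu_0$; the elided terms are omitted. All names (`occupancy`, `loss_LD`, the random MDP) are hypothetical. It illustrates a known property of this kind of objective: the value equals the normalized return $J(\pi) = \sum_{s,a} d^\pi(s,a)\, r(s,a)$ whenever either the ratio or the action-value function is exact.

```python
import numpy as np

def occupancy(P, pi, nu0, gamma):
    """Normalized discounted state-action occupancy d^pi (tabular)."""
    S = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)
    d_state = (1.0 - gamma) * np.linalg.solve((np.eye(S) - gamma * P_pi).T, nu0)
    return d_state[:, None] * pi

def loss_LD(P, r, pi, nu0, mu, gamma, w, Q):
    """Population value of the two explicit terms of L^D (elided terms omitted)."""
    V = (pi * Q).sum(axis=1)           # V[s] = Q(s, pi)
    td = r + gamma * (P @ V) - Q       # Bellman residual, shape (S, A)
    return (1.0 - gamma) * (nu0 @ V) + (mu * w * td).sum()

# Hypothetical toy problem (illustrative assumptions only)
rng = np.random.default_rng(1)
S, A, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))
pi = rng.dirichlet(np.ones(A), size=S)
r = rng.uniform(size=(S, A))
nu0 = rng.dirichlet(np.ones(S))
mu = rng.dirichlet(np.ones(S * A)).reshape(S, A)

d_pi = occupancy(P, pi, nu0, gamma)
J = (d_pi * r).sum()                   # normalized discounted return
w_true = d_pi / mu

# Exact Q^pi from the linear Bellman equation in state-action space
M = np.einsum("sat,tb->satb", P, pi).reshape(S * A, S * A)
Q_true = np.linalg.solve(np.eye(S * A) - gamma * M, r.ravel()).reshape(S, A)

Q_any = rng.normal(size=(S, A))        # arbitrary (wrong) critic
w_any = rng.uniform(size=(S, A))       # arbitrary (wrong) ratio

print(J, loss_LD(P, r, pi, nu0, mu, gamma, w_true, Q_any))
print(J, loss_LD(P, r, pi, nu0, mu, gamma, w_any, Q_true))
```

Each printed pair should agree up to rounding. With both the ratio and the critic wrong, the value generally differs from $J(\pi)$.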

2. Density Ratio in Multiphase Lattice Boltzmann Models

In computational multiphase flow models utilizing the lattice Boltzmann (LB) method, the stationary density ratio refers to the equilibrium ratio of liquid- to vapor-phase macroscopic densities ($\rho_\ell/\rho_g$) achieved by the system. Li et al. (2012) demonstrate that, under a refined multiple-relaxation-time (MRT) pseudopotential framework with improved forcing, the stationary density ratio can reach values on the order of $10^3$. The profile and stationarity of the density ratio are benchmarked by initializing a droplet with a tanh profile in density, running the simulation to steady state, and directly measuring $\rho_\ell$ and $\rho_g$.
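The initialization step of such a benchmark can be sketched as follows. This is only an illustration of a tanh density profile for a circular droplet, not the setup of the cited work: the function name `tanh_droplet`, the grid size, the radius, the interface width, and the factor of 2 in the tanh argument are assumptions (the last being one common convention). No LB solver is included, so the ratio computed here is that of the initial condition, not the measured steady-state value.

```python
import numpy as np

def tanh_droplet(nx, ny, radius, width, rho_liquid, rho_gas):
    """Density field for a circular liquid droplet centered in a gas domain."""
    x = np.arange(nx) - (nx - 1) / 2.0
    y = np.arange(ny) - (ny - 1) / 2.0
    r = np.sqrt(x[:, None] ** 2 + y[None, :] ** 2)   # distance from domain center
    mean = 0.5 * (rho_liquid + rho_gas)
    half_jump = 0.5 * (rho_liquid - rho_gas)
    # Liquid inside (r < radius), gas outside, smooth transition over ~width
    return mean - half_jump * np.tanh(2.0 * (r - radius) / width)

# Hypothetical parameters in lattice units
rho = tanh_droplet(nx=101, ny=101, radius=25.0, width=5.0,
                   rho_liquid=1.0, rho_gas=0.001)

# In a real benchmark these would be read off after the solver reaches steady state
rho_l_measured = rho[50, 50]    # droplet center
rho_g_measured = rho[0, 0]      # far corner
print("density ratio of the initial field:", rho_l_measured / rho_g_measured)
```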

Thermodynamic consistency is approached by tuning the mechanical stability condition derived from the pressure tensor, and the equilibrium density ratio is validated by comparing simulation results with the Maxwell equal-area construction:

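$\int_{\rho_g}^{\rho_\ell} [p_0 - p_{\text{EOS}}(\rho)]\, \frac{d\rho}{\rho^2} = 0$

where $p_0$ is the coexistence pressure. The factor $1/\rho^2$ appears because the construction equates areas in the pressure-volume plane, with specific volume $v = 1/\rho$.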

Deviation from theoretical coexistence is kept below $1\%$, and the larger stationary density ratio is shown to correlate with reduced spurious currents and improved numerical stability (Li et al., 2012).
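The reference values against which a simulation is compared come from solving the equal-area condition for the chosen equation of state. The sketch below does this for the reduced van der Waals EOS, which is swapped in here purely for simplicity; it is not the Carnahan–Starling EOS used in the cited work, and the function names are hypothetical. It works in specific volume $v = 1/\rho$, where the condition reads $\int_{v_\ell}^{v_g} [p(v) - p_0]\, dv = 0$, and finds $p_0$ by bisection.

```python
import numpy as np

def vdw_pressure(v, T):
    """Reduced van der Waals EOS (critical point at p = v = T = 1)."""
    return 8.0 * T / (3.0 * v - 1.0) - 3.0 / v**2

def _real_roots(coeffs, lower):
    """Sorted real polynomial roots larger than `lower`."""
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-9].real
    return np.sort(real[real > lower])

def maxwell_coexistence(T, tol=1e-12):
    """Return (p0, rho_liquid, rho_gas) for a reduced temperature T < 1."""
    # Spinodal volumes solve dp/dv = 0, i.e. 4 T v^3 - (3 v - 1)^2 = 0
    v_spin = _real_roots([4.0 * T, -9.0, 6.0, -1.0], lower=1.0 / 3.0)
    # Coexistence pressure lies between the local minimum and maximum of p(v)
    p_lo = max(vdw_pressure(v_spin[0], T), 0.0) + 1e-6
    p_hi = vdw_pressure(v_spin[-1], T) - 1e-6

    def antiderivative(v):
        # Integral of the EOS pressure with respect to v
        return (8.0 * T / 3.0) * np.log(3.0 * v - 1.0) + 3.0 / v

    def area(p0):
        # Liquid and gas volumes are the outer roots of p(v) = p0
        v = _real_roots([3.0 * p0, -(p0 + 8.0 * T), 9.0, -3.0], lower=1.0 / 3.0)
        v_l, v_g = v[0], v[-1]
        return antiderivative(v_g) - antiderivative(v_l) - p0 * (v_g - v_l), v_l, v_g

    while p_hi - p_lo > tol:
        p_mid = 0.5 * (p_lo + p_hi)
        a, v_l, v_g = area(p_mid)
        if a > 0.0:          # trial pressure too low
            p_lo = p_mid
        else:                # trial pressure too high
            p_hi = p_mid
    return p_mid, 1.0 / v_l, 1.0 / v_g

T = 0.9                      # hypothetical reduced temperature
p0, rho_l, rho_g = maxwell_coexistence(T)
print("p0 =", p0, " rho_l/rho_g =", rho_l / rho_g)
```

Lowering the temperature widens the gap between the two coexisting densities, which is how large theoretical density ratios arise for a given EOS.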

3. Stationary Density Ratio in Mass Transport of Inhomogeneous Media

For mass transport in two-component lattice gases with excluded volume constraints, the stationary density ratio is defined as

$R \equiv \frac{n_m(L)}{n_m(0)}$

where $n_m(x)$ is the stationary density of a mobile component at boundary locations $x = 0$ and $x = L$, and the second component is static at density $n_s(x)$ (Lukyanets et al., 2010). Remarkably, at steady state,

$R = \frac{1 - n_s(L)}{1 - n_s(0)}$

so $R$ is entirely dictated by the boundary values of the static profile, independent of mass flux $j$, diffusion coefficient $D$, or internal spatial structure. This stationary ratio encapsulates microscopic interactions (the excluded-volume effect and distinguishability) that can yield inverted ("uphill") transport, where mobile particles are moved from regions of lower to higher density.
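As a worked instance of this formula, the ratio can be evaluated directly from the two boundary values of the static profile. The sketch assumes densities are expressed as site-occupation fractions in $[0, 1)$; the function name `stationary_ratio` and the numbers are hypothetical.

```python
def stationary_ratio(ns_0, ns_L):
    """R = n_m(L) / n_m(0) from the static-component densities at x = 0 and x = L."""
    if not (0.0 <= ns_0 < 1.0 and 0.0 <= ns_L < 1.0):
        raise ValueError("static densities must lie in [0, 1)")
    return (1.0 - ns_L) / (1.0 - ns_0)

# Hypothetical boundary values: the static component is denser at x = 0
R = stationary_ratio(ns_0=0.6, ns_L=0.2)
print("R =", R)
```

With these values $R = 0.8 / 0.4 = 2$, so the stationary mobile density at $x = L$ is twice that at $x = 0$, whatever the static profile looks like in between.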

4. Stationary Density Ratio in Gyrokinetic Plasma Modeling

In turbulence-driven transport studies within magnetically confined plasmas, stationary density ratios describe the relative profile steepness ("peaking") of different species under zero-flux conditions. The stationary gradient (peaking factor) for species $j$ is

$PF_j = -\frac{D_{T_j}(R/L_{T_j}) + R V_{p,j}}{D_j}$

where $D_{T_j}$ is the thermodiffusion coefficient, $V_{p,j}$ is the convective pinch, and $D_j$ is the diffusion coefficient (Skyman et al., 2014). When two species, such as electrons and a trace impurity, have stationary gradients $PF_e$ and $PF_Z$, the stationary density ratio for center-localized densities is

$\left.\frac{n_Z}{n_e}\right|_{r=a} \approx \exp(PF_Z - PF_e)$

and the ratio of gradients $PF_Z / PF_e$ quantifies relative profile peaking. This "stationary density ratio" is used to diagnose impurity accumulation, fuel separation, and the parameter dependence of turbulent transport in tokamaks (Skyman et al., 2014).
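The peaking factor is the normalized density gradient at which the particle flux vanishes. The sketch below assumes the usual decomposition of the normalized flux into diffusive, thermodiffusive, and convective parts, $D_j (R/L_{n_j}) + D_{T_j} (R/L_{T_j}) + R V_{p,j}$, which is consistent with the expression for $PF_j$ above. The transport coefficients are hypothetical numbers, not values from the cited simulations, and the function names are illustrative.

```python
def peaking_factor(D, D_T, R_over_LT, RV_p):
    """Zero-flux normalized density gradient PF = -(D_T * R/L_T + R*V_p) / D."""
    if D <= 0.0:
        raise ValueError("diffusion coefficient must be positive")
    return -(D_T * R_over_LT + RV_p) / D

def normalized_flux(R_over_Ln, D, D_T, R_over_LT, RV_p):
    """Assumed flux decomposition: diffusion + thermodiffusion + convection."""
    return D * R_over_Ln + D_T * R_over_LT + RV_p

# Hypothetical coefficients for electrons and a trace impurity
electrons = dict(D=1.0, D_T=0.1, R_over_LT=6.0, RV_p=-2.5)
impurity = dict(D=0.8, D_T=-0.05, R_over_LT=6.0, RV_p=-1.5)

PF_e = peaking_factor(**electrons)
PF_Z = peaking_factor(**impurity)

print("PF_e =", PF_e, " PF_Z =", PF_Z, " PF_Z / PF_e =", PF_Z / PF_e)
print("flux at PF_e:", normalized_flux(PF_e, **electrons))
```

A ratio $PF_Z / PF_e$ above one indicates that the impurity profile is more peaked than the electron profile for the assumed coefficients.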

5. Algorithms and Bias Correction via Density Ratio

Density ratio correction is central to off-policy policy optimization algorithms such as P-SREDA and O-SPIM (Huang et al., 2021). These methods jointly optimize three sets of parameters (the policy $\theta$, the density-ratio approximator $\zeta$, and the action-value approximator $\xi$) within a max-max-min framework:

$\max_{\theta \in \Theta} \max_{\zeta \in Z} \min_{\xi \in \Xi} L^D(\pi_\theta, \zeta, \xi)$

Variance-reduction and momentum-based techniques (SARAH-style updates, oracle subroutines with contraction) improve convergence speed and sample efficiency. Bias in the stationary objective arises from (i) data estimation error, (ii) function class misspecification, and (iii) regularization, yielding a total bias of order $O(\text{Reg}(\lambda_w,\lambda_Q) + \text{Func}(\text{mismatch}) + \text{Data}(\text{generalization}))$ (Huang et al., 2021). When the density ratio is estimated accurately, Bellman residual minimization is carried out with respect to the target policy's stationary distribution rather than the data distribution.
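The SARAH-style update mentioned above replaces a plain stochastic gradient with a recursive estimator, $v_t = \nabla f_{i_t}(x_t) - \nabla f_{i_t}(x_{t-1}) + v_{t-1}$, restarted periodically from a full gradient. The sketch below shows that recursion on a plain least-squares minimization problem. It is not P-SREDA or O-SPIM and does not touch the max-max-min objective; the problem, step size, and loop lengths are illustrative assumptions.

```python
import numpy as np

def sarah_least_squares(A, b, step, epochs, inner_steps, seed=0):
    """Minimize (1 / 2n) * ||A x - b||^2 with SARAH recursive gradient estimates."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        # Outer loop: restart the estimator from the full gradient
        v = A.T @ (A @ x - b) / n
        x_prev, x = x, x - step * v
        for _ in range(inner_steps):
            i = rng.integers(n)
            a_i = A[i]
            # Same sample evaluated at the current and the previous iterate
            g_new = a_i * (a_i @ x - b[i])
            g_old = a_i * (a_i @ x_prev - b[i])
            v = g_new - g_old + v          # SARAH recursion
            x_prev, x = x, x - step * v
    return x

# Hypothetical noise-free regression problem
rng = np.random.default_rng(1)
n, d = 50, 5
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true

# Conservative step size based on the largest per-sample smoothness constant
step = 0.5 / np.max(np.sum(A**2, axis=1))
x_hat = sarah_least_squares(A, b, step, epochs=100, inner_steps=2 * n)

print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

Because each inner step reuses the previous estimate, the variance of $v_t$ shrinks as consecutive iterates get closer, which is the property that variance-reduced minimax methods build on.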

6. Parametric Dependence and Physical Significance

In each domain, stationary density ratios exhibit characteristic parametric dependence:

In multiphase LB models, interface thickness is inversely proportional to $\sqrt{a}$ in the Carnahan–Starling EOS, and stationary density ratio increases as $a$ decreases, correlating with reduced spurious currents (Li et al., 2012).
In lattice-gas mass transport, $R$ depends only on static boundary densities and can manifest inverted ("uphill") transport entirely via excluded-volume effects (Lukyanets et al., 2010).
In gyrokinetic models, the stationary density ratio for impurity-to-electron profiles depends on magnetic shear, temperature ratio, collisionality, and geometry; isotope effects induce fuel separation (Skyman et al., 2014).
In reinforcement learning, the stationary density ratio $w^\pi$ is bounded by concentrability constants and directly modulates the convergence and bias properties of the off-policy optimization objective (Huang et al., 2021).

A plausible implication is that across disparate fields, the stationary density ratio provides a unifying metric for steady-state comparison and physical validation, whether for probability measures, mass densities, or particle distributions. Its correct estimation and deployment are critical for unbiased physical modeling, convergence analysis, and transport characterization.