Jensen Gap in Convex Analysis

Updated 25 April 2026
  • Jensen Gap is a quantitative measure of the deviation in Jensen’s inequality, defined as the difference between the mean of a convex function and the function of the mean.
  • It generalizes to operator, quantum, and algorithmic contexts, providing explicit variance-based bounds and bridging classical analysis with noncommutative extensions.
  • In machine learning and fairness optimization, the Jensen Gap is used as a regularizer to control model complexity and align empirical loss with theoretical guarantees.

The Jensen gap is the quantitative measure of the nonlinearity discrepancy in Jensen’s inequality, defined for a convex function $f$ and a weighted sample or a random variable as the difference between the mean of $f$ and $f$ of the mean. The concept generalizes across classical, operator, quantum, and algorithmic contexts, providing a key analytic tool in convex analysis, probability, information theory, matrix analysis, machine learning, and optimization.

1. Mathematical Foundations and Classical Formulation

Given a convex set $I\subseteq\mathbb{R}$ and a convex function $f:I\to\mathbb{R}$, the classical Jensen gap (also termed Jensen divergence) for a finite probability-weighted sample $\{(p_i, x_i)\}_{i=1}^m$ (with $p_i\geq 0$, $\sum_{i=1}^m p_i = 1$, $x_i\in I$) is

$$J_f(p; x) = \sum_{i=1}^m p_i f(x_i) - f\Bigl(\sum_{i=1}^m p_i x_i\Bigr)$$

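This is always nonnegative by Jensen's inequality and vanishes if all $x_i$ are equal or if $f$ is affine on $I$. In the two-point case ($m=2$, $p_1=p_2=\tfrac{1}{2}$, $x_1=x$, $x_2=y$), $J_f(x, y) = \tfrac{1}{2}f(x) + \tfrac{1}{2}f(y) - f\bigl(\tfrac{x+y}{2}\bigr)$ (Virosztek, 2017).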

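Alternative formulations appear in probabilistic frameworks, where for a random variable $X$ with law $P$ and mean $\mu = \mathbb{E}[X]$, the Jensen gap is $\mathbb{E}[f(X)] - f(\mathbb{E}[X])$ (Gao et al., 2017). For Borel measures $\nu$ on $I$, the gap is $\frac{1}{\nu(I)}\int_I f\,d\nu - f(b_\nu)$, where $b_\nu = \frac{1}{\nu(I)}\int_I x\,d\nu(x)$ is the barycenter (Niculescu et al., 2012).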
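The finite-sample definition is easy to evaluate directly. The following is a minimal sketch, assuming NumPy; the function name `jensen_gap` and the sample values are illustrative and do not come from the cited papers. It checks three facts: the gap is positive for a strictly convex function, vanishes for an affine function, and equals the variance of the sample when $f(x) = x^2$.

```python
import numpy as np

def jensen_gap(f, p, x):
    """Finite-sample Jensen gap: sum_i p_i f(x_i) - f(sum_i p_i x_i)."""
    p = np.asarray(p, dtype=float)
    x = np.asarray(x, dtype=float)
    if np.any(p < 0) or not np.isclose(p.sum(), 1.0):
        raise ValueError("p must be a probability vector")
    mean_of_f = np.dot(p, f(x))   # weighted mean of f over the sample
    f_of_mean = f(np.dot(p, x))   # f evaluated at the weighted mean
    return float(mean_of_f - f_of_mean)

# Illustrative weights and points (hypothetical values).
p = [0.2, 0.3, 0.5]
x = [1.0, 2.0, 4.0]

gap_exp = jensen_gap(np.exp, p, x)                      # strictly convex f
gap_affine = jensen_gap(lambda t: 3.0 * t - 1.0, p, x)  # affine f
gap_square = jensen_gap(np.square, p, x)                # f(x) = x^2

# Weighted variance of the sample, for comparison with gap_square.
mean = np.dot(p, x)
variance = float(np.dot(p, (np.asarray(x) - mean) ** 2))
```

For $f(x) = x^2$ the identity $\sum_i p_i x_i^2 - (\sum_i p_i x_i)^2 = \mathrm{Var}$ makes the gap and the variance coincide, which is the simplest instance of the variance-based bounds discussed in this article.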

2. Generalizations: Operator, Quantum, and Chord Gap Divergence

Operator and Matrix Jensen Gap

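The operator version extends to self-adjoint matrices $A$ and positive unital linear maps $\Phi$, where, for strongly convex $f$, a refined lower bound holds for every unit vector $x$: $\langle\Phi(f(A))x, x\rangle - f(\langle\Phi(A)x, x\rangle) \geq c\,\bigl(\langle\Phi(A^2)x, x\rangle - \langle\Phi(A)x, x\rangle^2\bigr)$, with $c$ the modulus of strong convexity (Moradi et al., 2016).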

Quantum Jensen Gap and Matrix Entropy Class

In the quantum (matrix) setting, the Jensen gap is defined for positive definite matrices $A, B$: $J_f(A, B) = \mathrm{Tr}\bigl[\tfrac{1}{2}f(A) + \tfrac{1}{2}f(B) - f\bigl(\tfrac{A+B}{2}\bigr)\bigr]$. Joint convexity of the quantum Jensen divergence is characterized via the Matrix Entropy Class: convex, $C^2$ functions $f$ on $(0,\infty)$ with $f'' > 0$ such that the inverse Fréchet derivative $A \mapsto (\mathbf{D}f'[A])^{-1}$ is operator-concave. Exemplars include $f(x) = x\log x$ and $f(x) = x^q$ ($1 < q \leq 2$), giving rise to quantum Jensen–Shannon and power divergences, with important monotonicity and metric properties (Virosztek, 2017).
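A trace-form Jensen gap can be evaluated from eigenvalues alone. The sketch below assumes the symmetric midpoint form $\mathrm{Tr}[\tfrac{1}{2}f(A) + \tfrac{1}{2}f(B) - f(\tfrac{A+B}{2})]$ and the generator $f(x) = x\log x$; the helper names and the random test matrices are hypothetical and only serve to illustrate nonnegativity and the vanishing of the gap when $A = B$.

```python
import numpy as np

def trace_of_function(f, A):
    """Tr f(A) for a real symmetric matrix A, computed from its eigenvalues."""
    return float(np.sum(f(np.linalg.eigvalsh(A))))

def quantum_jensen_gap(f, A, B):
    """Assumed midpoint form: Tr[f(A)/2 + f(B)/2 - f((A + B)/2)]."""
    midpoint = 0.5 * (A + B)
    return (0.5 * trace_of_function(f, A)
            + 0.5 * trace_of_function(f, B)
            - trace_of_function(f, midpoint))

def random_positive_definite(rng, n):
    """Hypothetical helper: a random symmetric matrix with eigenvalues >= 0.1."""
    M = rng.standard_normal((n, n))
    return M @ M.T + 0.1 * np.eye(n)

def x_log_x(t):
    return t * np.log(t)

rng = np.random.default_rng(0)
A = random_positive_definite(rng, 4)
B = random_positive_definite(rng, 4)

gap_AB = quantum_jensen_gap(x_log_x, A, B)  # nonnegative, since A -> Tr f(A) is convex
gap_AA = quantum_jensen_gap(x_log_x, A, A)  # vanishes when both arguments coincide
```

Nonnegativity here follows from convexity of $A \mapsto \mathrm{Tr}\,f(A)$ for convex $f$; joint convexity of the divergence itself is the stronger property tied to the Matrix Entropy Class and is not tested by this sketch.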

Chord Gap Divergence

A three-parameter generalization, the chord gap divergence, interpolates between Jensen divergences computed on subsegments, admitting “difference-of-Jensens” representations and a quadratic Taylor–Lagrange form. It subsumes the Jensen divergence as a special case and links to generalized Bhattacharyya distances for exponential families (Nielsen, 2017).

3. Extended Frameworks: Measures, Nonconvexity, and Qualitative Bounds

The Jensen gap persists under considerable generalization:

  • Nonconvex $f$: If $f$ is “mixed convex,” i.e., symmetric about a point and convex on a subinterval, then the Jensen gap remains nonnegative under specified support and barycenter conditions.
  • Signed measures: For Steffensen–Popoviciu measures (which integrate nonnegative convex functions to nonnegative numbers), one recovers nonnegativity of the Jensen gap.
  • “Almost convex” functions: If $f$ is convex on a subinterval and lies above a chord elsewhere, one can still control the sign of the Jensen gap (Niculescu et al., 2012).

In these extensions, the sign of the gap is preserved without requiring global convexity of $f$ or nonnegativity of the measure.

4. Quantitative Evaluation and Moment-Driven Bounds

Explicit quantitative bounds can be given in terms of moments of the distribution of $X$:

  • Upper bound: For $f$ with $|f(x) - f(\mu)| = O(|x-\mu|^{\alpha})$ near $\mu = \mathbb{E}[X]$ and $|f(x)| = O(|x|^{n})$ at infinity,

$$\bigl|\mathbb{E}[f(X)] - f(\mu)\bigr| \leq M\bigl(\sigma_\alpha^\alpha + \sigma_n^n\bigr)$$

where $\sigma_p = \bigl(\mathbb{E}|X-\mu|^p\bigr)^{1/p}$, and $M$ is a constant determined by the growth of $f$ (Gao et al., 2017).

  • Lower bound: For strictly convex $f$ with suitable local and tail growth, the gap admits an analogous moment-based lower bound.

These bounds are sharp with respect to the exponents. For convex $f$ that behaves quadratically near the mean, the gap is of the order of the variance.

Such bounds have wide implications for mean-concentrated distributions, error control in Monte-Carlo schemes, and statistical mechanics (e.g., bias in empirical means, Jarzynski-type inequalities) (Gao et al., 2017).
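The variance scaling in the mean-concentrated regime can be checked on a toy distribution. The sketch below is not taken from Gao et al.; it assumes a symmetric two-point variable $X = \mu \pm \varepsilon$ (each with probability $1/2$) and $f = \exp$, and relies only on the standard second-order Taylor expansion, under which the gap divided by the variance approaches $f''(\mu)/2$ as $\varepsilon \to 0$.

```python
import math

def two_point_gap(f, mu, eps):
    """Jensen gap of X = mu + eps or mu - eps, each with probability 1/2."""
    return 0.5 * f(mu + eps) + 0.5 * f(mu - eps) - f(mu)

mu = 1.0
half_second_derivative = 0.5 * math.exp(mu)  # f = exp, so f''(mu) / 2 = exp(mu) / 2

ratios = []
for eps in (0.5, 0.1, 0.01):
    gap = two_point_gap(math.exp, mu, eps)
    variance = eps ** 2                      # Var(X) for the two-point variable
    ratios.append(gap / variance)
    print(f"eps={eps:<5} gap={gap:.3e} gap/variance={gap / variance:.6f}")

print(f"f''(mu)/2 = {half_second_derivative:.6f}")
```

As the distribution concentrates around its mean, the ratio settles at $f''(\mu)/2$, so the gap itself shrinks like the variance. This is the same mechanism that produces the bias of a nonlinear function of an empirical mean.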

5. Algorithmic and Statistical Machine Learning Implications

The Jensen gap increasingly appears in machine learning as an explicit regularizer or measure of solution bias:

  • Symbolic Regression and Overfitting: In evolutionary feature construction with empirical risk minimization, vicinal risk decomposition (e.g., via mixup) bounds the risk by the sum of empirical loss and a mean squared vicinal Jensen gap. Penalizing this gap enforces local linearity and controls model complexity more directly than parsimony or VC-dimension, empirically reducing overfitting in genetic programming. The regularization coefficient can be adaptively scaled via noise estimation; manifold intrusion from mixup is detected using reference regressors (Zhang et al., 2 Feb 2026).
  • Group Fairness in Recommender Systems: In max–min group fairness optimization, nonlinearity of the loss under stochastic mini-batch methods creates a Jensen gap between the true constrained and mini-batch objectives. This gap grows as batch size decreases or the number of groups increases. Dual-weighted reweighting (the FairDual algorithm) provably bridges the gap, yielding sublinear convergence and improved utility/fairness trade-offs over strong DRO and reweighting baselines (Xu et al., 13 Feb 2025).
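The mixup-based regularizer in the first item can be sketched as a penalty on how far a model departs from linearity along segments between training points. This is an illustrative reading of "mean squared vicinal Jensen gap", not the implementation of Zhang et al.; the function names, the fixed mixing weight `lam`, the coefficient `coeff`, and the toy models are all assumptions.

```python
import numpy as np

def vicinal_jensen_penalty(model, X, lam=0.5, seed=0):
    """Mean squared Jensen gap of `model` along mixup segments between random pairs."""
    rng = np.random.default_rng(seed)
    partner = rng.permutation(len(X))              # random mixup partner for each row
    X_mix = lam * X + (1.0 - lam) * X[partner]     # mixup inputs
    mixed_prediction = lam * model(X) + (1.0 - lam) * model(X[partner])
    gap = mixed_prediction - model(X_mix)          # zero wherever the model is affine
    return float(np.mean(gap ** 2))

def regularized_loss(model, X, y, coeff=0.1):
    """Empirical squared loss plus the (hypothetically weighted) Jensen-gap penalty."""
    mse = float(np.mean((model(X) - y) ** 2))
    return mse + coeff * vicinal_jensen_penalty(model, X)

rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(200, 1))
y = np.sin(X[:, 0])

def linear_model(Z):
    return 0.5 * Z[:, 0] + 1.0

def wiggly_model(Z):
    return np.sin(5.0 * Z[:, 0])

penalty_linear = vicinal_jensen_penalty(linear_model, X)
penalty_wiggly = vicinal_jensen_penalty(wiggly_model, X)
loss_wiggly = regularized_loss(wiggly_model, X, y)
```

An affine model incurs no penalty, while a rapidly oscillating one is penalized heavily, which is the sense in which the gap acts as a complexity control. The adaptive scaling of the coefficient and the manifold-intrusion detection described above are not modeled here.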

6. Operator Inequalities and Noncommutative Extensions

In operator theory, the Jensen gap governs the noncommutative analogues of classical moment inequalities. For $f$ strongly convex on an interval $J$ with modulus $c$, $A$ self-adjoint with spectrum in $J$, and unital positive linear map $\Phi$, the refined Jensen operator inequality gives a lower bound of the gap by the variance: for every unit vector $x$, $\langle\Phi(f(A))x, x\rangle - f(\langle\Phi(A)x, x\rangle) \geq c\,\bigl(\langle\Phi(A^2)x, x\rangle - \langle\Phi(A)x, x\rangle^2\bigr)$. This provides explicit quantitative improvements to classical results such as the Hölder–McCarthy inequality, as well as new quantitative control over matrix means, monotone functions, and quantum measurement. In the scalar case, this specializes to $\mathbb{E}[f(X)] - f(\mathbb{E}[X]) \geq c\,\mathrm{Var}(X)$ when $f$ is $c$-strongly convex (Moradi et al., 2016).
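The variance lower bound can be checked numerically in its vector-state form. The sketch below makes several assumptions: $\Phi$ is taken to be a compression $T \mapsto V^{\top} T V$ by an isometry $V$ (one simple unital positive map), $f = \exp$, and the modulus is chosen as $c = \tfrac{1}{2}e^{\lambda_{\min}(A)}$ so that $f(t) - c\,t^2$ is convex on the spectrum of $A$. All names are illustrative.

```python
import numpy as np

def matrix_function(f, A):
    """Apply a scalar function to a real symmetric matrix via eigendecomposition."""
    eigenvalues, eigenvectors = np.linalg.eigh(A)
    return (eigenvectors * f(eigenvalues)) @ eigenvectors.T

rng = np.random.default_rng(0)
n, k = 6, 3

M = rng.standard_normal((n, n))
A = 0.5 * (M + M.T)                                    # self-adjoint matrix

isometry, _ = np.linalg.qr(rng.standard_normal((n, k)))  # columns are orthonormal

def phi(T):
    """Compression by the isometry: unital (phi(I) = I_k) and positive."""
    return isometry.T @ T @ isometry

x = rng.standard_normal(k)
x /= np.linalg.norm(x)                                 # unit vector

def state(T):
    """Scalar <phi(T) x, x>."""
    return float(x @ phi(T) @ x)

# exp has second derivative >= exp(lambda_min) on the spectrum of A,
# so exp(t) - c t^2 is convex there with c = exp(lambda_min) / 2.
c = 0.5 * np.exp(np.linalg.eigvalsh(A).min())

gap = state(matrix_function(np.exp, A)) - np.exp(state(A))
variance = state(A @ A) - state(A) ** 2
slack = gap - c * variance                             # expected to be nonnegative
```

The check works because $T \mapsto \langle\Phi(T)x, x\rangle$ acts as a probability measure on the spectrum of $A$, which reduces the operator statement to the scalar bound $\mathbb{E}[f(X)] - f(\mathbb{E}[X]) \geq c\,\mathrm{Var}(X)$.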

7. Connections and Generalizations

The Jensen gap is foundational in the theory of convexity, divergence measures, and their extensions. Its generalizations, such as the chord gap divergence, unify parameter and distributional distances (Burbea–Rao, Bhattacharyya), provide tractable centroid computation via CCCP, and support competitive guarantees in $k$-means++ clustering (Nielsen, 2017).

A table summarizing select instances and theoretical properties:

| Context | Definition/Formula | Key Property/Reference |
|---|---|---|
| Classical | $\sum_{i} p_i f(x_i) - f\bigl(\sum_{i} p_i x_i\bigr)$ | Nonnegativity, joint convexity (Virosztek, 2017) |
| Operator/Matrix | $\langle\Phi(f(A))x, x\rangle - f(\langle\Phi(A)x, x\rangle)$ | Variance lower bound (Moradi et al., 2016) |
| Quantum | $J_f(A, B)$ as above | Joint convexity $\iff$ matrix entropy class (Virosztek, 2017) |
| Machine learning / mixup | Mean squared vicinal Jensen gap | Smoothness/complexity regularization (Zhang et al., 2 Feb 2026) |
| Fairness optimization | Gap between the true constrained and mini-batch objectives | Grows as batch size $\downarrow$, groups $\uparrow$ (Xu et al., 13 Feb 2025) |

The Jensen gap thus operates as both a theoretical tool and a practical regularizer in diverse mathematical and algorithmic settings, characterizing the degree of nonlinearity, guiding the design of inequalities, and supplying systematic corrections and controls in analysis and machine learning.
