Submodular Mutual Information (SMI)

Updated 21 April 2026
  • Submodular Mutual Information (SMI) is defined as f(A)+f(Q)-f(A∪Q), generalizing Shannon’s mutual information using the diminishing returns property of submodular functions.
  • Its mathematical properties, including non-negativity, symmetry, monotonicity, and (for many base functions) submodularity in one argument, enable near-optimal greedy maximization for data selection and optimization tasks.
  • SMI is widely applied in targeted data subset selection, active learning, summarization, sensor placement, and multivariate information theory to balance relevance and diversity.

Submodular Mutual Information (SMI) is a combinatorial generalization of Shannon mutual information that emerges from the theory of submodular set functions, providing a mathematically rigorous and algorithmically tractable framework for quantifying the shared information content between sets of objects. SMI inherits the diminishing-returns property of submodular functions and is widely used to optimize data subset selection, active learning, summarization, and related tasks across machine learning and information theory. The formal structure and performance guarantees of SMI enable robust selection strategies balancing diversity, coverage, and query relevance.

1. Formal Definition and Mathematical Structure

Let f : 2^V → ℝ be a normalized, monotone, submodular set function over a ground set V; that is, f(∅) = 0, f(A) ≤ f(B) for A ⊆ B, and for any A ⊆ B ⊆ V and x ∉ B,

f(A ∪ {x}) − f(A) ≥ f(B ∪ {x}) − f(B).

This is the diminishing-returns property. Given two subsets A, Q ⊆ V, the Submodular Mutual Information is

I_f(A; Q) = f(A) + f(Q) − f(A ∪ Q)

SMI quantifies the overlap in "information" between A and Q in the sense defined by f, generalizing Shannon mutual information when f is an entropy function. For conditional SMI, given an additional "private" set P, the conditional form is

I_f(A; Q | P) = f(A ∪ P) + f(Q ∪ P) − f(A ∪ Q ∪ P) − f(P)

(Iyer et al., 2020, Kaushal et al., 2020, Beck et al., 2024, Kothawade et al., 2022)
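
As a concrete illustration of these definitions, the following minimal Python sketch evaluates I_f(A; Q) and its conditional form for a toy weighted-coverage function; the concept sets and weights are invented for illustration and do not come from the cited papers.

```python
# Toy weighted-coverage function (normalized, monotone, submodular); the
# concept sets and weights below are invented for illustration.
coverage = {
    0: {"a", "b"},
    1: {"b", "c"},
    2: {"c", "d"},
    3: {"d", "e"},
}
weights = {"a": 1.0, "b": 2.0, "c": 1.5, "d": 1.0, "e": 0.5}

def f(A):
    """Total weight of concepts covered by A; f(empty set) = 0."""
    covered = set().union(*(coverage[x] for x in A))
    return sum(weights[c] for c in covered)

def smi(A, Q):
    """I_f(A; Q) = f(A) + f(Q) - f(A ∪ Q)."""
    return f(A) + f(Q) - f(A | Q)

def conditional_smi(A, Q, P):
    """I_f(A; Q | P) = f(A ∪ P) + f(Q ∪ P) - f(A ∪ Q ∪ P) - f(P)."""
    return f(A | P) + f(Q | P) - f(A | Q | P) - f(P)

A, Q, P = {0, 1}, {1, 2}, {3}
print(smi(A, Q))                 # 3.5: the weight of the shared concepts {b, c}
print(conditional_smi(A, Q, P))  # overlap not already explained by P
```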

2. Theoretical Properties

SMI retains several crucial properties under mild conditions:

  • Non-negativity: I_f(A; Q) ≥ 0 if f is submodular. Equality holds if and only if A and Q are "independent" with respect to f, e.g., for entropy, if the underlying random variables are mutually independent.
  • Symmetry: I_f(A; Q) = I_f(Q; A).
  • Monotonicity: For fixed Q, I_f(A; Q) is monotone non-decreasing in A.
  • Submodularity in One Argument: For a wide class of base functions f (including facility location, set cover, and graph cut), I_f(A; Q) is also submodular in A for fixed Q. Sufficient conditions involve non-negativity of certain higher-order discrete derivatives of f (Iyer et al., 2020).
  • Bounds: 0 ≤ I_f(A; Q) ≤ min{f(A), f(Q)}.
  • Approximation Guarantee: Maximizing a monotone submodular SMI variant under a cardinality constraint via the greedy algorithm yields a (1 − 1/e) approximation factor (Nemhauser et al., 1978).

(Iyer et al., 2020, Kaushal et al., 2020, Beck et al., 2024, Beck et al., 2024, Li et al., 2022, Kothawade et al., 2021)
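
A quick numerical sanity check of the non-negativity, symmetry, and monotonicity properties, using the facility-location function over a random symmetric similarity matrix (an illustrative setup, not taken from the cited papers):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
S = rng.random((n, n))
S = (S + S.T) / 2  # symmetric, non-negative similarity matrix (illustrative)

def f_fl(A):
    """Facility location: f(A) = sum_i max_{j in A} s_ij, with f(empty) = 0."""
    return S[:, sorted(A)].max(axis=1).sum() if A else 0.0

def smi(A, Q):
    return f_fl(A) + f_fl(Q) - f_fl(A | Q)

for _ in range(200):
    A = set(rng.choice(n, size=2, replace=False).tolist())
    Q = set(rng.choice(n, size=2, replace=False).tolist())
    x = int(rng.integers(n))
    assert smi(A, Q) >= -1e-9                    # non-negativity
    assert abs(smi(A, Q) - smi(Q, A)) < 1e-9     # symmetry
    assert smi(A | {x}, Q) >= smi(A, Q) - 1e-9   # monotone in A for fixed Q
print("non-negativity, symmetry, and monotonicity hold on all samples")
```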

3. Instantiations and Closed-Form Variants

SMI admits concrete, efficient forms for many popular submodular functions:

| SMI Variant | Base Function f(A) | SMI Expression I_f(A; Q) (Simplified) |
|---|---|---|
| Facility Location | Σ_{i∈V} max_{j∈A} s_ij | Σ_{i∈V} min(max_{j∈A} s_ij, max_{j∈Q} s_ij) |
| Graph Cut | Σ_{i∈V} Σ_{j∈A} s_ij − λ Σ_{i,j∈A} s_ij | 2λ Σ_{i∈A} Σ_{j∈Q} s_ij (for disjoint A, Q) |
| Log-Determinant | log det(S_A), for a PSD kernel matrix S | log det(S_A) − log det(S_A − S_{A,Q} S_Q^{-1} S_{A,Q}^T) |
| Probabilistic Set Cover | Σ_u w_u (1 − Π_{j∈A}(1 − p_uj)) | Σ_u w_u (1 − Π_{j∈A}(1 − p_uj))(1 − Π_{j∈Q}(1 − p_uj)) |

Here s_ij denotes a pairwise similarity between elements i and j, S_{A,Q} the cross-similarity block between A and Q, w_u concept weights, and p_uj the probability that element j covers concept u.

These variants enable direct encoding of coverage, relevance, and diversity across application domains. Multivariate and fractional-partition generalizations, which recover measures such as total correlation and dual total correlation, are treated in Section 7.
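
The closed forms above can be checked directly against the definition I_f(A; Q) = f(A) + f(Q) − f(A ∪ Q); the sketch below does so for the facility-location and graph-cut variants on a random symmetric similarity matrix (illustrative values only):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
S = rng.random((n, n))
S = (S + S.T) / 2          # symmetric similarities (illustrative)
lam = 0.5                  # graph-cut trade-off parameter

def f_fl(A):
    return S[:, sorted(A)].max(axis=1).sum() if A else 0.0

def f_gc(A):
    if not A:
        return 0.0
    idx = sorted(A)
    return S[:, idx].sum() - lam * S[np.ix_(idx, idx)].sum()

def smi(f, A, Q):
    return f(A) + f(Q) - f(A | Q)

A, Q = {0, 1, 2}, {5, 6}   # disjoint, as the GCMI closed form assumes

# Facility location: I_f(A;Q) = sum_i min(max_{j in A} s_ij, max_{j in Q} s_ij)
flmi = np.minimum(S[:, sorted(A)].max(axis=1),
                  S[:, sorted(Q)].max(axis=1)).sum()
assert abs(smi(f_fl, A, Q) - flmi) < 1e-9

# Graph cut: I_f(A;Q) = 2 * lambda * sum_{i in A, j in Q} s_ij
gcmi = 2 * lam * S[np.ix_(sorted(A), sorted(Q))].sum()
assert abs(smi(f_gc, A, Q) - gcmi) < 1e-9
print("closed forms match the definition")
```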

4. Algorithmic Optimization

Greedy maximization is provably near-optimal for monotone submodular SMI variants. At each step, the element yielding the highest marginal gain is added:

A ← A ∪ { argmax_{x ∈ V∖A} [ I_f(A ∪ {x}; Q) − I_f(A; Q) ] }

Techniques improving efficiency include lazy (accelerated) greedy evaluations, stochastic greedy sampling, and memoization of per-element quantities such as running maxima in facility-location objectives.

(Beck et al., 2024, Kothawade et al., 2021, Kothawade et al., 2021, Iyer et al., 2020)
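
A minimal sketch of plain greedy maximization of the facility-location SMI closed form, assuming a random similarity matrix and a fixed query set Q (all values illustrative); a lazy or stochastic variant would replace the full marginal-gain scan:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
S = rng.random((n, n))
S = (S + S.T) / 2          # similarity kernel (illustrative random values)

Q = {0, 1}                 # query/exemplar set
V = set(range(n)) - Q      # unlabeled ground set

def flmi(A):
    """Facility-location SMI closed form: monotone submodular in A."""
    if not A:
        return 0.0
    return np.minimum(S[:, sorted(A)].max(axis=1),
                      S[:, sorted(Q)].max(axis=1)).sum()

def greedy(budget):
    """Plain greedy; gives the 1 - 1/e guarantee for monotone submodular objectives."""
    A, value = set(), 0.0
    for _ in range(budget):
        gains = {x: flmi(A | {x}) - value for x in V - A}
        best = max(gains, key=gains.get)
        A.add(best)
        value += gains[best]
    return A, value

subset, score = greedy(budget=5)
print(subset, round(score, 3))
```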

5. Applications in Machine Learning and Information Theory

SMI is foundational in multiple domains:

  • Targeted Data Subset Selection: Selecting unlabeled samples maximally "mutually informative" with a query/exemplar set boosts rare-class and overall performance in both vision and language (Kothawade et al., 2021, Beck et al., 2024).
  • Active Learning: SMI-guided acquisition functions outperform uncertainty/diversity heuristics, especially under class imbalance, rare slices, or OOD data (Kothawade et al., 2021, Kothawade et al., 2021, Kothawade et al., 2022).
  • Summarization: Query-focused, privacy-preserving, and update data summarization are unified under SMI with explicit, interpretable objectives (Kaushal et al., 2020, Iyer et al., 2020).
  • Meta-Learning and Semi-Supervision: In episodic meta-learning, per-class SMI acquisition promotes balanced pseudo-labeling, resilience to OOD, and robust adaptation (Li et al., 2022).
  • In-context Retrieval and Ranking: Jointly maximizing query relevance and exemplar diversity via SMI yields state-of-the-art in-context retrieval across question-answering and NLU tasks (Nanda et al., 28 Aug 2025).
  • Sensor Placement: For Gaussian sources with additive noise, classical mutual information is submodular, rendering greedy sensor selection near-optimal (Crowley et al., 2024).
  • Multivariate Information Theory: SMI with fractional partitions unifies total correlation, dual total correlation, and shared information, linking combinatorial inequalities, entropic inequalities, and matrix analytic results (Jakhar et al., 21 Jan 2025).

(Kothawade et al., 2021, Beck et al., 2024, Kaushal et al., 2020, Kothawade et al., 2021, Kothawade et al., 2021, Li et al., 2022, Kothawade et al., 2022, Beck et al., 2024, Jakhar et al., 21 Jan 2025, Nanda et al., 28 Aug 2025, Crowley et al., 2024)
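
As an illustration of targeted subset selection, the sketch below applies the graph-cut SMI closed form to synthetic 2-D embeddings with a rare cluster; since GCMI is modular in A, greedy selection reduces exactly to ranking pool points by total similarity to the query exemplars. All data here is synthetic and for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy embeddings: 95 "common" points near the origin, 5 "rare" points near (5, 5).
common = rng.normal(0.0, 1.0, size=(95, 2))
rare = rng.normal(5.0, 0.3, size=(5, 2))
X = np.vstack([common, rare])
query = rng.normal(5.0, 0.3, size=(3, 2))   # exemplars of the rare class

def rbf(X, Y, gamma=0.5):
    """RBF similarity between rows of X and rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# GCMI(A; Q) = 2*lambda*sum_{i in A, j in Q} s_ij is modular in A, so greedy
# selection is exact: rank pool points by total similarity to the query set.
scores = rbf(X, query).sum(axis=1)
selected = np.argsort(-scores)[:5]
print(selected)   # the rare points (indices 95..99) should dominate
```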

6. Theoretical Guarantees: Sensitivity to Coverage and Relevance

Explicit similarity-based performance bounds tie SMI scores to actionable metrics:

  • Query Relevance: Lower and upper bounds on the number of true targets selected grow linearly with the selection budget for several variants (e.g., FLVMI, GCMI), under mild assumptions on the similarity distributions.
  • Query Coverage: Similar bounds hold for the average coverage of the remaining targeted points or queries, showing either guaranteed query coverage, high relevance, or an explicit trade-off governed by SMI parameters.
  • Sensitivity Trade-off: Facility-Location SMI is highly sensitive to coverage but less so to relevance; Graph-Cut SMI is the converse. Facility-Location Query Mutual Information (FLQMI) and Concave-Over-Modular variants interpolate between these extremes, and adjusting their interpolation parameter trades off between the two (Beck et al., 2024).
  • Tightness: As the separation between targeted and untargeted similarity increases, bounds on relevance and coverage become tight, explaining SMI's empirical performance.

(Beck et al., 2024)
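
A small sketch of the two empirical quantities these bounds govern, assuming index-set inputs and a symmetric similarity matrix; the names query_relevance and query_coverage are illustrative, not notation from Beck et al.:

```python
import numpy as np

rng = np.random.default_rng(4)
S = rng.random((20, 20))
S = (S + S.T) / 2   # illustrative symmetric similarities

def query_relevance(selected, targets):
    """Fraction of the selected set that are true targets."""
    return len(set(selected) & set(targets)) / len(selected)

def query_coverage(S, selected, queries):
    """Average, over queries, of the best similarity to any selected point."""
    return S[np.ix_(sorted(selected), sorted(queries))].max(axis=0).mean()

print(query_relevance(selected=[1, 2, 3, 4], targets=[2, 4, 9]))  # 0.5
print(query_coverage(S, selected=[1, 2, 3, 4], queries=[9, 10]))
```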

7. Multivariate and Fractional SMI: Generalizations and Deep Connections

Fractional SMI, defined for any fractional partition α = {(S_i, α_i)} of the ground set V, with weights α_i ≥ 0 satisfying Σ_{i : v ∈ S_i} α_i = 1 for every v ∈ V, is given by

I_f^α = Σ_i α_i f(S_i) − f(V)

This framework unifies total correlation, dual total correlation, and shared information. Key properties include:

  • Non-negativity: Vanishes if and only if the ground variables are independent.
  • Maximum is Total Correlation: Among all fractional partitions, the singleton (total correlation) is maximal.
  • Data Processing and Chain Rule: Satisfies strong multivariate data processing and recursion relations.
  • Determinantal Inequalities: Fractional SMI recovers matrix inequalities (e.g., Hadamard–Fischer–Szász) for positive definite kernels when applied to log-determinant functions (Jakhar et al., 21 Jan 2025).

(Jakhar et al., 21 Jan 2025)
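
Assuming the fractional form above, the log-determinant instantiation with the singleton partition reproduces Hadamard's inequality numerically (the kernel below is an arbitrary illustrative positive definite matrix):

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.normal(size=(6, 6))
K = B @ B.T + 6 * np.eye(6)   # positive definite kernel (illustrative)

# f(A) = log det(K_A); singleton fractional partition: S_i = {i}, alpha_i = 1.
# Fractional SMI = sum_i alpha_i f(S_i) - f(V) = sum_i log K_ii - log det K,
# and its non-negativity is exactly Hadamard's inequality det K <= prod_i K_ii.
frac_smi = np.log(np.diag(K)).sum() - np.linalg.slogdet(K)[1]
print(frac_smi)
assert frac_smi >= -1e-9
```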


In summary, Submodular Mutual Information forms a rigorous backbone for a diverse array of data selection tasks, interpolation between coverage and relevance objectives, and generalization of classical information measures. Its theoretical underpinnings and practical instantiations yield computationally efficient, provably good selection mechanisms with broad application in modern data-centric machine learning pipelines.
