
Submodular Information Measures (SIM)

Updated 21 April 2026
  • Submodular Information Measures (SIM) are a combinatorial generalization of classical information metrics that replace Shannon entropy with a monotone submodular set function.
  • SIMs offer strong theoretical guarantees with efficient greedy algorithms that achieve near-optimal (1-1/e) approximations for tasks like summarization and active learning.
  • Applications of SIMs span active learning, privacy filtering, causal inference, and representation learning, often yielding significant performance gains in empirical studies.

A submodular information measure (SIM) is a combinatorial generalization of classical information-theoretic measures, such as entropy, mutual information, and conditional mutual information, in which the foundational role of Shannon entropy is replaced with a general monotone submodular set function. SIMs provide an abstract algebraic framework for modeling information, relevance, coverage, independence, and diversity on arbitrary ground sets, extending beyond random variables to structured data, feature sets, and combinatorial objects. SIMs have deep implications across data subset selection, active learning, summarization, privacy, representation learning, causal inference, and extremal combinatorics.

1. Formal Definitions and Mathematical Structure

Let $V$ be a finite ground set and $f: 2^V \to \mathbb{R}$ a normalized, monotone, submodular set function: $f(\emptyset) = 0$; $f(A) \leq f(B)$ whenever $A \subseteq B$; and $f(A) + f(B) \geq f(A \cup B) + f(A \cap B)$ for all $A, B \subseteq V$.
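As a minimal sketch of this definition, the snippet below builds a toy coverage function (the ground set and covered-items map are illustrative choices, not taken from the cited papers) and checks the submodular inequality exhaustively over all subset pairs:

```python
from itertools import combinations

V = {"a", "b", "c", "d"}
covers = {  # hypothetical element -> covered-items map
    "a": {1, 2}, "b": {2, 3}, "c": {3, 4}, "d": {1, 4, 5},
}

def f(A):
    """Coverage: number of distinct items covered by the elements of A."""
    return len(set().union(*(covers[x] for x in A))) if A else 0

subsets = [set(c) for r in range(len(V) + 1) for c in combinations(sorted(V), r)]
assert all(f(A) + f(B) >= f(A | B) + f(A & B) for A in subsets for B in subsets)
print("f satisfies the submodular inequality on every pair of subsets")
```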

The key submodular information measures are:

Conditional Gain (CG): $f(A \mid P) := f(A \cup P) - f(P)$

Intuition: the incremental "utility" provided by $A$ beyond what $P$ already supplies.

Submodular Mutual Information (SMI): $I_f(A; Q) := f(A) + f(Q) - f(A \cup Q)$

Intuition: the amount of "shared information" or representativeness of $A$ with respect to $Q$.

Submodular Conditional Mutual Information (SCMI): $I_f(A; Q \mid P) := f(A \cup P) + f(Q \cup P) - f(A \cup Q \cup P) - f(P)$

Equivalently, $I_f(A; Q \mid P) = f(A \mid P) - f(A \mid Q \cup P)$. Intuition: relevance of $A$ to $Q$ penalized by overlap with $P$.

These extend immediately to multi-set analogues, total correlation, and composite objectives. For example, the total correlation of $k$ disjoint sets $A_1, \dots, A_k$ is $C_f(A_1, \dots, A_k) = \sum_{i=1}^{k} f(A_i) - f\big(\bigcup_{i=1}^{k} A_i\big)$ (Majee et al., 2023).
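A minimal sketch of the three core measures, written directly from the definitions above for any set function; the toy coverage $f$ is an illustrative stand-in:

```python
covers = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}, "d": {1, 4, 5}}

def f(A):
    """Toy coverage function (normalized, monotone, submodular)."""
    return len(set().union(*(covers[x] for x in A))) if A else 0

def cg(A, P):
    """Conditional gain f(A | P) = f(A union P) - f(P)."""
    return f(A | P) - f(P)

def smi(A, Q):
    """Submodular mutual information I_f(A; Q) = f(A) + f(Q) - f(A union Q)."""
    return f(A) + f(Q) - f(A | Q)

def scmi(A, Q, P):
    """SCMI I_f(A; Q | P) = f(A+P) + f(Q+P) - f(A+Q+P) - f(P)."""
    return f(A | P) + f(Q | P) - f(A | Q | P) - f(P)

A, Q, P = {"a"}, {"b", "c"}, {"d"}
print(cg(A, P), smi(A, Q), scmi(A, Q, P))
# The stated equivalence I_f(A; Q | P) = f(A | P) - f(A | Q union P):
assert scmi(A, Q, P) == cg(A, P) - cg(A, Q | P)
```

The final assertion holds identically for any set function, since both sides expand to the same four $f$-evaluations.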

When $f$ is the entropy of a collection of random variables, i.e., $f(A) = H(X_A)$, these recover the classical Shannon information measures. For canonical submodular functions like coverage, facility-location, concave-over-modular, or certain graph-cut-type objectives, SIMs coincide exactly with entropic mutual information under explicit constructions (Iyer, 19 Jan 2026).

2. Theoretical Properties: Axioms and Independence

SIMs inherit critical properties from submodularity (Asnani et al., 2021, Iyer et al., 2020):

  • Nonnegativity: $I_f(A; Q) \geq 0$ and $f(A \mid P) \geq 0$ for normalized, monotone $f$.
  • Symmetry: $I_f(A; Q) = I_f(Q; A)$.
  • Monotonicity: $I_f(A; Q)$ is non-decreasing in $A$ for fixed $Q$; $f(A \mid P)$ is monotone in $A$.
  • Submodularity in One Argument: $I_f(A; Q)$ is submodular in $A$ when $f$'s third-order discrete derivatives are non-negative; this holds for facility-location, set cover, concave-over-modular, and some graph-cut functions (Iyer et al., 2020, Kothawade et al., 2021).
  • Chain Rule: $I_f(A; Q \cup P) = I_f(A; Q) + I_f(A; P \mid Q)$ (checked numerically in the sketch below).
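The symmetry and chain-rule identities hold by algebra for any set function $f$; the brute-force check below, over a toy coverage function chosen for illustration, confirms this on all small subsets:

```python
from itertools import combinations

covers = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}, "d": {1, 4, 5}, "e": {2, 5}}

def f(A):
    return len(set().union(*(covers[x] for x in A))) if A else 0

def smi(A, Q):
    return f(A) + f(Q) - f(A | Q)

def scmi(A, Q, P):
    return f(A | P) + f(Q | P) - f(A | Q | P) - f(P)

V = sorted(covers)
subsets = [set(c) for r in range(3) for c in combinations(V, r)]
for A in subsets:
    for Q in subsets:
        assert smi(A, Q) == smi(Q, A)                          # symmetry
        for P in subsets:
            assert smi(A, Q | P) == smi(A, Q) + scmi(A, P, Q)  # chain rule
print("symmetry and chain rule verified on all subsets of size <= 2")
```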

Independence concepts are generalized:

  • Joint Independence: $f\big(\bigcup_i A_i\big) = \sum_i f(A_i)$.
  • Pairwise Independence: $A_1, \dots, A_k$ are pairwise independent if $I_f(A_i; A_j) = 0$ for all $i \neq j$.
  • Multi-set Independence: the multi-set SMI vanishes, $I_f(A_1; \dots; A_k) = 0$ (Asnani et al., 2021); see the check below.
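A minimal sketch of independence in the SIM sense: under a coverage function whose covered items are pairwise disjoint (a hypothetical example), sets share no information, so $f$ is additive and the SMI is zero:

```python
covers = {"a": {1, 2}, "b": {3, 4}, "c": {5}}  # pairwise-disjoint covers

def f(A):
    return len(set().union(*(covers[x] for x in A))) if A else 0

A, Q = {"a"}, {"b", "c"}
assert f(A | Q) == f(A) + f(Q)       # joint independence: f is additive
assert f(A) + f(Q) - f(A | Q) == 0   # equivalently, I_f(A; Q) = 0
print("A and Q are independent under this coverage function")
```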

These fundamental axioms enable the use of SIMs in combinatorial optimization, privacy, summarization, and learning tasks that require formal guarantees.

3. Canonical Submodular Function Classes and Entropic Correspondence

The most widely used SIMs are grounded in the following classes (Iyer, 19 Jan 2026, Iyer et al., 2020):

Function family | Definition | Typical use cases
Coverage / set-cover | $f(A) = w\big(\bigcup_{a \in A} S_a\big)$ for covered-item sets $S_a$, nonnegative weights $w$ | Diversity, coverage
Facility-location | $f(A) = \sum_{i \in V} \max_{j \in A} s_{ij}$ | Representation, information overlap
Graph-cut-type | $f(A) = \sum_{i \in V} \sum_{j \in A} s_{ij} - \lambda \sum_{i, j \in A} s_{ij}$ | Redundancy, separation, clustering
Concave-over-modular | $f(A) = \psi(m(A))$, $m$ modular, $\psi$ concave nondecreasing | Robustness, budgeted diversity
Log-determinant | $f(A) = \log \det(S_A)$ ($S$ a positive-definite similarity kernel) | Volume, diversity, uncertainty
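Minimal sketches of three of these families, written against a hypothetical RBF-style similarity kernel $S$ (the data and kernel choice are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))  # hypothetical embeddings
S = np.exp(-np.linalg.norm(X[:, None] - X[None, :], axis=-1))  # PD kernel

def facility_location(A):
    """f(A) = sum_{i in V} max_{j in A} s_ij."""
    A = sorted(A)
    return S[:, A].max(axis=1).sum() if A else 0.0

def graph_cut(A, lam=0.5):
    """f(A) = sum_{i in V, j in A} s_ij - lam * sum_{i, j in A} s_ij."""
    A = sorted(A)
    return (S[:, A].sum() - lam * S[np.ix_(A, A)].sum()) if A else 0.0

def log_det(A, eps=1e-6):
    """f(A) = log det(S_A + eps*I) on the principal submatrix indexed by A."""
    A = sorted(A)
    return np.linalg.slogdet(S[np.ix_(A, A)] + eps * np.eye(len(A)))[1] if A else 0.0

print(facility_location({0, 2}), graph_cut({0, 2}), log_det({0, 2}))
```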

Recent work demonstrates exact entropic constructions: given any of these $f$, there exists a random vector $X_V = (X_v)_{v \in V}$ such that $f(A) = H(X_A)$ for every $A \subseteq V$, and all submodular information measures reduce to their classical Shannon counterparts (Iyer, 19 Jan 2026).

4. Optimization Algorithms and Greedy Guarantees

Maximization of any nonnegative, monotone SIM (e.g., $I_f(A; Q)$, $f(A \mid P)$, $I_f(A; Q \mid P)$) under a cardinality constraint admits a $(1 - 1/e)$-approximation via the greedy algorithm, with constant-factor guarantees under matroid constraints (Kothawade et al., 2022, Kothawade et al., 2021):

  1. Initialize $A \leftarrow \emptyset$.
  2. For $t = 1$ to $k$:
    • For each $v \notin A$, compute the marginal gain, e.g., $I_f(A \cup \{v\}; Q) - I_f(A; Q)$.
    • Add the $v$ with maximal gain to $A$ (sketched in code below).
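A minimal sketch of this loop, instantiated with facility-location SMI; the kernel $S$, query indices, and budget are hypothetical inputs:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))
S = np.exp(-np.linalg.norm(X[:, None] - X[None, :], axis=-1))

def f(A):
    A = sorted(A)
    return S[:, A].max(axis=1).sum() if A else 0.0

def smi(A, Q):
    return f(A) + f(Q) - f(A | Q)

def greedy_smi(Q, k):
    """Greedy maximization of I_f(A; Q) subject to |A| <= k."""
    A, pool = set(), set(range(len(S))) - Q
    for _ in range(k):
        best = max(pool, key=lambda v: smi(A | {v}, Q) - smi(A, Q))
        A.add(best)
        pool.remove(best)
    return A

print(greedy_smi(Q={0, 1, 2}, k=5))
```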

Lazy-greedy evaluation and ground-set partitioning reduce computational cost, particularly for SMI based on facility-location, whose variants need only ground-set-to-query-set similarity evaluations (Kothawade et al., 2022, Kothawade et al., 2021).
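A minimal sketch of lazy greedy: cached marginal gains are re-evaluated only when stale, which is valid because submodularity guarantees gains never increase as $A$ grows (the kernel below is again a hypothetical stand-in):

```python
import heapq
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
S = np.exp(-np.linalg.norm(X[:, None] - X[None, :], axis=-1))

def f(A):
    A = sorted(A)
    return S[:, A].max(axis=1).sum() if A else 0.0

def lazy_greedy(k):
    """Lazy greedy: stale cached gains are upper bounds by submodularity."""
    A, f_A = set(), 0.0
    heap = [(-f({v}), v, 0) for v in range(len(S))]  # (neg gain, elem, round)
    heapq.heapify(heap)
    for t in range(1, k + 1):
        while True:
            neg_gain, v, stamp = heapq.heappop(heap)
            if stamp == t:            # gain is fresh this round: v is the argmax
                A.add(v)
                f_A -= neg_gain
                break
            gain = f(A | {v}) - f_A   # re-evaluate the stale gain
            heapq.heappush(heap, (-gain, v, t))
    return A

print(lazy_greedy(5))
```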

Curvature bounds (via the total curvature $\kappa_f$) further tighten approximation ratios, improving the greedy guarantee to $\frac{1}{\kappa_f}\big(1 - e^{-\kappa_f}\big)$. In practice, facility-location, graph-cut, and log-determinant functions exhibit low curvature, making greedy nearly optimal (Kothawade et al., 2022).
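A small sketch computing the total curvature $\kappa_f = 1 - \min_{j} f(j \mid V \setminus \{j\}) / f(\{j\})$ for a facility-location function on hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(15, 4))
S = np.exp(-np.linalg.norm(X[:, None] - X[None, :], axis=-1))

def f(A):
    A = sorted(A)
    return S[:, A].max(axis=1).sum() if A else 0.0

V = set(range(len(S)))
# kappa_f = 1 - min_j [f(V) - f(V \ {j})] / f({j})
kappa = 1 - min((f(V) - f(V - {j})) / f({j}) for j in V)
print(f"total curvature: {kappa:.3f}")
```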

5. Applications: Data Selection, Summarization, and Learning

SIMs constitute core objectives in broad machine learning settings:

  • Active Learning and Data Discovery: submodular conditional gain (SCG) and SMI are used to mine rare or unknown classes, first rewarding dissimilarity from the labeled set (SCG) and then intensifying discovery by targeting known hits (SMI/SCMI); a selection sketch follows this list. Empirically, these approaches dominate baselines on rare-class and OOD selection in image classification and object detection, with 10–15% absolute accuracy gains on unknowns (Kothawade et al., 2022, Kothawade et al., 2021).
  • Targeted Subset Selection: SMI and variants (facility-location, log-det, graph-cut, COM) select samples that optimally trade off query relevance and target coverage. Theoretical bounds guarantee that maximizing SMI under realistic similarity-separation assumptions ensures high query relevance and coverage (Beck et al., 2024, Kothawade et al., 2021).
  • Privacy and Fairness: SCMI and its constraints operationalize privacy by enforcing independence from a sensitive set under a user-defined threshold (Asnani et al., 2021, Kaushal et al., 2020). Privacy filters and marginal-independence filters compose efficiently with submodular maximization objectives.
  • Summarization and Representation Learning: SIMs unify generic, query-focused, privacy-aware, and update-aware summarization as direct maximizations of SMI, SCG, or SCMI, generalizing models such as ROUGE, DPPs, and graph-cut methods (Kaushal et al., 2020, Kothawade et al., 2021). In representation learning, submodular total-correlation losses (e.g., the SCoRe framework) simultaneously minimize intra-class variance and inter-class bias, outperforming standard contrastive methods on imbalanced data (Majee et al., 2023).
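A minimal sketch of conditional-gain-based discovery, as used in the active-learning bullet above: greedily pick unlabeled points that add utility beyond an already-labeled set $L$ (the kernel, $L$, and budget are hypothetical inputs):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 4))
S = np.exp(-np.linalg.norm(X[:, None] - X[None, :], axis=-1))

def f(A):
    A = sorted(A)
    return S[:, A].max(axis=1).sum() if A else 0.0

def select_novel(L, k):
    """Greedy maximization of the conditional gain f(A | L)."""
    A, pool = set(), set(range(len(S))) - L
    for _ in range(k):
        # Marginal of v: f(A + {v} | L) - f(A | L) = f(A + L + {v}) - f(A + L)
        best = max(pool, key=lambda v: f(A | L | {v}) - f(A | L))
        A.add(best)
        pool.remove(best)
    return A

print(select_novel(L={0, 1, 2, 3}, k=5))
```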

A selection of empirical results:

Application | SIM Instantiation | Typical Gain over Baselines | Reference
Active Data Discovery | Fl_cg+mi, Logdet_cg+mi | 10–15% higher accuracy on unknowns | (Kothawade et al., 2022)
OOD Avoidance | Fl-CMI, LogDet-CMI | 4–7% accuracy lift | (Kothawade et al., 2022)
Targeted TSS | LogdetMI, FL2MI | ~20–30% absolute improvement on rare classes | (Kothawade et al., 2021)
Summarization | FL-SMI, GraphCut-SMI, LogDet-SMI | Near human-level V-ROUGE | (Kaushal et al., 2020)
Representation Learning | FL-/GC-based $C_f$ | 1–9% boost in class-imbalanced recognition | (Majee et al., 2023)

6. Extensions: Causal Inference, Information Inequalities, and Advanced Properties

SIMs extend classical independence, conditional independence, and causal Markov properties to non-entropic settings, unifying information-theoretic and combinatorial perspectives. The generalized causal Markov condition for SIMs matches the standard DAG-based independence structure, independent of the choice of submodular $f$ (Steudel et al., 2010).

Unified derivations of information inequalities (Han’s, Shearer’s, monotonicity sequences, total correlation bounds) follow broadly from submodularity. These yield refined combinatorial bounds, e.g., on projection sizes, Boolean influences, and extremal graph properties (Sason, 2022).
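As a minimal sketch of how such inequalities follow from submodularity alone, the check below verifies a Han-type bound, $\sum_{i} f(V \setminus \{i\}) \geq (|V| - 1)\, f(V)$, for the toy coverage function used earlier:

```python
covers = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}, "d": {1, 4, 5}}

def f(A):
    return len(set().union(*(covers[x] for x in A))) if A else 0

V = set(covers)
# Submodularity implies sum_i f(V \ {i}) >= (|V| - 1) * f(V).
assert sum(f(V - {i}) for i in V) >= (len(V) - 1) * f(V)
print("Han-type inequality holds for this coverage function")
```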

Recent developments include the study of SIMs for weak submodularity in quadratic estimation and optimal experimental design (alphabetic optimality criteria), where closed-form utility functions (log-det, trace, minimum eigenvalue) are either directly submodular or admit quantifiable greedy approximation guarantees (Hashemi et al., 2019).
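A minimal sketch of a D-optimal-style utility of the log-det kind: $f(A) = \log \det\big(I + \sum_{i \in A} x_i x_i^\top\big)$, which is normalized, monotone, and submodular; the candidate measurement vectors are a hypothetical input:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(20, 4))  # hypothetical candidate measurement vectors

def d_utility(A):
    """f(A) = log det(I + sum_{i in A} x_i x_i^T); f(empty) = 0."""
    M = np.eye(X.shape[1]) + sum(np.outer(X[i], X[i]) for i in A)
    return np.linalg.slogdet(M)[1]

print(d_utility({0, 1, 2}))
```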

7. Modeling Flexibility, Parameterizations, and Practical Considerations

Modern extensions, such as PRISM (Kothawade et al., 2021), introduce multi-parameterized SIMs to interpolate between relevance, diversity, privacy, and coverage. Typical parameters (a short sketch follows the list):

  • $\lambda$ (graph-cut): relevance vs. diversity
  • $\eta$ (facility-location, COM): similarity-to-query trade-off
  • $\nu$ (conditional gain): strength of avoidance/penalty toward a private set
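A minimal sketch of how one such parameter propagates: the SMI below is derived by definition from a $\lambda$-parameterized graph-cut $f$, so tuning $\lambda$ shifts the relevance/diversity trade-off (the kernel and $\lambda$ values are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(12, 3))
S = np.exp(-np.linalg.norm(X[:, None] - X[None, :], axis=-1))

def graph_cut(A, lam):
    A = sorted(A)
    return (S[:, A].sum() - lam * S[np.ix_(A, A)].sum()) if A else 0.0

def smi(A, Q, lam):
    """SMI derived by definition from the lambda-parameterized graph cut."""
    return graph_cut(A, lam) + graph_cut(Q, lam) - graph_cut(A | Q, lam)

A, Q = {0, 1}, {5, 6}
for lam in (0.1, 0.3, 0.5):
    print(lam, smi(A, Q, lam))
```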

By tuning these parameters, SIMs adapt to a wide range of problems: rare-class mining, guided summarization, OOD filtering, and distributed, scalable optimization.

Several concrete choices are supported with efficient greedy algorithms (Kothawade et al., 2021, Kaushal et al., 2020, Beck et al., 2024), and the entire framework is modality-agnostic—applicable to images, video, text, sensor sets, and gradient embeddings.


Summary Table of Core SIM Formulae

Name | Formula | Typical Use
SCG | $f(A \mid P) = f(A \cup P) - f(P)$ | Dissimilarity, novelty
SMI | $I_f(A; Q) = f(A) + f(Q) - f(A \cup Q)$ | Relevance, coverage, overlap
SCMI | $I_f(A; Q \mid P) = f(A \cup P) + f(Q \cup P) - f(A \cup Q \cup P) - f(P)$ | Targeting under exclusion

Through their algebraic generality and foundational approximation guarantees, submodular information measures constitute a principled, tractable, and highly expressive toolkit for information-centric decision-making in structured data systems (Kothawade et al., 2022, Asnani et al., 2021, Iyer et al., 2020, Beck et al., 2024, Kothawade et al., 2021).
