Fair Max-Min Subset Selection
- Fair Max-Min Subset Selection is a framework that selects a subset of items with group fairness constraints to maximize the minimum pairwise quality measure.
- Its instantiations, including fair diversification, weighted max-min T-joins, and column subset selection, address applications like data summarization and reviewer assignment.
- Algorithmic methods range from greedy and LP relaxation to coreset construction and streaming techniques, offering provable approximation and fairness guarantees despite NP-hardness.
Fair Max-Min Subset Selection is a central problem in algorithmic fairness and diversity, unifying themes from robust optimization, combinatorial selection, and group-aware resource allocation. The objective is to select a subset of elements from a ground set, subject to group-based fairness constraints, so as to optimize a worst-case quality measure: maximize the minimum pairwise distance or matching cost, or minimize the maximum group-wise matrix reconstruction error. The canonical instantiations, Fair Max-Min Diversification, Weighted Max-Min T-Joins, and Fair Column Subset Selection, appear in applications spanning data summarization, multiagent collaboration, reviewer assignment, and robust recommender system design. Prior work has established both the computational hardness of fair max-min problems and their structural similarity to matching, covering, and partitioning primitives, motivating a spectrum of exact algorithms, approximation schemes, and scalable heuristics.
1. Formal Problem Definitions and Core Models
The general Fair Max-Min Subset Selection framework consists of the following:
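- Metric Fair Max-Min Diversification: Given a finite metric space $(U, d)$ with $n = |U|$ elements partitioned into $m$ disjoint groups $U_1, \dots, U_m$, and quotas $k_1, \dots, k_m$ with $k = \sum_i k_i$, select $S \subseteq U$ with $|S \cap U_i| = k_i$ for all $i \in [m]$ to maximize
$$\operatorname{div}(S) = \min_{u, v \in S,\ u \neq v} d(u, v).$$
The fairness constraint enforces prescribed representation of each group in $S$ (Moumoulidou et al., 2020).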
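- Weighted Max-Min T-Join: For a graph $G = (V, E)$ with nonnegative edge weights $w: E \to \mathbb{R}_{\ge 0}$, the problem is to select an even-sized subset $T \subseteq V$ so as to maximize the weight of a minimum-weight perfect matching on $T$ in the induced metric:
$$\max_{T \subseteq V,\ |T| \text{ even}}\ \min_{M \in \mathcal{M}(T)} \sum_{\{u, v\} \in M} d_G(u, v),$$
where $\mathcal{M}(T)$ is the set of perfect matchings of $T$ and $d_G$ is the shortest-path metric of $G$ (Alipour, 7 Feb 2026). Equivalently, the problem maximizes the cost of a minimum-weight T-join.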
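- Fair Max-Min Column Subset Selection: Given $A \in \mathbb{R}^{N \times D}$ whose rows are split into two groups $A_1$ and $A_2$, select a column subset $S \subseteq [D]$ with $|S| = k$ minimizing the maximum of the normalized reconstruction errors,
$$\min_{S \subseteq [D],\ |S| = k}\ \max_{i \in \{1, 2\}} \frac{\|A_i - \Pi^{(i)}_S A_i\|_F^2}{\|A_i - [A_i]_k\|_F^2},$$
where $\Pi^{(i)}_S$ is the orthogonal projector onto the span of the columns of $A_i$ indexed by $S$ and $[A_i]_k$ is the best rank-$k$ approximation of $A_i$ (Matakos et al., 2023).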
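The fair CSS objective can be evaluated directly from the definition above. The sketch below assumes NumPy, squared Frobenius errors, and normalization by each group's best rank-$k$ error as written; the helper names `group_error` and `fair_css_objective` are hypothetical.

```python
import numpy as np

def group_error(A_i, S, k, tol=1e-12):
    """Normalized reconstruction error of one row group A_i using the columns in S."""
    C = A_i[:, S]
    # Orthogonal projection of A_i onto the column space of C: C C^+ A_i
    proj = C @ np.linalg.pinv(C) @ A_i
    err = np.linalg.norm(A_i - proj, "fro") ** 2
    # Best rank-k error = energy in the singular values beyond the k-th
    s = np.linalg.svd(A_i, compute_uv=False)
    best = float(np.sum(s[k:] ** 2))
    if best <= tol:
        return 0.0 if err <= tol else float("inf")
    return err / best

def fair_css_objective(A1, A2, S):
    """Worst-case (max) normalized error over the two row groups."""
    k = len(S)
    return max(group_error(A1, S, k), group_error(A2, S, k))

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 6))
A1, A2 = A[:6], A[6:]  # rows split into two groups
print(fair_css_objective(A1, A2, [0, 3]))
```

Because $k$ columns span at most a rank-$k$ subspace, each ratio is at least 1, so the objective measures how far a column subset is from the best low-rank fit for the worse-served group.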
These models unify diversity maximization with explicit fairness, capturing the need for robust, non-redundant, and equitable representation in subset selection.
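Of the three models, the T-join objective is the least standard. For very small graphs it can be evaluated exhaustively: compute the shortest-path metric with Floyd-Warshall, score each even-sized $T$ by a bitmask dynamic program for minimum-weight perfect matching, and keep the best $T$. This exponential-time sketch assumes a connected graph with nonnegative weights; it is meant for intuition, not as the algorithm of the cited work, and all function names are hypothetical.

```python
import itertools
from functools import lru_cache

def shortest_paths(n, edges):
    """All-pairs shortest-path distances (Floyd-Warshall) for an undirected graph."""
    INF = float("inf")
    d = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        d[u][v] = d[v][u] = min(d[u][v], w)
    for via in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][via] + d[via][j] < d[i][j]:
                    d[i][j] = d[i][via] + d[via][j]
    return d

def min_perfect_matching(T, d):
    """Minimum-weight perfect matching on the vertex tuple T under metric d (bitmask DP)."""
    @lru_cache(maxsize=None)
    def solve(mask):
        if mask == 0:
            return 0.0
        i = (mask & -mask).bit_length() - 1  # lowest unmatched position
        rest = mask & ~(1 << i)
        best = float("inf")
        for j in range(len(T)):
            if rest >> j & 1:  # try pairing T[i] with T[j]
                best = min(best, d[T[i]][T[j]] + solve(rest & ~(1 << j)))
        return best
    return solve((1 << len(T)) - 1)

def max_min_t_join(n, edges):
    """Exhaustively maximize the min-weight T-join cost over even-sized T."""
    d = shortest_paths(n, edges)
    best_val, best_T = 0.0, ()
    for size in range(2, n + 1, 2):
        for T in itertools.combinations(range(n), size):
            val = min_perfect_matching(T, d)
            if val > best_val:
                best_val, best_T = val, T
    return best_val, best_T

# Unit-weight path 0-1-2-3: pairing the two endpoints is best.
print(max_min_t_join(4, [(0, 1, 1), (1, 2, 1), (2, 3, 1)]))
```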
2. Computational Complexity and Hardness
Fair Max-Min Subset Selection is provably intractable in general:
- NP-hardness:
For diversity maximization with fairness constraints, it is NP-complete to decide whether there is a subset of size $k$ (with per-group quotas) attaining a given minimum pairwise distance, in general metrics and even in Euclidean space (Moumoulidou et al., 2020, Kurkure et al., 2024, Matakos et al., 2023). The fair max-min column subset selection problem is NP-hard even for just two groups, via a reduction from exact partition (Matakos et al., 2023).
- Inapproximability:
Unconstrained max-min diversity admits no polynomial-time $(2 - \varepsilon)$-approximation for any $\varepsilon > 0$ unless P = NP. Since the unconstrained problem is the single-group special case ($m = 1$), this hardness carries over to the fair variants (Moumoulidou et al., 2020).
- Exponential dependence on parameters:
Exact algorithms scale exponentially in $k$ or in the number of groups $m$, which makes them impractical for large datasets or for quotas beyond single digits (Addanki et al., 2022, Wang et al., 2023).
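The exponential dependence is easy to see in the naive exact method, which enumerates every subset meeting the quotas; the number of candidates is $\prod_i \binom{|U_i|}{k_i}$. A minimal sketch for Euclidean points, assuming `math.dist` as the metric and at least two selected points in total; the function names are hypothetical.

```python
import itertools
import math

def exact_fair_maxmin(points, groups, quotas):
    """Exhaustive search: try every subset with exactly quotas[g] points from group g."""
    members = [[i for i, g in enumerate(groups) if g == gid] for gid in range(len(quotas))]
    per_group = [itertools.combinations(mem, k) for mem, k in zip(members, quotas)]
    best_div, best_set = -1.0, None
    for choice in itertools.product(*per_group):
        S = [i for part in choice for i in part]
        div = min(math.dist(points[a], points[b]) for a, b in itertools.combinations(S, 2))
        if div > best_div:
            best_div, best_set = div, sorted(S)
    return best_div, best_set

def search_space_size(group_sizes, quotas):
    """Number of fair candidate subsets: a product of binomial coefficients."""
    return math.prod(math.comb(n, k) for n, k in zip(group_sizes, quotas))

points = [(0, 0), (1, 0), (5, 0), (6, 0)]
groups = [0, 0, 1, 1]
print(exact_fair_maxmin(points, groups, [1, 1]))
print(search_space_size([1000, 1000], [5, 5]))  # roughly 6.8e25 candidates
```

Even two groups of 1,000 items with five picks each already exceed $10^{25}$ candidates, which is why the exact methods discussed below prune the search with thresholds, ILP solvers, or coresets.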
This computational barrier motivates the use of approximations, coresets, and randomized rounding, as detailed below.
3. Algorithmic Approaches and Approximation Guarantees
The spectrum of algorithms spans exact, approximation, and streaming paradigms, each tailored to structural and fairness requirements.
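Most of the methods in this section build on GMM (farthest-first traversal). A minimal sketch for Euclidean points, assuming `math.dist` as the metric and an arbitrary first point; the function names are illustrative.

```python
import itertools
import math

def gmm(points, k, start=0):
    """Farthest-first traversal: repeatedly add the point farthest from the current selection."""
    selected = [start]
    # nearest[i] = distance from point i to its closest selected point
    nearest = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        selected.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return selected

def diversity(points, idx):
    """Minimum pairwise distance among the selected indices."""
    return min(math.dist(points[a], points[b]) for a, b in itertools.combinations(idx, 2))

pts = [(0,), (1,), (2,), (10,)]
sel = gmm(pts, 3)
print(sel, diversity(pts, sel))
```

GMM ignores group labels, so its output can violate the quotas; the fair algorithms in this section repair that by swapping points or by flow-based reassignment.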
3.1 Combinatorial and Optimization-Based Methods
- GMM Farthest-First Traversal (Unconstrained/GMM):
A 2-approximation for classic (unconstrained) max-min diversity; its farthest-first ordering forms the backbone of later fair algorithms (Moumoulidou et al., 2020).
- Linear Programming Relaxation and Rounding:
Relax the fairness and packing constraints into an LP, solve for a fractional solution, and apply randomized order rounding to produce an integral subset whose diversity is at least half the optimum, with quotas met in expectation (Addanki et al., 2022, Kurkure et al., 2024).
- Thresholding and Swap/Flow Algorithms:
For $m = 2$, Fair-Swap achieves a 3-approximation in $O(kn)$ time by iteratively swapping points to restore fairness. For $m > 2$, Fair-Flow combines distance thresholding with a max-flow assignment and yields a $(3m - 1)$-approximation. Overlapping groups are handled with a different approximation factor (Moumoulidou et al., 2020).
- Greedy Clustering + Flow ("FairGreedyFlow"):
Runs in linear time and guarantees exact fairness with an $(m + 1)$-approximation, improving on the earlier $(3m - 1)$ bound (Addanki et al., 2022).
- ILP Formulation (Exact FMMD-E):
Solves Fair Max-Min Diversification exactly on datasets of up to a few thousand items. For each candidate threshold $\tau$ (a pairwise distance), it checks whether a fair independent set exists in the conflict graph that joins every pair with $d(u, v) < \tau$; the largest feasible $\tau$ is the optimal diversity (Wang et al., 2023).
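For a fixed threshold $\tau$, the feasibility check described for FMMD-E can be written as the 0-1 program below. This is a sketch of the formulation implied by the text, not necessarily the exact ILP of the cited work.

$$
\begin{aligned}
\text{find}\quad & x \in \{0,1\}^{U} \\
\text{s.t.}\quad & \sum_{v \in U_i} x_v = k_i && \forall i \in [m], \\
& x_u + x_v \le 1 && \forall\, u \neq v \text{ with } d(u, v) < \tau.
\end{aligned}
$$

A feasible $x$ is exactly the indicator of a fair subset with $\operatorname{div}(S) \ge \tau$. Lowering $\tau$ only removes conflict constraints, so feasibility is monotone in $\tau$, and $\mathrm{OPT}$ can be found by binary search over the sorted pairwise distances.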
3.2 Advanced Approximation and Scalability Techniques
- Multiplicative-Weight Update (MFD) with Geometric LP:
Bypassing explicit constraints via MWU and efficient range queries (BBD trees), this approach achieves a constant-factor approximation in near-linear time and space for fixed dimension $d$ (Kurkure et al., 2024).
- Coreset Construction:
In Euclidean space $\mathbb{R}^d$, group-wise farthest-point coresets, whose size is independent of $n$, approximately preserve the optimal diversity, yielding efficient distributed and streaming implementations (Addanki et al., 2022, Kurkure et al., 2024).
- Streaming and Distributed Algorithms:
Streaming threshold-GMM maintains a small coreset per group; composable-coreset protocols build coresets on separate data partitions and aggregate them centrally for the final selection (Addanki et al., 2022, Kurkure et al., 2024).
- Approximation Tradeoffs Table:
| Setting/Algorithm | Approximation Factor | Fairness Guarantee | Time Complexity |
|---|---|---|---|
| Fair-Swap ($m = 2$) | 3 | Exact | $O(kn)$ |
| Fair-Flow ($m > 2$) | $3m - 1$ | Exact | |
| FairGreedyFlow | $m + 1$ | Exact | Linear in $n$ |
| MWU/LP [Euclidean, fixed $d$] | Constant | In expectation | Near-linear in $n$ |
| FMMD-S (ILP + coreset) | Constant | Lower/upper quotas | |
| Column subset selection | Bicriteria (column count) | Relative error bound | |
- Column Subset Selection (fair CSS): A leverage-score-based greedy method gives a bicriteria guarantee, relaxing the column budget by a bounded factor while keeping both groups' normalized errors bounded (Matakos et al., 2023). Practical QR-based heuristics achieve low error for both groups and, empirically, come close to vanilla CSS in final matrix loss.
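A simple group-wise farthest-point coreset can be sketched by running GMM inside each group, keeping at most $k = \sum_i k_i$ points per group, and then solving the small instance exactly. This is a hedged illustration: the per-group budget of $k$ points and the exhaustive final step are assumed choices, not the cited algorithms' parameters, and all names are hypothetical.

```python
import itertools
import math

def gmm(points, idx, k):
    """Farthest-first traversal restricted to the indices in idx; returns up to k indices."""
    if not idx:
        return []
    selected, chosen = [idx[0]], {idx[0]}
    nearest = {i: math.dist(points[i], points[idx[0]]) for i in idx}
    while len(selected) < min(k, len(idx)):
        nxt = max((i for i in idx if i not in chosen), key=lambda i: nearest[i])
        selected.append(nxt)
        chosen.add(nxt)
        for i in idx:
            nearest[i] = min(nearest[i], math.dist(points[i], points[nxt]))
    return selected

def group_coreset(points, groups, quotas):
    """Union of per-group GMM outputs, each of size at most k = sum(quotas)."""
    k = sum(quotas)
    core = []
    for gid in range(len(quotas)):
        idx = [i for i, g in enumerate(groups) if g == gid]
        core.extend(gmm(points, idx, k))
    return core

def solve_on_coreset(points, groups, quotas):
    """Exhaustive fair max-min search over the coreset only."""
    core = group_coreset(points, groups, quotas)
    per_group = [itertools.combinations([i for i in core if groups[i] == gid], q)
                 for gid, q in enumerate(quotas)]
    best_div, best_set = -1.0, None
    for choice in itertools.product(*per_group):
        S = [i for part in choice for i in part]
        div = min(math.dist(points[a], points[b]) for a, b in itertools.combinations(S, 2))
        if div > best_div:
            best_div, best_set = div, sorted(S)
    return best_div, best_set

points = [(float(i),) for i in range(20)]
groups = [i % 2 for i in range(20)]
print(solve_on_coreset(points, groups, [2, 1]))
```

The exhaustive step now ranges over at most $mk$ points instead of $n$, which is what makes coreset-then-exact pipelines practical.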
3.3 Special Cases and Structural Results
- Weighted Max-Min T-Join:
Greedy farthest-point orderings yield upper bounds on the optimal value. A polynomial-time approximation algorithm is also given, and for graphs with restricted edge weights an exact method is available (Alipour, 7 Feb 2026).
- Exact Results in Low Dimensions:
For one-dimensional Euclidean instances ($d = 1$), dynamic programming solves fair max-min subset selection exactly (Addanki et al., 2022).
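In one dimension, the minimum pairwise distance of a set equals its smallest gap between consecutive sorted points, which makes a dynamic program possible. The sketch below is a straightforward DP built on that observation, not necessarily the cited algorithm. For each possible largest selected point and each vector of per-group counts, it records the best achievable smallest gap, so its cost grows like $n^2 \prod_i (k_i + 1)$. The function name is hypothetical.

```python
def fair_maxmin_1d(points, quotas):
    """points: list of (x, group); quotas[g] = required count for group g.
    Returns the optimal min pairwise distance, or None if infeasible."""
    pts = sorted(points)
    m = len(quotas)
    target = tuple(quotas)
    # best[j]: count vector -> largest achievable min gap over sets whose largest point is pts[j]
    best = []
    answer = None
    for j, (x, g) in enumerate(pts):
        table = {}
        if quotas[g] >= 1:
            start = [0] * m
            start[g] = 1
            table[tuple(start)] = float("inf")  # singleton set: no gap yet
            for i in range(j):
                gap = x - pts[i][0]
                for counts, val in best[i].items():
                    if counts[g] == quotas[g]:
                        continue  # group g is already full
                    new = list(counts)
                    new[g] += 1
                    new = tuple(new)
                    score = min(val, gap)  # bottleneck: the smallest gap so far
                    if score > table.get(new, -1.0):
                        table[new] = score
        best.append(table)
        if target in table and (answer is None or table[target] > answer):
            answer = table[target]
    return answer

print(fair_maxmin_1d([(0, 0), (1, 1), (2, 0), (3, 1), (10, 0)], [2, 1]))
```

The DP is exact because the value of a set is the minimum of its prefix value and its last gap, and `min` is monotone, so keeping only the best prefix value per state loses nothing.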
4. Empirical Findings and Practical Applications
Empirical benchmarks establish that:
- High-quality, robust, and fair subsets are achievable in practice: for real-world reviewer assignment and mutual-aid city pairing, the ratio between computed upper and lower bounds is consistently small (1.2–1.5), so the returned solutions are provably close to optimal (Alipour, 7 Feb 2026).
- Algorithms such as MWU-BBD (Kurkure et al., 2024), FairGreedyFlow (Addanki et al., 2022), and FMMD-S (Wang et al., 2023) scale to datasets of millions of points in minutes, outperforming prior methods in both diversity and running time, especially as the group count grows.
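One ingredient behind this scalability is the single-pass threshold filtering used by streaming variants such as threshold-GMM. A minimal sketch for one guessed threshold $\tau$, assuming each group keeps at most $k = \sum_i k_i$ points that are pairwise at least $\tau$ apart. Practical streaming algorithms run many guesses of $\tau$ in parallel and finish with an offline selection step. The function name is hypothetical.

```python
import math

def threshold_stream(stream, quotas, tau):
    """One pass over (point, group) pairs: keep a point if its group buffer holds
    fewer than k = sum(quotas) points and it is at least tau from each of them."""
    k = sum(quotas)
    kept = {g: [] for g in range(len(quotas))}
    for p, g in stream:
        buf = kept[g]
        if len(buf) < k and all(math.dist(p, q) >= tau for q in buf):
            buf.append(p)
    return kept

stream = [((0.0,), 0), ((0.5,), 1), ((1.0,), 0), ((3.0,), 0), ((4.0,), 1), ((5.0,), 0)]
print(threshold_stream(stream, [2, 1], tau=2.0))
```

Memory per guess is $O(mk)$ points regardless of stream length, which is what allows single-pass processing of very large datasets.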
Key application domains include:
- Reviewer assignment for conferences (fairly covering expertise subcommunities).
- Coalition formation and robust resource allocation in multiagent systems.
- Dataset summarization, search result diversity, and fair recommender systems.
- Column subset selection in fair, group-sensitive matrix approximation tasks.
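For the last of these applications, group-aware column selection can be sketched with leverage scores. This is a simplified heuristic assumed for illustration, not the algorithm of Matakos et al. (2023). It computes rank-$k$ column leverage scores separately for each row group and keeps the $k$ columns with the largest per-column maximum, so that columns important to either group are favored. It assumes NumPy and at least $k$ rows per group; the names are hypothetical.

```python
import numpy as np

def leverage_scores(A, k):
    """Rank-k column leverage scores: squared row norms of the top-k right singular vectors."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k].T                   # shape (num_columns, k)
    return np.sum(Vk ** 2, axis=1)  # sums to k because the columns of Vk are orthonormal

def fair_leverage_select(A1, A2, k):
    """Pick k columns ranked by the larger of the two groups' leverage scores."""
    score = np.maximum(leverage_scores(A1, k), leverage_scores(A2, k))
    return sorted(np.argsort(-score)[:k].tolist())

rng = np.random.default_rng(1)
A = rng.standard_normal((12, 8))
print(fair_leverage_select(A[:7], A[7:], k=3))
```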
Group quotas complement the worst-case objective: they prevent any protected or sensitive group from dominating the selection or being under-covered.
5. Theoretical Insights and Open Directions
Theoretical developments include:
- LP relaxation and rounding achieve fairness in expectation with a 2-approximation, matching the inapproximability bound, but require strengthening to obtain high-probability guarantees or exact quotas (Addanki et al., 2022).
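- Greedy and flow-based algorithms provide strong bicriteria guarantees, with approximation ratios improving from $3m - 1$ (earlier work) to $m + 1$ with exact fairness and 6 with fairness up to a $(1 - \varepsilon)$ factor. For two groups, Fair-Swap attains a factor of 3; for small $k$ and $m$, Fair-GMM attains a factor of 5 by exhaustive enumeration over a GMM-derived candidate set (Moumoulidou et al., 2020, Addanki et al., 2022).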
- Scalability advances leverage geometric coreset constructions and data structures for range queries (BBD trees, kd-trees), enabling efficient distributed and streaming variants with guaranteed diversity (Kurkure et al., 2024, Addanki et al., 2022).
- In specialized settings (restricted-weight graphs, one-dimensional Euclidean metrics, or exact quotas), combinatorial and DP approaches prove optimality or tight bounds (Alipour, 7 Feb 2026, Addanki et al., 2022).
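The LP relaxation discussed in Sections 3.1 and 5 can be illustrated, for a guessed threshold $\tau$, by a packing LP of the following shape. This is an assumed sketch rather than the exact formulation of the cited papers; $B(u, \tau/2) = \{v \in U : d(u, v) < \tau/2\}$ denotes an open ball.

$$
\begin{aligned}
\text{find}\quad & x \in [0,1]^{U} \\
\text{s.t.}\quad & \sum_{v \in U_i} x_v \ge k_i && \forall i \in [m], \\
& \sum_{v \in B(u, \tau/2)} x_v \le 1 && \forall u \in U.
\end{aligned}
$$

By the triangle inequality, any two points of $B(u, \tau/2)$ are less than $\tau$ apart, so a fair subset with diversity at least $\tau$ places at most one point in each ball. Its indicator vector is therefore feasible whenever $\tau \le \mathrm{OPT}$, and a rounding step then converts a fractional solution into a subset.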
Open questions include:
- Whether polynomial-time algorithms under exact (perfect) fairness can match the factor-2 hardness bound.
- Improving approximation factors for overlapping-groups and high group-count regimes (Wang et al., 2023, Moumoulidou et al., 2020).
- Designing provably robust streaming/distributed methods in more general fairness models or with broader diversity objectives.
6. Connections to Broader Fairness and Robust Optimization
The fair max-min subset selection paradigm acts as a robustification layer over classical subset selection and diversity maximization:
- Robustness:
The min-pairwise objective guards against the worst case within the selected subset: a single near-duplicate pair lowers the score, so optimal solutions avoid redundancy (Alipour, 7 Feb 2026, Kurkure et al., 2024).
- Fairness:
Exact or approximate quota enforcement extends classical diversification to group-sensitive settings, crucial for ethical and practical equity in recommender, search, and algorithmic decision-making systems (Moumoulidou et al., 2020).
- Algorithmic Unification:
The frameworks described subsume classical problems in matching, clustering, independent sets, and matrix approximation, and draw algorithmic ingredients from combinatorial optimization, polyhedral relaxation, flow algorithms, geometric data structures, and randomized rounding.
These links highlight the fundamental role of Fair Max-Min Subset Selection as both a theoretical primitive and a practical methodology for fair, robust, and representative selection under quotas.