Proportional Fairness in Clustering
- Proportional fairness criteria are a set of conditions in clustering that ensure each sufficiently large group receives outcomes proportional to its size.
- They integrate centroid, non-centroid, and semi-centroid models through combined loss functions to balance representation and clustering efficiency.
- Approximation algorithms like Dual-Metric, GC, and SemiBall achieve constant-factor fairness while managing efficiency-fairness tradeoffs in practical settings.
Proportional fairness criteria are a set of rigorous conditions applied in clustering analysis to ensure equitable outcomes for agents or data points, inspired by principles of proportional representation in democratic systems. These criteria formalize the requirement that any sufficiently large group of agents, one whose size meets its "entitlement" of roughly $n/k$ (the number of agents divided by the number of clusters), should not be able to substantially improve its members' clustering outcomes by deviating. Their study spans centroid, non-centroid, and newly unified semi-centroid clustering settings, with algorithms for exact and approximate satisfaction, matching lower bounds, and efficiency tradeoffs (Cookson et al., 1 Jan 2026).
1. Clustering Paradigms and Loss Functions
Proportional fairness criteria can be instantiated in several clustering frameworks, each associated with distinct notions of agent loss:
- Centroid Clustering: Each agent's loss is its distance to a representative centroid chosen from its cluster.
- Non-Centroid (Diameter) Clustering: Each agent's loss is the maximal distance to any other point in its cluster.
- Semi-Centroid Clustering (Editor's term): The loss for an agent $i$ assigned to cluster $C$ with center $c$ combines centroid and diameter contributions, e.g., $\ell_i(C, c) = d_{\mathrm{cen}}(i, c) + \max_{j \in C} d_{\mathrm{clu}}(i, j)$ in the dual-metric model, possibly parameterized by a weight $\lambda \in [0, 1]$, as in $\ell_i^{\lambda}(C, c) = \lambda\, d(i, c) + (1 - \lambda) \max_{j \in C} d(i, j)$, in weighted single-metric scenarios.
Let $N$ denote the set of $n$ agents, $M$ the set of allowable centers, $\ell_i$ the loss function for agent $i$, and $k$ the number of clusters; a clustering consists of a partition $(C_1, \ldots, C_k)$ of $N$ together with chosen centers $c_1, \ldots, c_k \in M$ (Cookson et al., 1 Jan 2026).
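To make the loss models concrete, here is a minimal Python sketch of the weighted single-metric (semi-centroid) loss in the convex-combination form given above; the matrix names and the $\lambda$-parameterization are illustrative assumptions rather than the paper's own notation.

```python
import numpy as np

def semi_centroid_loss(i, members, center, d_ac, d_aa, lam):
    """Weighted single-metric (semi-centroid) loss sketch for agent i.

    d_ac : (n, m) agent-to-candidate-center distances.
    d_aa : (n, n) agent-to-agent distances (same underlying metric).
    lam  : weight in [0, 1]; lam = 1 recovers the centroid loss,
           lam = 0 the non-centroid (max-diameter) loss.
    """
    centroid_part = d_ac[i, center]                   # distance to the cluster's center
    diameter_part = max(d_aa[i, j] for j in members)  # farthest co-clustered agent
    return lam * centroid_part + (1 - lam) * diameter_part

# Toy usage: 3 agents, 2 candidate centers.
d_ac = np.array([[1.0, 4.0], [2.0, 3.0], [5.0, 1.0]])
d_aa = np.array([[0.0, 2.0, 6.0], [2.0, 0.0, 4.0], [6.0, 4.0, 0.0]])
print(semi_centroid_loss(i=0, members=[0, 1], center=0, d_ac=d_ac, d_aa=d_aa, lam=0.5))
```

The dual-metric model is recovered by replacing $\lambda d$ and $(1-\lambda) d$ with two unrelated metrics.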
2. Formal Proportional Fairness Criteria
Two central proportional fairness criteria generalize previous representations of fairness in clustering:
- $\alpha$-Core: For $\alpha \ge 1$, a clustering is in the $\alpha$-core if no group $S \subseteq N$ of size $|S| \ge \lceil n/k \rceil$ and center $c' \in M$ satisfies
$$\alpha \cdot \ell_i(S, c') < \ell_i(C(i), c(i)) \quad \text{for all } i \in S,$$
where $C(i)$ and $c(i)$ denote agent $i$'s cluster and center in the given clustering. That is, no coalition of entitled size can simultaneously and strictly improve all of its members' losses by a factor of $\alpha$ or more by seceding and selecting a joint center. Centroid and non-centroid clustering correspond to the extreme weights $\lambda = 1$ and $\lambda = 0$, respectively, of the semi-centroid loss.
- $\alpha$-Fully Justified Representation (FJR): For $\alpha \ge 1$, a clustering satisfies $\alpha$-FJR if there is no group $S \subseteq N$ with $|S| \ge \lceil n/k \rceil$ and no center $c' \in M$ such that
$$\alpha \cdot \max_{i \in S} \ell_i(S, c') < \min_{i \in S} \ell_i(C(i), c(i)).$$
In this relaxed definition, a deviation counts as a violation only if even the worst-off agent after deviating improves on the best-off agent's current loss by a factor of $\alpha$. It holds that the $\alpha$-core implies $\alpha$-FJR (Cookson et al., 1 Jan 2026).
These criteria interpolate between earlier studies of centroid clustering proportional fairness [CFLM19] and non-centroid (democratic) clustering [CMS24].
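The two criteria translate directly into checks for whether a proposed deviation witnesses a violation. The sketch below is loss-agnostic and assumes the deviating and current losses have already been evaluated (e.g., with a loss function like the one above); the function and argument names are illustrative.

```python
import math

def witnesses_core_violation(S, loss_dev, loss_now, alpha, n, k):
    """True iff coalition S (with losses loss_dev[i] under its proposed joint center)
    violates the alpha-core: S is entitled (|S| >= ceil(n/k)) and every member
    strictly improves by a factor of at least alpha over its current loss."""
    if len(S) < math.ceil(n / k):
        return False
    return all(alpha * loss_dev[i] < loss_now[i] for i in S)

def witnesses_fjr_violation(S, loss_dev, loss_now, alpha, n, k):
    """True iff S violates alpha-FJR: even the worst-off deviating member beats
    the best-off member's current loss by a factor of alpha."""
    if len(S) < math.ceil(n / k):
        return False
    return alpha * max(loss_dev[i] for i in S) < min(loss_now[i] for i in S)
```

Any FJR violation is also a core violation for the same coalition, which is exactly why the $\alpha$-core implies $\alpha$-FJR.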
3. Algorithms for Approximating Proportional Fairness
Achieving proportional fairness criteria exactly is computationally intractable in general, so research has focused on constant-factor approximation algorithms.
- Dual-Metric-Core-Approx Algorithm:
- Phase 1 (MCC Covering): Iteratively locate the most cohesive clusters (size $\lceil n/k \rceil$) minimizing maximum agent loss, yielding clusters $B_1, \ldots, B_k$.
- Phase 2 (Safe Switching): Permit points to switch clusters only when their “upper-bound” loss improves and other cluster members are not excessively worsened.
- Theorem: Using an exact MCC subroutine in Phase 1, the output is in the $3$-core for dual-metric loss; using a polynomial-time $4$-approximate MCC subroutine instead yields a constant-factor core approximation in polynomial time (Cookson et al., 1 Jan 2026).
- Specialized Algorithms:
- Greedy Capture + Greedy Centroid (GC): For the weighted single-metric loss, GC achieves a constant-factor core approximation in polynomial time (see the ball-growing sketch after this list).
- Semi-Ball-Growing (SemiBall): Attains a constant-factor core approximation under the weighted single-metric loss, generally outperforming GC in the achieved fairness factor over a range of weights $\lambda$ (Cookson et al., 1 Jan 2026).
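Both GC and SemiBall build on ball-growing dynamics. The following is a minimal sketch of the classic greedy-capture rule in the spirit of [CFLM19], not the paper's exact GC or SemiBall procedures; it assumes distances are supplied as an agent-by-center matrix.

```python
import numpy as np

def greedy_capture(dist, k):
    """Ball-growing (greedy capture) sketch.

    dist : (n, m) distances from n agents to m candidate centers.
    All balls grow at the same rate; a center opens once its ball covers
    ceil(n/k) still-unassigned agents, and an opened center absorbs any
    unassigned agent its ball reaches. Returns agent -> center indices.
    """
    n, m = dist.shape
    quota = int(np.ceil(n / k))
    assigned = np.full(n, -1)
    opened = np.zeros(m, dtype=bool)
    # Sweep (distance, agent, center) events in increasing order of distance.
    for r, i, c in sorted((dist[i, c], i, c) for i in range(n) for c in range(m)):
        if assigned[i] != -1:
            continue
        if opened[c]:
            assigned[i] = c          # an open ball absorbs the agent it reaches
            continue
        # Would opening c at radius r capture a full quota of unassigned agents?
        in_ball = [j for j in range(n) if assigned[j] == -1 and dist[j, c] <= r]
        if len(in_ball) >= quota:
            opened[c] = True
            assigned[in_ball] = c
    return assigned
```

SemiBall presumably adapts this growth rule to the semi-centroid loss; consult the paper for its exact procedure.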
The paper summarizes, for the weighted single-metric loss, both the best existential core factor and the best polynomial-time computable core approximation as functions of the loss parameter $\lambda$; core fairness can be guaranteed up to specific constant factors for any $\lambda \in [0, 1]$.
4. Lower Bounds and Hardness Results
Worst-case analysis demonstrates that universally better core guarantees are impossible:
- Dual-metric loss: Setting the non-centroid (cluster-distance) component to zero retrieves centroid-only models, for which a $2$-core lower bound is proven [CFLM19]; thus, factors below $2$ are impossible.
- Weighted single-metric loss: Lower bounds derived from explicit example families rule out core approximation factors below certain $\lambda$-dependent constants.
- FJR relaxation: The trivial lower bound is $1$, which cannot be achieved unless exact FJR solutions can be computed (Cookson et al., 1 Jan 2026).
This establishes fundamental limits to achievable proportional fairness in clustering.
5. Algorithms for Fully Justified Representation (FJR)
A simple and always-correct algorithm exists for (approximate) FJR:
- Iterative-MCC-for-FJR Algorithm:
- Initialize the agent set $R \leftarrow N$.
- While $R$ is nonempty: find any $\beta$-approximate most cohesive cluster $S$ of size $\lceil n/k \rceil$ (or all remaining agents, if fewer are left) within $R$, include $S$ in the partition, and remove $S$ from $R$.
- Output the clustering.
Theorem: With a $\beta$-approximate MCC subroutine, this procedure guarantees $\beta$-FJR for arbitrary losses. Using a polytime $4$-MCC subroutine results in a polynomial-time $4$-FJR algorithm for dual-metric loss. GC yields $2$-FJR for max-diameter loss and $5$-FJR for centroid loss (Cookson et al., 1 Jan 2026).
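In the centroid special case, the most cohesive cluster can be computed exactly by scanning candidate centers, which makes the iterative loop easy to sketch. The interface below is an illustrative assumption; for dual-metric or diameter losses, the MCC step would be replaced by the (approximate) subroutines discussed above.

```python
import numpy as np

def iterative_mcc_fjr(dist, k):
    """Iterative most-cohesive-cluster (MCC) loop, specialized to centroid loss.

    dist : (n, m) distances from agents to candidate centers.
    Repeatedly finds the center whose quota nearest remaining agents have the
    smallest maximum distance, adds that cluster, and removes its agents.
    Returns a list of (center_index, member_list) pairs.
    """
    n, m = dist.shape
    quota = int(np.ceil(n / k))
    remaining = np.arange(n)
    clusters = []
    while remaining.size > 0:
        size = min(quota, remaining.size)
        best = None  # (max member loss, center, members)
        for c in range(m):
            nearest = remaining[np.argsort(dist[remaining, c])[:size]]
            radius = dist[nearest, c].max()
            if best is None or radius < best[0]:
                best = (radius, c, nearest)
        _, c, members = best
        clusters.append((c, members.tolist()))
        remaining = np.setdiff1d(remaining, members)
    return clusters
```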
6. Experimental Assessment and Efficiency-Fairness Tradeoffs
Empirical evaluation on UCI datasets (Iris, Pima-Diabetes, Adult) with algorithms including GC, SemiBall, $k$-means++, and $k$-medoids revealed:
- GC and SemiBall consistently attain near-unit violations (approximation factors $\approx 1$) of both core and FJR, significantly outperforming $k$-means++ and $k$-medoids in fairness, especially for small $k$.
- The efficiency cost, measured by the clustering objective (sum of distances), is typically mild, often only a few percent worse than the objective values of standard clustering algorithms.
- SemiBall usually provides marginally superior clustering value compared to GC, while maintaining ideal proportional fairness (Cookson et al., 1 Jan 2026).
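Such violation measurements can be reproduced in the centroid special case, where a deviating coalition's losses depend only on the chosen center, so the search over coalitions reduces to a scan over candidate centers. The audit below is a minimal sketch under that assumption; the interface is illustrative.

```python
import numpy as np

def core_violation_factor(dist, labels, cluster_centers, k):
    """Empirical core-violation factor under the centroid loss.

    dist            : (n, m) agent-to-candidate-center distances.
    labels          : (n,) cluster index of each agent in the audited clustering.
    cluster_centers : candidate-center index used by each cluster.
    Returns the largest factor by which some coalition of size ceil(n/k) could
    uniformly improve by deviating to a single center; a value <= 1 certifies
    membership in the (exact) core.
    """
    n, m = dist.shape
    quota = int(np.ceil(n / k))
    cluster_centers = np.asarray(cluster_centers)
    current = dist[np.arange(n), cluster_centers[labels]]  # each agent's present loss
    worst = 0.0
    for c in range(m):
        ratios = current / np.maximum(dist[:, c], 1e-12)   # per-agent gain from joining c
        # The quota-th largest ratio is the improvement a full coalition can share.
        worst = max(worst, float(np.sort(ratios)[-quota]))
    return worst
```

A similar, slightly more involved audit applies to FJR.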
A plausible implication is that proportional fairness criteria can be practically realized with very limited compromise to clustering efficiency.
7. Connections to Semi-Centroid Bridged Clustering
Semi-centroid clustering, as generalized in the proportional fairness context, encompasses methods such as Bridged Clustering (Ye et al., 8 Oct 2025). In Bridged Clustering, unpaired input and output datasets are clustered independently and then sparsely bridged via a small set of matched pairs; prediction is centroid-based, and the approach is model-agnostic and label-efficient. The underlying mathematical structure aligns with the dual-metric and weighted single-metric loss constructions central to proportional fairness analysis. This suggests that proportional fairness algorithms and guarantees may serve as a theoretical foundation for fairness in semi-supervised and representation-learning frameworks such as Bridged Clustering.
References
- "Unifying Proportional Fairness in Centroid and Non-Centroid Clustering" (Cookson et al., 1 Jan 2026)
- "Bridged Clustering for Representation Learning: Semi-Supervised Sparse Bridging" (Ye et al., 8 Oct 2025)
- Chen et al., "Proportionally Fair Clustering" [CFLM19]
- Caragiannis et al., "Democratic clustering" [CMS24]
Proportional fairness criteria thus unify fairness guarantees in clustering, admit robust constant-factor approximations, and extend to new mixed-loss and semi-supervised paradigms.