
Proportional Fairness in Clustering

Updated 8 January 2026
  • Proportional fairness criteria are a set of conditions in clustering that ensure each sufficiently large group receives outcomes proportional to its size.
  • They integrate centroid, non-centroid, and semi-centroid models through combined loss functions to balance representation and clustering efficiency.
  • Approximation algorithms like Dual-Metric, GC, and SemiBall achieve constant-factor fairness while managing efficiency-fairness tradeoffs in practical settings.

Proportional fairness criteria are a set of rigorous conditions applied in clustering analysis to ensure equitable outcomes for agents or data points, inspired by principles of proportional representation in democratic systems. These criteria formalize the requirement that sufficiently large groups of agents should not be able to substantially improve their own clustering outcomes via deviation, relative to their "entitlement," which is proportional to the overall size of the data and the number of clusters. Their study spans centroid, non-centroid, and newly unified semi-centroid clustering settings, with exact and approximate satisfaction algorithms accompanied by lower bounds and efficiency tradeoffs (Cookson et al., 1 Jan 2026).

1. Clustering Paradigms and Loss Functions

Proportional fairness criteria can be instantiated in several clustering frameworks, each associated with distinct notions of agent loss:

  • Centroid Clustering: Each agent's loss is its distance to a representative centroid chosen from its cluster.
  • Non-Centroid (Diameter) Clustering: Each agent's loss is the maximal distance to any other point in its cluster.
  • Semi-Centroid Clustering (Editor's term): The loss for agent $i$ assigned to cluster $C$ with center $x$ combines centroid and diameter contributions: $\ell_i(C, x) = d^c(i, x) + \max_{j \in C} d^m(i, j)$, possibly parameterized by a weight $\lambda$ in weighted single-metric scenarios.

Let $N$ denote the set of $n$ agents, $M$ the set of allowable centers, $\ell_i(C, x)$ the loss of agent $i$ in cluster $C$ with center $x$, and $k$ the number of clusters; a clustering consists of a partition $\{(C_1, x_1), \ldots, (C_k, x_k)\}$ (Cookson et al., 1 Jan 2026).
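These loss notions are easy to state in code. A minimal sketch, assuming Euclidean points, a single shared metric (so $d^c = d^m$), and reading the weighted single-metric loss as $(1-\lambda)\,d(i,x) + \lambda \max_{j \in C} d(i,j)$ — that reading of the $\lambda$-parameterization is an assumption, not stated explicitly in the text:

```python
import math

def centroid_loss(i, center, pts):
    # Centroid loss: distance from agent i to its cluster's chosen center.
    return math.dist(pts[i], center)

def diameter_loss(i, cluster, pts):
    # Non-centroid loss: maximal distance from i to any point in its cluster.
    return max(math.dist(pts[i], pts[j]) for j in cluster)

def dual_metric_loss(i, cluster, center, pts):
    # Semi-centroid (dual-metric) loss with d^c = d^m = Euclidean distance:
    # distance to the center plus the agent's diameter term.
    return centroid_loss(i, center, pts) + diameter_loss(i, cluster, pts)

def weighted_loss(i, cluster, center, pts, lam):
    # Assumed weighted single-metric form: lam = 0 recovers centroid
    # clustering, lam = 1 recovers non-centroid (diameter) clustering.
    return (1 - lam) * centroid_loss(i, center, pts) + \
           lam * diameter_loss(i, cluster, pts)
```

With this convention the $\lambda = 0$ and $\lambda = 1$ endpoints match the centroid and non-centroid cases referenced in Section 2.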

2. Formal Proportional Fairness Criteria

Two central proportional fairness criteria generalize previous representations of fairness in clustering:

  • $\alpha$-Core: For $\alpha \ge 1$, a clustering $X$ is in the $\alpha$-core if no group $S$ of size $|S| \ge n/k$ and center $y$ satisfy

$$\forall i \in S: \quad \alpha\,\ell_i(S, y) < \ell_i(X)$$

That is, no coalition of entitled size can secede, select a joint center, and thereby improve every member's loss by more than a factor of $\alpha$. In the weighted single-metric loss, $\lambda = 0$ and $\lambda = 1$ recover centroid and non-centroid clustering, respectively.

  • $\alpha$-Fully Justified Representation (FJR): For $\alpha \ge 1$, $X$ satisfies $\alpha$-FJR if there is no $S$ with $|S| \ge n/k$ and center $y$ such that

$$\forall i \in S: \quad \alpha\,\ell_i(S, y) < \min_{j \in S} \ell_j(X)$$

In this relaxed definition, a deviation blocks only if even the worst-off agent post-deviation improves by more than a factor of $\alpha$ on the best-off member of $S$ pre-deviation. The $\alpha$-core implies $\alpha$-FJR (Cookson et al., 1 Jan 2026).

These criteria interpolate between earlier studies of centroid clustering proportional fairness [CFLM19] and non-centroid (democratic) clustering [CMS24].
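On small instances both criteria can be verified by brute force. A minimal sketch for centroid loss, where it suffices to check coalitions of size exactly $\lceil n/k \rceil$ (a deviator's loss depends only on the chosen center, so any blocking coalition contains a minimum-size one); representing candidate centers as point indices is an illustrative assumption:

```python
from itertools import combinations
import math

def _losses(partition, pts):
    """Centroid loss of each agent: distance to its cluster's center."""
    return {i: math.dist(pts[i], pts[x]) for C, x in partition for i in C}

def in_alpha_core(partition, pts, centers, alpha, k):
    """True iff no coalition S of size ceil(n/k) and center y improve every
    member's loss by more than factor alpha."""
    n, loss = len(pts), _losses(partition, pts)
    for S in combinations(range(n), math.ceil(n / k)):
        for y in centers:
            if all(alpha * math.dist(pts[i], pts[y]) < loss[i] for i in S):
                return False  # blocking coalition found
    return True

def in_alpha_fjr(partition, pts, centers, alpha, k):
    """Weaker criterion: a deviation blocks only if even its worst-off member
    beats the best-off member's original loss by more than factor alpha."""
    n, loss = len(pts), _losses(partition, pts)
    for S in combinations(range(n), math.ceil(n / k)):
        best = min(loss[i] for i in S)
        for y in centers:
            if all(alpha * math.dist(pts[i], pts[y]) < best for i in S):
                return False
    return True
```

Any clustering passing `in_alpha_core` also passes `in_alpha_fjr`, matching the implication above.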

3. Algorithms for Approximating Proportional Fairness

Achieving proportional fairness criteria exactly is computationally intractable in general, so research has focused on constant-factor approximation algorithms.

  • Dual-Metric-Core-Approx Algorithm:
    • Phase 1 (MCC Covering): Iteratively locate the most cohesive clusters (of size $\ge n/k$) minimizing the maximum agent loss, yielding clusters $(C_t, x_t)$.
    • Phase 2 (Safe Switching): Permit points to switch clusters only when their "upper-bound" loss improves and other cluster members are not excessively worsened.
    • Theorem: Using exact MCC in Phase 1 with $c = 1.5$, the output is in the $3$-core for dual-metric loss. Using $4$-approximate MCC with $c = (3\alpha + \sqrt{\alpha(\alpha+8)})/(4\alpha)$ yields a $(3 + 2\sqrt{3})$-core clustering in polynomial time (Cookson et al., 1 Jan 2026).
  • Specialized Algorithms:
    • Greedy Capture + Greedy Centroid (GC): For weighted single-metric loss, GC achieves a $2/\lambda$-core in polynomial time.
    • Semi-Ball-Growing (SemiBall): Attains a core approximation of $\frac{\sqrt{2\lambda - 11\lambda^2 + 13} + 3 - \lambda}{2 - 2\lambda}$ in weighted loss models, generally outperforming GC in resultant fairness for a range of $\lambda$ values (Cookson et al., 1 Jan 2026).
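The ball-growing idea behind Greedy Capture can be sketched as follows. This is an illustrative simplification, not the paper's exact procedure: uniform ball growth is simulated by processing center-point distance events in increasing order, open centers absorb later points their balls reach, and leftover points go to the nearest open center.

```python
import math

def greedy_capture(pts, centers, k):
    """Sketch of Greedy Capture: grow balls around all candidate centers at a
    uniform rate; when a ball holds >= ceil(n/k) unassigned points, open that
    center and assign them. centers are indices into pts."""
    n = len(pts)
    threshold = math.ceil(n / k)
    unassigned = set(range(n))
    captured = {c: set() for c in centers}
    opened, assignment = set(), {}
    events = sorted((math.dist(pts[c], pts[p]), c, p)
                    for c in centers for p in range(n))
    for _, c, p in events:
        if p not in unassigned:
            continue
        if c in opened:
            # An already-open center absorbs unassigned points its ball reaches.
            assignment[p] = c
            unassigned.discard(p)
            continue
        captured[c].add(p)
        captured[c] &= unassigned   # drop points meanwhile taken elsewhere
        if len(captured[c]) >= threshold:
            opened.add(c)
            for q in captured[c]:
                assignment[q] = c
            unassigned -= captured[c]
    # Fewer than threshold stragglers may remain; give them the nearest center.
    for p in list(unassigned):
        assignment[p] = min(opened, key=lambda c: math.dist(pts[c], pts[p]))
    return assignment
```

The key fairness intuition is visible in the stopping rule: a center opens only once its ball has captured an entitlement-sized group, so every sufficiently large, cohesive group gets a nearby center.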

A summary table of existential and computable core factors for weighted single-metric loss:

| Loss Parameter $\lambda$ | Existential Core | Polytime Core Approximation |
|---|---|---|
| $[0, 1]$ | $\min\{3,\ 2/\lambda\}$ | $\min\left\{2/\lambda,\ \frac{\sqrt{2\lambda - 11\lambda^2 + 13} + 3 - \lambda}{2 - 2\lambda}\right\}$ |

This shows that core fairness can be guaranteed up to specific constant factors for any λ\lambda.
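For a quick numerical look, the two columns can be transcribed directly, treating the divisions by zero at the endpoints $\lambda = 0$ and $\lambda = 1$ as limits to infinity:

```python
import math

def existential_core(lam):
    """Existential core factor min{3, 2/lambda}, reading 2/0 as infinity."""
    return min(3.0, 2.0 / lam) if lam > 0 else 3.0

def polytime_core(lam):
    """Polytime core approximation: the better of GC (2/lambda) and
    SemiBall ((sqrt(2 lam - 11 lam^2 + 13) + 3 - lam) / (2 - 2 lam))."""
    gc = 2.0 / lam if lam > 0 else math.inf
    sb = ((math.sqrt(2 * lam - 11 * lam**2 + 13) + 3 - lam) / (2 - 2 * lam)
          if lam < 1 else math.inf)
    return min(gc, sb)
```

At $\lambda = 1$ the GC bound gives a $2$-core, while at $\lambda = 0$ the SemiBall expression evaluates to $(\sqrt{13} + 3)/2 \approx 3.30$.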

4. Lower Bounds and Hardness Results

Worst-case analysis demonstrates that universally better core guarantees are impossible:

  • Dual-metric loss: Setting $d^m \equiv 0$ recovers centroid-only models, for which a $2$-core lower bound is proven [CFLM19]; thus, factors below $2$ are impossible.
  • Weighted single-metric loss: Lower bounds derived from example families yield

$$\max\left\{ \frac{\sqrt{\lambda^2 - 2\lambda + 5} - \lambda + 1}{2},\ \frac{2(1-\lambda)}{2\lambda + 1} \right\}$$

  • FJR relaxation: The trivial lower bound is $1$, as $\alpha < 1$ cannot be achieved unless exact FJR solutions are permitted (Cookson et al., 1 Jan 2026).

This establishes fundamental limits to achievable proportional fairness in clustering.
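As a sanity check, the weighted-loss lower bound can be evaluated directly; at $\lambda = 0$ it reproduces the $2$-core centroid bound above, and at $\lambda = 1$ it degenerates to the trivial bound of $1$:

```python
import math

def core_lower_bound(lam):
    """Transcription of the weighted single-metric core lower bound:
    max{(sqrt(lam^2 - 2 lam + 5) - lam + 1) / 2, 2 (1 - lam) / (2 lam + 1)}."""
    a = (math.sqrt(lam**2 - 2 * lam + 5) - lam + 1) / 2
    b = 2 * (1 - lam) / (2 * lam + 1)
    return max(a, b)
```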

5. Algorithms for Fully Justified Representation (FJR)

A simple and always-correct algorithm exists for $\alpha$-FJR:

  • Iterative-MCC-for-FJR Algorithm:
  1. Initialize the agent set $N' \leftarrow N$.
  2. While $N'$ is nonempty: find any $\alpha$-approximate cohesive cluster $(C, x)$ of size $\ge n/k$, add $(C, x)$ to the partition, and remove $C$ from $N'$.
  3. Output the clustering.

Theorem: This procedure guarantees $\alpha$-FJR for arbitrary losses. A polynomial-time $4$-MCC subroutine yields a polynomial-time $4$-FJR algorithm for dual-metric loss; GC yields $2$-FJR for max-diameter loss and $5$-FJR for centroid loss (Cookson et al., 1 Jan 2026).
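The iteration can be sketched for centroid loss, with the most-cohesive-cluster subroutine implemented as exact enumeration over candidate centers (an illustrative choice corresponding to $\alpha = 1$; the polynomial-time version uses an approximate MCC subroutine instead):

```python
import math

def iterative_mcc_fjr(pts, centers, k):
    """Sketch of Iterative-MCC-for-FJR under centroid loss: repeatedly pick
    the most cohesive cluster among the remaining agents -- the center y and
    set C of ceil(n/k) agents minimizing the max distance to y -- and remove
    it. centers are indices into pts; MCC is solved exactly by enumeration."""
    n = len(pts)
    size = math.ceil(n / k)
    remaining = set(range(n))
    partition = []
    while remaining:
        t = min(size, len(remaining))
        best = None
        for y in centers:
            # The t remaining agents closest to y, and the radius covering them.
            closest = sorted(remaining,
                             key=lambda p: math.dist(pts[p], pts[y]))[:t]
            radius = math.dist(pts[closest[-1]], pts[y])
            if best is None or radius < best[0]:
                best = (radius, closest, y)
        _, C, y = best
        partition.append((C, y))
        remaining -= set(C)
    return partition
```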

6. Experimental Assessment and Efficiency-Fairness Tradeoffs

Empirical evaluation on UCI datasets (Iris, Pima-Diabetes, Adult) with algorithms including GC, SemiBall, $k$-means++, and $k$-medoids revealed:

  • GC and SemiBall consistently attain near-unit violations ($\approx 1$) of both core and FJR, significantly outperforming $k$-means++ and $k$-medoids in fairness, especially for small $\lambda$.
  • The efficiency cost, measured by clustering objective (sum of distances), is typically mild—often just a few percent worse relative to standard clustering objectives.
  • SemiBall usually provides marginally superior clustering value compared to GC, while maintaining ideal proportional fairness (Cookson et al., 1 Jan 2026).

A plausible implication is that proportional fairness criteria can be practically realized with very limited compromise to clustering efficiency.

7. Connections to Semi-Centroid Bridged Clustering

Semi-centroid clustering, as generalized in the proportional fairness context, encompasses methods such as Bridged Clustering (Ye et al., 8 Oct 2025). In Bridged Clustering, unpaired input and output datasets are clustered independently and then sparsely bridged via matched pairs; prediction is centroid-based, and the method is model-agnostic and label-efficient. The underlying mathematical structure aligns with the dual-metric and weighted single-metric loss constructions central to proportional fairness analysis. This suggests that proportional fairness algorithms and guarantees may serve as a theoretical foundation for fairness in semi-supervised and representation learning frameworks such as Bridged Clustering.

References

  • Cookson et al., "Unifying Proportional Fairness in Centroid and Non-Centroid Clustering" (1 Jan 2026)
  • Ye et al., "Bridged Clustering for Representation Learning: Semi-Supervised Sparse Bridging" (8 Oct 2025)
  • Chen, Fain, Lyu, and Munagala, "Proportionally Fair Clustering" [CFLM19]
  • Caragiannis, Micha, and Shah, "Proportional Fairness in Non-Centroid Clustering" [CMS24]

Proportional fairness criteria thus unify fairness guarantees in clustering, admit robust constant-factor approximations, and extend to new mixed-loss and semi-supervised paradigms.
