k-Kemeny: Rank Aggregation & Diversity
- The k-Kemeny problem is a rank aggregation framework that minimizes the total number of adjacent swaps to cluster votes into at most k distinct rankings, quantifying preference diversity.
- It applies to structured domains like single-peaked, single-crossing, group-separable, and Euclidean models, offering insights into how domain restrictions impact diversity.
- The problem is NP-complete in general, with specific fixed-parameter tractable cases highlighting trade-offs between expressive preference modeling and computational efficiency.
The k-Kemeny problem is a central topic in computational social choice, machine learning, and the analysis of structured preference domains. It generalizes classical Kemeny rank aggregation by asking, given a collection of votes (linear orders) over a set of candidates, what is the minimum total number of adjacent swaps needed so that the profile can be “explained” by at most k different rankings. The problem provides a formal measure of diversity in elections and connects preference aggregation with clustering, approximation, and parameterized complexity. Recent research analyzes its computational complexity in highly structured domains, offers new fixed-parameter algorithms, and uses k-Kemeny scores to rank domains by intrinsic diversity (Faliszewski et al., 19 Sep 2025).
1. Formal Definition and Diversity Interpretation
In the k-Kemeny problem one is given an election $E = (C, V)$, where $C$ is a set of candidates and $V$ is a multiset of votes, each vote being a ranking of $C$. The aim is to find a set $R = \{r_1, \ldots, r_k\}$ of linear orders over $C$ minimizing
$$\sum_{v \in V} \min_{r \in R} d_{\mathrm{swap}}(v, r),$$
where $d_{\mathrm{swap}}(v, r)$ is the Kendall–tau distance (number of adjacent swaps) between rankings $v$ and $r$. Each vote is assigned to its closest center (ranking from $R$), minimizing the aggregated “distance-to-center” cost over all votes.
This framework generalizes the classical Kemeny problem (the case $k = 1$), translating aggregation into a clustering/minsum problem over the permutation space. The normalized vector of k-Kemeny scores, taken over increasing values of $k$, succinctly quantifies the “diversity profile” of an election or domain (Faliszewski et al., 19 Sep 2025).
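As a minimal illustration of the objective above, the following Python sketch computes the swap distance and the k-Kemeny score by exhaustive search. The function names (`kendall_tau`, `k_kemeny_cost`, `k_kemeny_brute_force`) are illustrative and not taken from the cited paper. The search enumerates all rankings of the candidates, so it is only meant for a handful of candidates.

```python
from itertools import combinations, permutations


def kendall_tau(u, v):
    """Number of candidate pairs that rankings u and v order differently."""
    pos = {c: i for i, c in enumerate(v)}
    # combinations(u, 2) yields pairs (a, b) with a ranked before b in u.
    return sum(1 for a, b in combinations(u, 2) if pos[a] > pos[b])


def k_kemeny_cost(votes, centers):
    """Total distance of each vote to its closest center."""
    return sum(min(kendall_tau(v, r) for r in centers) for v in votes)


def k_kemeny_brute_force(votes, k):
    """Exhaustive search over all sets of k rankings (tiny instances only)."""
    all_rankings = list(permutations(sorted(votes[0])))
    # Extra centers never increase the cost, so "at most k" centers
    # can be searched as "exactly k", capped at the number of rankings.
    k = min(k, len(all_rankings))
    best = min(combinations(all_rankings, k),
               key=lambda centers: k_kemeny_cost(votes, centers))
    return k_kemeny_cost(votes, best), list(best)


votes = [("a", "b", "c"), ("a", "b", "c"), ("c", "b", "a"), ("b", "a", "c")]
for k in (1, 2, 3):
    print(k, k_kemeny_brute_force(votes, k)[0])
```

In this toy election the score drops from 4 (one center) to 1 (two centers) to 0 (three centers, one per distinct vote), which is the kind of score sequence the diversity interpretation builds on.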
2. Application to Structured Domains
The problem is studied across various preference domains, highlighting how domain restrictions impact both diversity and complexity:
- Single-Peaked (SP): Voters’ preferences align along a societal axis; for every t, each voter’s top-t candidates form a contiguous interval on this axis.
- Single-Crossing (SC): Voters can be linearly ordered so that, for every pair of candidates, the relative order of the two candidates switches at most once along the voter ordering.
- Group-Separable (GS): Candidates can be split hierarchically (e.g., balanced binary trees or caterpillars), with votes consistent with the tree’s structure.
- Euclidean Domains (d-dimensional): Both voters and candidates are embedded in $\mathbb{R}^d$; each vote ranks candidates by Euclidean distance from the voter’s ideal point.
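Two of the definitions above translate directly into short checks. The sketch below, with hypothetical helper names `euclidean_vote` and `is_single_peaked`, builds a vote from an ideal point and tests the contiguous-interval condition against a given axis. Ties in distance are broken by the insertion order of the candidate dictionary, which is an assumption of this sketch rather than part of the domain definition.

```python
import math


def euclidean_vote(voter, candidates):
    """Rank candidate names by increasing distance from the voter's ideal point.

    `candidates` maps a name to a coordinate tuple of the same dimension
    as `voter`. sorted() is stable, so ties keep dictionary order.
    """
    return tuple(sorted(candidates,
                        key=lambda name: math.dist(voter, candidates[name])))


def is_single_peaked(vote, axis):
    """Check that every top-t prefix of `vote` is a contiguous interval of `axis`."""
    pos = {c: i for i, c in enumerate(axis)}
    lo = hi = pos[vote[0]]          # interval covered by the prefix so far
    for c in vote[1:]:
        if pos[c] == lo - 1:        # extends the interval to the left
            lo -= 1
        elif pos[c] == hi + 1:      # extends the interval to the right
            hi += 1
        else:                       # leaves a gap, so not single-peaked
            return False
    return True


# Four candidates on a line (1-dimensional Euclidean setting).
line = {"a": (0.0,), "b": (1.0,), "c": (2.0,), "d": (3.0,)}
vote = euclidean_vote((1.2,), line)
print(vote)                                          # ('b', 'c', 'a', 'd')
print(is_single_peaked(vote, ("a", "b", "c", "d")))  # True
```

The example also shows a standard fact: votes generated from points on a line are single-peaked with respect to the left-to-right order of the candidates.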
A significant empirical result is that, perhaps counterintuitively, highly structured domains like GS/cat (caterpillar group-separable) can be among the most diverse, as reflected by higher normalized scores compared to classical “random” domains or single-peaked settings (Faliszewski et al., 19 Sep 2025).
3. Computational Complexity and Algorithms
The k-Kemeny problem is NP-complete in the general case and remains intractable under many natural domain restrictions:
- Hardness: For $k = 2$, the problem is NP-complete for elections that are both single-peaked and group-separable (balanced or caterpillar). The same holds for many d-Euclidean domains unless both $k$ and $d$ are fixed (Faliszewski et al., 19 Sep 2025).
- Parameterized/FPT Results: Certain highly restricted settings do admit tractable algorithms. If both the number of distinct rankings and the embedding dimension are fixed (in d-Euclidean), the number of possible rankings is polynomial in the number of candidates, and so brute force or dynamic programming algorithms solve the problem efficiently (Faliszewski et al., 19 Sep 2025).
- Condorcet Domains: For domains guaranteeing the existence of a Condorcet winner/ranking, the problem is fixed-parameter tractable in the number of votes, via dynamic programming.
- Single-Crossing Elections: By reduction to the (polynomial-time) Chamberlin–Courant multiwinner rule for single-peaked profiles, an efficient solution is available in these cases.
This landscape emphasizes the difficulty of rank aggregation by clustering, even when voters' preferences are restricted or highly structured.
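The brute-force idea mentioned for the fixed-parameter case can be sketched as follows: when the set of rankings a domain can produce is small, one can enumerate every choice of k centers from that set. The function `best_centers_from_pool` and its `pool` argument are hypothetical names, and restricting centers to the pool is an assumption of this sketch; it is not claimed to be the algorithm of the cited paper, nor to be optimal when centers outside the pool are allowed.

```python
from collections import Counter
from itertools import combinations


def kendall_tau(u, v):
    """Number of candidate pairs that rankings u and v order differently."""
    pos = {c: i for i, c in enumerate(v)}
    return sum(1 for a, b in combinations(u, 2) if pos[a] > pos[b])


def best_centers_from_pool(votes, pool, k):
    """Try every set of k centers drawn from `pool` (assumed small)."""
    if not pool:
        raise ValueError("pool must contain at least one ranking")
    counts = Counter(votes)                       # collapse repeated votes
    dist = {v: [kendall_tau(v, r) for r in pool] for v in counts}
    k = min(k, len(pool))
    best_cost, best_idx = None, None
    # The loop runs over C(len(pool), k) subsets, which is polynomial
    # in len(pool) for any fixed k.
    for idx in combinations(range(len(pool)), k):
        cost = sum(mult * min(dist[v][i] for i in idx)
                   for v, mult in counts.items())
        if best_cost is None or cost < best_cost:
            best_cost, best_idx = cost, idx
    return best_cost, [pool[i] for i in best_idx]


votes = [("a", "b", "c"), ("a", "b", "c"), ("c", "b", "a"), ("b", "a", "c")]
pool = [("a", "b", "c"), ("b", "a", "c"), ("c", "b", "a")]
print(best_centers_from_pool(votes, pool, 2))
```

Precomputing the vote-to-ranking distances once keeps the inner loop to simple lookups, so the running time is dominated by the number of center subsets.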
4. Diversity Ranking of Domains
The use of k-Kemeny scores as a measure of diversity allows for empirical and theoretical ranking of preference domains:
| Domain | Structural Type | Diversity Rank (Example) |
|---|---|---|
| GS/cat | Group-separable (caterpillar) | Highest |
| 3D-Cube | Euclidean (3D) | High |
| 2D-Square | Euclidean (2D) | Upper-middle |
| SPOC | Structured, other | Upper-middle |
| GS/bal, SP/DF | Group-separable (balanced) / single-peaked | Middle |
| SP | Single-peaked | Lower-middle |
| SC, 1D-Int. | Single-crossing / Euclidean (1D) | Lowest |
This ranking is based on dominance between the normalized k-Kemeny score vectors: if one domain’s vector is greater term-by-term, it is strictly more diverse. Experimental studies confirm that, e.g., GS/cat domains yield vote distributions that are harder to “cluster away,” reflecting greater diversity (Faliszewski et al., 19 Sep 2025).
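The dominance test described above can be sketched in a few lines. The names `dominates` and `compare_domains` are illustrative, the reading of "greater term-by-term" as "at least as large everywhere and strictly larger somewhere" is an assumption of this sketch, and the score vectors in the example are made-up numbers rather than results from the cited paper.

```python
def dominates(x, y):
    """True if x is at least y in every position and larger in at least one."""
    if len(x) != len(y):
        raise ValueError("score vectors must have the same length")
    return (all(a >= b for a, b in zip(x, y))
            and any(a > b for a, b in zip(x, y)))


def compare_domains(x, y):
    """Classify two normalized k-Kemeny score vectors by dominance."""
    if dominates(x, y):
        return "first is more diverse"
    if dominates(y, x):
        return "second is more diverse"
    return "equal" if list(x) == list(y) else "incomparable"


# Hypothetical score vectors for k = 1, 2, 3 (illustrative numbers only).
domain_a = [0.50, 0.35, 0.20]
domain_b = [0.40, 0.30, 0.20]
domain_c = [0.45, 0.40, 0.10]

print(compare_domains(domain_a, domain_b))  # first is more diverse
print(compare_domains(domain_b, domain_c))  # incomparable
```

Because dominance is only a partial order, some pairs of domains come out incomparable, so a full ranking like the one in the table needs either consistent dominance across the compared domains or an additional tie-breaking convention.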
5. Broader Implications and Methodological Insights
The k-Kemeny problem provides a formal approach to assessing and ranking diversity in preference domains for experimental and theoretical social choice research (Faliszewski et al., 19 Sep 2025). Notable implications:
- Preference Synthesis and Data Generation: Researchers can use k-Kemeny-based metrics to select domain models for experiments that yield elections of a desired diversity profile. Structured, yet diverse, domains can be engineered by focusing on GS/cat-like or high-dimensional Euclidean models.
- Algorithmic Caution: The fact that k-Kemeny is hard in most structured domains even for $k = 2$ indicates that clustering-based aggregation and consensus finding remain computationally intensive in practice, even outside of “worst-case” or impartial culture inputs.
- Domain Analysis: The methodology distinguishes between domains that are “reverse-symmetric” and “reverse-free,” with implications for both diversity and the tractability of aggregation/clustering (Faliszewski et al., 19 Sep 2025).
6. Future Directions and Open Questions
The detailed exploration in (Faliszewski et al., 19 Sep 2025) suggests several open research avenues:
- Analyzing larger candidate sets and refining statistical cultures to extend the ranking of domains by diversity.
- Investigating trade-offs between expressive power (diversity) and computational tractability.
- Developing more efficient algorithms or tighter approximation schemes for k-Kemeny in the presence of domain structure.
- Applying these diversity metrics in practical settings such as recommender system clustering, preference elicitation design, or social choice mechanism selection.
A plausible implication is that, because k-Kemeny scores quantify the “clusterability” of an election, future experimental studies in computational social choice should account for the diversity of the underlying domain. Rather than relying only on traditional random or single-peaked datasets, such studies can use k-Kemeny metrics to guide synthetic data generation and domain selection.
References
- “Diversity of Structured Domains via k-Kemeny Scores” (Faliszewski et al., 19 Sep 2025)