Generalized Average Ranking Scores (GARS)
- Generalized Average Ranking Scores (GARS) is a versatile, nonparametric ranking framework that aggregates diverse preference data through flexible mapping functions.
- It extends classical methods like Borda, Bradley–Terry, and PageRank by effectively handling incomplete top-lists and multicategory outcomes.
- GARS employs efficient approximation and debiased machine learning techniques to achieve robust statistical inference and scalable computation.
Generalized Average Ranking Scores (GARS) constitute a unified, nonparametric class of ranking metrics designed to aggregate preference data across a range of evaluation scenarios, including incomplete top-lists and pairwise comparisons with ties or multicategorical outcomes. GARS generalizes classical rank aggregation frameworks such as Borda, Bradley–Terry, and Rank Centrality by treating the item ranking problem as the task of mapping a family of itemwise pair preference probabilities to a low-dimensional score vector via a user-specified function. Efficient inferential and computational methods—such as constant-factor approximation algorithms or semiparametrically efficient estimators via debiased machine learning—are available for GARS in large-scale aggregation and evaluation contexts (Frauen et al., 29 Jan 2026, Mathieu et al., 2018).
1. Formal Definition and Motivation
Let $m$ denote the number of items to be ranked. For each context $X$, all ordered pairs $(i, j)$ with $i < j$ are considered; for each pair, a categorical label $Y_{ij} \in \{1, \dots, K\}$ is obtained, indicating the outcome ('$i$ beats $j$', '$j$ beats $i$', 'tie', etc.), with $K$ denoting the number of response categories. Write $p_{ij,k}(x) = \mathbb{P}(Y_{ij} = k \mid X = x)$ and let $p(x)$ collect all such conditional probabilities.
A Generalized Average Ranking Score (GARS) is specified as the expectation of a user-chosen mapping $g$ from the preference probabilities to a score vector, so that
$$\theta = \mathbb{E}\big[g(p(X))\big] \in \mathbb{R}^d,$$
where $d$ is typically $m$ or larger. The form of $g$ can instantiate classical ranking methods or incorporate more flexible, application-specific metrics (Frauen et al., 29 Jan 2026).
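To make the definition concrete, here is a minimal numerical sketch (the sizes, the probability model, and the choice of $g$ are all invented for illustration): a Borda-type GARS is computed by averaging an average-win-rate mapping over simulated contexts.

```python
import numpy as np

rng = np.random.default_rng(0)

m = 3             # number of items (hypothetical toy size)
n_contexts = 1000 # number of sampled contexts X

# Simulated pairwise win probabilities p_ij(x) per context.
# Shape (n_contexts, m, m); p[x, i, j] = P(i beats j | context x),
# built so that p[x, i, j] + p[x, j, i] = 1.
logits = rng.normal(size=(n_contexts, m, m))
p = 1.0 / (1.0 + np.exp(-(logits - logits.transpose(0, 2, 1)) / 2))

def g_borda(p_x):
    """Average-win-rate mapping: mean win probability of each item."""
    m = p_x.shape[0]
    off_diag = ~np.eye(m, dtype=bool)
    return np.array([p_x[i, off_diag[i]].mean() for i in range(m)])

# GARS: theta = E[g(p(X))], approximated by averaging over contexts.
theta = np.mean([g_borda(p[x]) for x in range(n_contexts)], axis=0)
print(theta)  # one score per item
```

Any other mapping $g$ (log-odds projections, stationary distributions, and so on) slots into the same expectation.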
2. Special Cases: Classical Rank Aggregation and Top-List Algorithms
GARS encompasses and extends several established ranking models:
- Borda (Average-Win-Rate): For binary comparisons ($K = 2$), the Borda score for item $i$ is
$$\theta_i = \frac{1}{m-1} \sum_{j \neq i} \mathbb{E}\big[p_{ij}(X)\big],$$
where $p_{ij}(x)$ denotes the probability that $i$ beats $j$ in context $x$. The overall score vector $\theta$ encodes the average likelihood that each item prevails over the others across prompts.
- Bradley–Terry Projections: For inference from log-odds, the mapping applies the logit transform $\mathrm{logit}(p_{ij}(x)) = \log\big(p_{ij}(x) / (1 - p_{ij}(x))\big)$ to the pairwise win probabilities. With appropriate incidence and projection matrices (an incidence matrix $B$ over the comparison graph and the associated least-squares projection $(B^\top B)^{+} B^\top$), GARS yields projected latent quality scores consistent with the BT model when its assumptions hold (Frauen et al., 29 Jan 2026).
- Rank Centrality/PageRank: GARS can represent stationary distributions over itemwise symmetrized transition matrices, so that the score vector $\pi$ solves $\pi^\top P = \pi^\top$, where the transition matrix $P$ is derived from preference probabilities. This accommodates PageRank-type aggregation (Frauen et al., 29 Jan 2026).
- Incomplete Top-List Aggregation: When only a subset of candidates is ranked in each input list, the pair for item $i$ is defined by $p_i$, the fraction of input lists in which $i$ appears, and $\bar{r}_i$, its average rank among the lists that contain it. The GARS vector for item $i$ is $(p_i, -\bar{r}_i)$, with items sorted lexicographically to produce a generalized Borda total order (Mathieu et al., 2018).
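As a small illustration of the Bradley–Terry projection case, the following sketch maps averaged win probabilities to latent scores via log-odds and the comparison-graph incidence matrix (the matrix `p_bar` and the mean-zero pinning are illustrative assumptions, not values from the papers):

```python
import numpy as np

# Hypothetical averaged pairwise win probabilities for m = 3 items:
# p_bar[i, j] = E[P(i beats j | X)], with p_bar[i, j] + p_bar[j, i] = 1.
m = 3
p_bar = np.array([
    [0.5, 0.7, 0.8],
    [0.3, 0.5, 0.6],
    [0.2, 0.4, 0.5],
])

# Incidence matrix B of the comparison graph: one row per pair (i, j)
# with i < j, mapping latent scores to score differences s_i - s_j.
pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]
B = np.zeros((len(pairs), m))
for r, (i, j) in enumerate(pairs):
    B[r, i], B[r, j] = 1.0, -1.0

# Log-odds of each pair; under Bradley-Terry, logit(p_ij) = s_i - s_j.
log_odds = np.array([np.log(p_bar[i, j] / p_bar[j, i]) for i, j in pairs])

# Least-squares projection onto the BT score space, pinned to mean zero.
scores, *_ = np.linalg.lstsq(B, log_odds, rcond=None)
scores -= scores.mean()
print(np.argsort(-scores))  # ranking: best item first
```

When the BT model holds exactly, the log-odds lie in the column space of $B$ and the projection recovers the latent scores; otherwise it returns their best least-squares approximation.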
3. Algorithmic Construction and Approximation Guarantees
GARS supports algorithmic ranking procedures with rigorous worst-case approximation bounds. In the context of top-list aggregation (Mathieu et al., 2018):
- Two-Phase Generalized Borda Algorithm: Compute $p_i$ and $\bar{r}_i$ for all items; sort by non-increasing $p_i$, breaking ties by non-decreasing $\bar{r}_i$. This produces a ranking that is a constant-factor (6-approximation) solution to the top-list aggregation objective measured by expected Kendall-$\tau$ distance. The algorithm is efficient, running in near-linear time: a single pass over the lists plus an $O(m \log m)$ sort.
- PTAS Enhancement: By bucketing items according to $p_i$ and using the Mathieu–Schudy PTAS for full-ranking aggregation within each bucket, a $(1 + \varepsilon)$-approximation is achieved, for any fixed $\varepsilon > 0$, in time polynomial in the input size. This relies on concentration arguments and bucket-respecting optimality (Mathieu et al., 2018).
| Method | Approximation Ratio | Time Complexity |
|---|---|---|
| Two-phase Generalized Borda | 6 | linear scan plus $O(m \log m)$ sort |
| SCORE-THEN-PTAS | $1 + \varepsilon$ | polynomial for fixed $\varepsilon > 0$ |
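The two-phase rule can be sketched directly (a minimal reconstruction; the input format and the exact tie-breaking conventions are assumptions, since the paper's pseudocode is not reproduced here):

```python
def generalized_borda(top_lists, items):
    """Rank items from partial top-lists.

    Phase 1: compute, per item, the fraction of lists it appears in and
    its average (0-indexed) position among the lists containing it.
    Phase 2: sort by non-increasing appearance fraction, breaking ties
    by non-decreasing average position.
    """
    n = len(top_lists)
    appear = {i: 0 for i in items}
    pos_sum = {i: 0 for i in items}
    for lst in top_lists:
        for rank, item in enumerate(lst):
            appear[item] += 1
            pos_sum[item] += rank

    def key(i):
        frac = appear[i] / n
        avg_pos = pos_sum[i] / appear[i] if appear[i] else float("inf")
        return (-frac, avg_pos)

    return sorted(items, key=key)

# Toy input: four partial top-lists over three items.
lists = [["a", "b"], ["a", "c"], ["b", "a"], ["c"]]
print(generalized_borda(lists, ["a", "b", "c"]))
```

Both phases are a single pass plus a sort, matching the near-linear running time claimed above.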
4. Semiparametric Theory and Efficient Inference
For modern LLM evaluation and similar noisy, high-dimensional contexts, GARS is equipped with semiparametric efficiency theory (Frauen et al., 29 Jan 2026). The efficient influence function (EIF) of a GARS target $\theta = \mathbb{E}[g(p(X))]$ takes the form
$$\varphi(X, A, Y) = g(p(X)) - \theta + J_g(p(X)) \, \frac{A}{\pi(X)} \big(Y - p(X)\big),$$
with $Y$ encoded as a one-hot vector over the $K$ categories. Here $J_g$ is the Jacobian of $g$ w.r.t. $p$, $A$ is the pair labeling indicator, and $\pi(x)$ is the context-dependent labeling probability. Under regularity, the debiased estimator
$$\hat{\theta} = \frac{1}{n} \sum_{\ell=1}^{n} \Big[ g\big(\hat{p}(X_\ell)\big) + J_g\big(\hat{p}(X_\ell)\big) \, \frac{A_\ell}{\hat{\pi}(X_\ell)} \big(Y_\ell - \hat{p}(X_\ell)\big) \Big]$$
is asymptotically normal and achieves the semiparametric variance lower bound. Joint and coordinatewise confidence regions are constructed from the empirical covariance of EIF values.
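The one-step debiasing idea can be sketched in a single-pair toy model (the data-generating process, the known labeling probability, and the deliberately biased nuisance estimate are all illustrative assumptions; with $g$ the identity, its Jacobian is 1):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setting: theta = E[p(X)], the average win probability of one item
# over another, with g the identity mapping.
n = 5000
X = rng.uniform(-1, 1, size=n)
p_true = 1 / (1 + np.exp(-X))   # true P(i beats j | X)
pi = np.full(n, 0.5)            # labeling probability (known here)
A = rng.binomial(1, pi)         # was this pair labeled in context X?
Y = rng.binomial(1, p_true) * A # observed outcome when labeled

# A deliberately miscalibrated nuisance estimate, to show debiasing.
p_hat = np.clip(p_true + 0.05, 0, 1)

# Plug-in vs. one-step debiased estimator (Jacobian of identity g is 1).
theta_plugin = p_hat.mean()
theta_debiased = np.mean(p_hat + (A / pi) * (Y - p_hat))

print(theta_plugin, theta_debiased)  # true value is 0.5 by symmetry
```

The inverse-propensity-weighted residual term cancels the first-order bias of the nuisance estimate, which is the mechanism behind the efficiency claim above.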
5. Estimation Procedures in Practice
Practical estimation of GARS from preference datasets is implemented via:
- Cross-Fitting with Black-Box Learners: Data is divided into folds, with out-of-fold predictions for both $p(\cdot)$ (categorical classifier) and $\pi(\cdot)$ (binary classifier). Judges can be external machine learning models, incorporated as features ("judge-as-feature").
- Debiased One-Step Estimation: After nuisance prediction, scores are corrected via the EIF formula, yielding valid uncertainty quantification and robust estimates under both parametric and nonparametric conditions.
- Handling Ties and Rich Labels: Multicategory classifiers predict the $K$-category responses, and category weights in $g$ allow downstream methods to reweight outcomes (e.g., viewing ties as half-wins).
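The cross-fitting scheme can be sketched as follows; the binned win-rate learner stands in for an arbitrary black-box classifier, and the data-generating process is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: binary label Y for one pair with context X. The nuisance
# p(x) is fit out-of-fold, so no context's prediction sees its own label.
n, n_folds = 1000, 5
X = rng.uniform(0, 1, size=n)
Y = rng.binomial(1, 0.2 + 0.6 * X)

folds = np.array_split(rng.permutation(n), n_folds)
p_hat = np.empty(n)
bins = np.linspace(0, 1, 11)
which = np.digitize(X, bins) - 1  # bin index 0..9 per context
for k in range(n_folds):
    test_idx = folds[k]
    train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
    # Stand-in learner: per-bin win rate fit on the training folds only
    # (any black-box classifier could be swapped in here).
    p_bin = np.array([
        Y[train_idx][which[train_idx] == b].mean()
        if np.any(which[train_idx] == b) else Y[train_idx].mean()
        for b in range(10)
    ])
    p_hat[test_idx] = p_bin[which[test_idx]]

print(p_hat.mean())  # out-of-fold estimate of E[p(X)]
```

The out-of-fold predictions `p_hat` are exactly what the one-step debiased estimator consumes as its nuisance inputs.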
6. Optimal Data Acquisition under Budget Constraints
GARS enables principled policies for preference data acquisition:
- A-Optimality: Minimizes total score variance. The optimal sampling policy for pair $(i, j)$ and context $x$ follows a square-root rule,
$$\eta^*(i, j \mid x) \propto \sqrt{\frac{\sigma^2_{ij}(x)}{c_{ij}(x)}},$$
where $\sigma^2_{ij}(x)$ is the (influence-weighted) variance of the label, $c_{ij}(x)$ is the acquisition cost, and the proportionality constant is selected by the budget constraint.
- D-Optimality: Minimizes the determinant of the covariance matrix. The policy is characterized by a fixed-point equation involving the full covariance structure $\Sigma$.
- Single-Pair-Per-Context Constraints: Admits “capped water-filling” policies with per-context dual variables. This supports allocation planning in LLM evaluation and other data-collection environments.
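A hedged sketch of the A-optimal allocation idea, using the classical Neyman square-root rule as a stand-in (the exact variance weighting in the paper's policy may differ); all variances, costs, and the budget below are invented:

```python
import numpy as np

# Per-pair label variances and acquisition costs (illustrative values).
var = np.array([0.25, 0.16, 0.09, 0.04])
cost = np.array([1.0, 1.0, 2.0, 0.5])
budget = 100.0

# Square-root rule: sample counts proportional to sqrt(variance / cost),
# normalized so that the total spend exhausts the budget.
weights = np.sqrt(var / cost)
scale = budget / np.sum(weights * cost)
n_labels = scale * weights

print(n_labels.round(2))  # labels to collect per pair
```

High-variance, low-cost pairs receive the most labels, which is the qualitative behavior the capped water-filling policies preserve under per-context constraints.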
7. Key Theoretical Results and Applications
- Asymptotic Properties: One-step debiased estimators are semiparametrically efficient and asymptotically normal for all GARS targets (Frauen et al., 29 Jan 2026).
- Algorithmic Guarantees: Two-phase generalized Borda and its PTAS refinement guarantee constant-factor and arbitrarily tight approximations, respectively, for top-list aggregation (Mathieu et al., 2018).
- Empirical Validation: Studies on synthetic and real datasets (including Chatbot Arena and MT-Bench) demonstrate the superiority of nonparametric GARS estimators over plug-in approaches, the validity of reported confidence regions, and actionable ranking differences in practical leaderboards.
- Judge Augmentation: Incorporating high-quality external judges as features yields a substantial reduction in estimation error.
- Robustness to Misspecification: Nonparametric GARS projection methods outperform strictly parametric (e.g., BT) scoring under model violation.
GARS provides a flexible, theoretically principled, and computationally efficient framework for preference-based ranking and aggregation across diverse technical domains, with rigorous guarantees both algorithmic and statistical (Frauen et al., 29 Jan 2026, Mathieu et al., 2018).