Ratio Aggregation: Principles & Applications
- Ratio Aggregation is a mathematical principle that forms proportion-based aggregates from per-unit quantities.
- It distinguishes between ratios of sums and averages of ratios, impacting statistical robustness and fairness in applications such as graph neural networks and evaluation metrics.
- Its applications span machine learning evaluation, social welfare theory, reinforcement learning, and physical aggregation kinetics, offering actionable insights across domains.
Ratio aggregation is a mathematical and logical principle central to various aggregation problems in statistics, machine learning, formal logic, and social choice. At its core, ratio aggregation involves forming proportion-based aggregates—means, fractions, or other ratio-derived quantities—instead of, or in addition to, absolute counts or sums. This distinction has theoretically significant implications for expressivity, statistical robustness, fairness, and practical application across domains ranging from graph neural networks and statistical estimation to evaluation metrics and welfare economics.
1. Mathematical Formulation and Variants
The fundamental question in ratio aggregation is how to combine per-unit, per-agent, or per-instance quantities into a summary statistic. The two most canonical forms are:
- Ratio of Sums (RoS): the sum of numerators divided by the sum of denominators, $\mathrm{RoS} = \frac{\sum_i a_i}{\sum_i b_i}$.
- Average of Ratios (AoR): the mean of per-instance ratios, $\mathrm{AoR} = \frac{1}{n}\sum_i \frac{a_i}{b_i}$.
In general, these two forms are not equal unless all per-instance ratios $a_i/b_i$ are equal. The ratio of sums (e.g., corpus-level aggregation in BLEU, the ratio-of-means estimator in credit risk) weights each instance in proportion to its denominator $b_i$, while the average of ratios (e.g., sentence-level BLEU, the mean-of-ratios estimator) weights each instance equally regardless of scale (Formenti, 2014; Cavalin et al., 2024).
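A minimal sketch of the two aggregates (the function names and the toy numerator/denominator pairs below are illustrative):

```python
def ratio_of_sums(nums, dens):
    """RoS: sum of numerators over sum of denominators (scale-weighted)."""
    return sum(nums) / sum(dens)

def average_of_ratios(nums, dens):
    """AoR: unweighted mean of per-instance ratios (each instance counts equally)."""
    return sum(n / d for n, d in zip(nums, dens)) / len(nums)

# Two instances of very different scale: the large instance dominates RoS.
nums, dens = [1, 90], [10, 100]       # per-instance ratios 0.10 and 0.90
print(ratio_of_sums(nums, dens))      # 91/110 ≈ 0.827
print(average_of_ratios(nums, dens))  # (0.10 + 0.90)/2 = 0.50
```

The two aggregates coincide only when every instance has the same per-instance ratio.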
2. Ratio Aggregation in Logic and Expressive Power
Ratio aggregation plays a central role in the logical expressivity of graph neural networks (GNNs). In message-passing GNNs, mean aggregation is a built-in ratio operator: for a node $v$ with binary neighbor features, the mean over its neighbors returns the fraction of neighbors satisfying a given property. This operation is precisely formalized in ratio modal logic (RML), which extends classical modal logic with operators that quantify over the proportion of successors satisfying a property (Schönherr et al., 24 Jul 2025).
Formally, RML introduces modalities of the form $\Diamond^{\geq r}\varphi$, true at a node if at least a ratio $r$ of its successors satisfy $\varphi$. The main result states that, over fixed finite graphs, the expressive power of mean-aggregation GNNs coincides exactly with RML: strictly more expressive than GNNs with max aggregation (plain modal logic) and strictly less expressive than GNNs with sum aggregation (graded modal logic) (Schönherr et al., 24 Jul 2025):
| Aggregator | Logic Characterization | Expressivity Hierarchy |
|---|---|---|
| Max | Modal Logic (ML) | lowest |
| Mean | Ratio Modal Logic (RML) | intermediate |
| Sum | Graded Modal Logic (GML) | highest |
In uniform settings, with continuous combination functions and threshold classification, the expressive power of mean aggregation collapses: it captures exactly the alternation-free fragment of modal logic (AFML), which is strictly weaker than both ML and GML.
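The ratio semantics of mean aggregation can be sketched directly: for binary neighbor features, the mean is the fraction of successors satisfying a property, so the RML modality reduces to a threshold test on that fraction (the graph, labels, and function names below are illustrative, not the paper's formalism):

```python
def satisfies_ratio_modality(successors, phi, r):
    """True iff at least a ratio r of the node's successors satisfy phi
    (mean aggregation over 0/1 features compared against a threshold)."""
    if not successors:
        return False
    mean = sum(1 if phi(s) else 0 for s in successors) / len(successors)
    return mean >= r

# Toy node whose successors carry a binary feature "red".
labels = {1: True, 2: True, 3: False, 4: False}

def is_red(v):
    return labels[v]

succ = [1, 2, 3, 4]
print(satisfies_ratio_modality(succ, is_red, 0.5))   # True  (2/4 >= 0.5)
print(satisfies_ratio_modality(succ, is_red, 0.75))  # False (2/4 < 0.75)
```

Max aggregation, by contrast, can only test whether *some* successor satisfies the property, which is why it corresponds to the weaker plain modal logic.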
3. Statistical Ratio Aggregation: Mean vs. Ratio of Means
In classical statistics, ratio aggregation is commonly encountered in the estimation of rates and probabilities. Two estimators for multiperiod probability of default (PD) exemplify the distinction:
- Mean of Ratios (MoR): Averages default rates per cohort; equally weighted by cohort.
- Ratio of Means (RoM): Pools all events and exposures before dividing; weights proportional to exposure size.
Mathematically:
- MoR: $\hat{p}_{\mathrm{MoR}} = \frac{1}{T}\sum_{t=1}^{T}\frac{d_t}{n_t}$, where $d_t$ and $n_t$ are the defaults and exposures in cohort $t$.
- RoM: $\hat{p}_{\mathrm{RoM}} = \frac{\sum_{t=1}^{T} d_t}{\sum_{t=1}^{T} n_t}$.
The variance of the RoM estimator is strictly smaller than that of MoR unless all cohort sizes are identical. As a consequence, RoM produces more stable, lower-variance estimates, particularly when exposure sizes are heterogeneous. This is confirmed both analytically and empirically, e.g., with an 11 basis point reduction in 5-year PD estimates observed in real mortgage data (Formenti, 2014).
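A toy comparison of the two estimators, assuming illustrative cohort data (two cohorts with very different exposure sizes; all numbers are made up):

```python
def mean_of_ratios(defaults, exposures):
    """MoR: average of per-cohort default rates (equal cohort weights)."""
    rates = [d / n for d, n in zip(defaults, exposures)]
    return sum(rates) / len(rates)

def ratio_of_means(defaults, exposures):
    """RoM: pooled defaults over pooled exposures (exposure-weighted)."""
    return sum(defaults) / sum(exposures)

# Heterogeneous cohort sizes: the tiny, noisy cohort skews MoR upward.
defaults  = [2, 30]
exposures = [10, 1000]                       # rates: 0.20 and 0.03
print(mean_of_ratios(defaults, exposures))   # 0.115
print(ratio_of_means(defaults, exposures))   # 32/1010 ≈ 0.0317
```

The small cohort contributes half the MoR estimate but only ~1% of the pooled exposure, which is exactly the heterogeneity effect described above.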
4. Aggregation in Machine Learning Evaluation Metrics
Ratio aggregation critically shapes the behavior of evaluation metrics in NLP and MT, particularly BLEU and chrF. Two aggregation forms are prevalent:
- Corpus-level (Ratio of Sums): precision or F-score is calculated by summing matched n-grams/characters and dividing by the total across the corpus.
- Sentence-level (Average of Ratios): Mean of per-segment precision or F-score.
Empirical findings demonstrate that sentence-level (AoR) aggregation aligns much more closely with human judgments and exhibits dramatically better statistical robustness under resampling than corpus-level (RoS) aggregation. The corpus-level approach is highly sensitive to which segments are included and can inflate or deflate reported performance, especially when segment lengths vary substantially (Cavalin et al., 2024).
| Aggregation | Expression | Statistical Property | Human Correlation (MT) |
|---|---|---|---|
| Corpus (RoS) | Σ matches / Σ totals | High variance, biased by segment length | BLEU r = 0.425 / –0.006 |
| Sentence (AoR) | Mean of per-segment scores | Low variance, robust estimator | m-BLEU r = 0.776 / 0.729 |
This result has practical consequences: for robust system comparison, segment-level aggregation is now recommended as standard for lexical metrics (Cavalin et al., 2024).
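The robustness-under-resampling claim can be illustrated with a toy bootstrap, assuming synthetic per-segment (matched, total) n-gram counts with widely varying segment lengths (all data and function names below are illustrative):

```python
import random

def corpus_score(segs):
    """RoS: pool matched and total counts across segments, then divide."""
    return sum(m for m, _ in segs) / sum(t for _, t in segs)

def sentence_score(segs):
    """AoR: average of per-segment precisions."""
    return sum(m / t for m, t in segs) / len(segs)

random.seed(0)
# Toy corpus: (matched, total) per segment, with lengths of 5 or 200.
corpus = [(random.randint(1, t), t)
          for t in (random.choice([5, 200]) for _ in range(100))]

def bootstrap_std(score, segs, reps=200):
    """Standard deviation of the score over bootstrap resamples of segments."""
    vals = []
    for _ in range(reps):
        sample = [random.choice(segs) for _ in segs]
        vals.append(score(sample))
    mu = sum(vals) / len(vals)
    return (sum((v - mu) ** 2 for v in vals) / len(vals)) ** 0.5

print("corpus std:  ", round(bootstrap_std(corpus_score, corpus), 4))
print("sentence std:", round(bootstrap_std(sentence_score, corpus), 4))
```

Because the corpus-level score is dominated by the few long segments, its effective sample size is smaller and its bootstrap variance larger, matching the pattern reported for BLEU/chrF.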
5. Ratio Aggregation in Social Choice Theory
The ratio aggregation principle also arises in social welfare orderings to balance pure aggregation against anti-aggregation requirements. Sakamoto formulates ratio aggregation: a tiny loss to one individual is acceptable if compensated by sufficiently large gains to at least a fixed proportion of the other agents. This contrasts with quantitative aggregation, which requires gains to at least a fixed number of individuals, independent of population size.
Formally: if one individual's utility decreases by a small amount, the utilities of at least a given proportion of the remaining individuals each increase by a sufficiently large amount, and all other utilities are unchanged, then the new distribution is socially at least as good (Sakamoto, 17 Jan 2025).
Ratio aggregation, together with standard axioms (anonymity, Pareto, Pigou–Dalton), is compatible with minimal non-aggregation requirements—i.e., large gains to the poorest should override small losses to the richest—unlike quantitative aggregation, which is inconsistent with non-aggregation even in its minimal form. By contrast, imposing both strong non-aggregation and replication invariance leads uniquely to the leximin social ordering.
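A schematic version of the acceptability test described above (the predicate, the utility vectors, and the omission of explicit loss/gain thresholds are all simplifications for illustration):

```python
def ratio_aggregation_accepts(before, after, r):
    """Accept a change in which exactly one agent loses, provided at least
    a fraction r of the *other* agents strictly gain (schematic version of
    the ratio-aggregation axiom; size thresholds on losses/gains omitted)."""
    losers  = [i for i, (b, a) in enumerate(zip(before, after)) if a < b]
    gainers = [i for i, (b, a) in enumerate(zip(before, after)) if a > b]
    if len(losers) != 1:
        return False
    return len(gainers) >= r * (len(before) - 1)

before = [5.0, 3.0, 3.0, 3.0, 3.0]
after  = [4.9, 4.0, 4.0, 4.0, 3.0]   # one small loss, gains to 3 of 4 others
print(ratio_aggregation_accepts(before, after, r=0.5))  # True  (3 >= 0.5*4)
print(ratio_aggregation_accepts(before, after, r=0.9))  # False (3 < 0.9*4)
```

Because the required number of gainers scales with the population, the criterion behaves differently from quantitative aggregation, whose fixed head-count requirement becomes vanishingly demanding in large populations.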
6. Recursive Ratio Aggregation in Reinforcement Learning
Recursive ratio aggregation extends beyond the simple arithmetic mean to more complex statistics, such as the Sharpe ratio in sequential decision-making. In this algebraic perspective on Markov decision processes (MDPs), the reward aggregation operator is defined recursively:
- Sharpe ratio aggregator: folds a sequence of rewards into running statistics (count, mean, variance) and computes the final value as the mean divided by the standard deviation (Tang et al., 11 Jul 2025).
This recursive reward aggregation is integrated into generalized Bellman equations and can be optimized via standard RL algorithms (Q-learning, actor-critic), replacing the standard discounted sum with the desired ratio-based aggregator. Experimental results in portfolio optimization show that direct recursive Sharpe-ratio aggregation achieves both higher returns and lower variance than reward-shaping approximations (Tang et al., 11 Jul 2025).
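A minimal sketch of a recursive Sharpe-style aggregator, folding rewards into running statistics via Welford's online update (a simplification of the paper's algebraic formulation, not its RL integration):

```python
def sharpe_fold(rewards):
    """Fold a reward sequence into running (count, mean, M2) and return
    mean/std, a Sharpe-style ratio (Welford's online variance update)."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in rewards:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)   # accumulates sum of squared deviations
    std = (m2 / n) ** 0.5
    return mean / std if std > 0 else float("inf")

rewards = [0.02, 0.01, 0.03, 0.00, 0.04]
print(round(sharpe_fold(rewards), 4))   # 1.4142 (mean 0.02, std ≈ 0.0141)
```

The fold structure is what allows the aggregator to slot into a generalized Bellman recursion: each step updates a fixed-size statistic rather than storing the full reward sequence.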
7. Ratio Parameters in Linker-Mediated Aggregation Kinetics
In physical aggregation processes such as linker-mediated irreversible aggregation of colloidal particles, dynamical evolution and final structures are governed by dimensionless ratio parameters:
- the number of linkers per particle binding site, and
- the ratio of linker to particle diffusion coefficients.
These ratios determine the aggregation timescales, the efficiency of aggregation, and the phase boundaries between dispersed and clustered states. Analytic and simulation studies show that aggregation dynamics can be tuned or optimized by adjusting these key ratios, providing experimentally actionable design rules for DNA- or protein-mediated assemblies (Tavares et al., 2020).
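As a generic illustration of how a rate parameter sets aggregation timescales, consider the constant-kernel Smoluchowski equation for the total cluster count (a standard toy model, not the linker-mediated kernels of Tavares et al.):

```python
def total_clusters(n0, k, t, steps=100_000):
    """Euler-integrate dN/dt = -k * N^2 / 2, the constant-kernel
    Smoluchowski equation for the total cluster concentration N(t).
    Analytic solution: N(t) = n0 / (1 + k*n0*t/2)."""
    n, dt = n0, t / steps
    for _ in range(steps):
        n -= 0.5 * k * n * n * dt
    return n

n0 = 1.0
k = 1.0
half_time = 2.0 / (k * n0)   # characteristic time at which N falls to n0/2
print(round(total_clusters(n0, k, half_time), 3))   # ≈ 0.5
```

The characteristic time scales as 1/(k·n0), so doubling the effective aggregation rate halves the coarsening timescale, which is the sense in which dimensionless rate ratios set the dynamics.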
References:
- (Schönherr et al., 24 Jul 2025) Logical Characterizations of GNNs with Mean Aggregation
- (Sakamoto, 17 Jan 2025) A Class of Practical and Acceptable Social Welfare Orderings That Satisfy the Principles of Aggregation and Non-Aggregation
- (Cavalin et al., 2024) Sentence-level Aggregation of Lexical Metrics Correlates Stronger with Human Judgements than Corpus-level Aggregation
- (Tang et al., 11 Jul 2025) Recursive Reward Aggregation
- (Formenti, 2014) Mean of Ratios or Ratio of Means: Statistical Uncertainty Applied to Estimate Multiperiod Probability of Default
- (Tavares et al., 2020) Smoluchowski equations for linker-mediated irreversible aggregation