
Ratio Aggregation: Principles & Applications

Updated 6 February 2026
  • Ratio Aggregation is a mathematical principle that forms proportion-based aggregates from per-unit quantities.
  • It distinguishes between ratios of sums and averages of ratios, impacting statistical robustness and fairness in applications such as graph neural networks and evaluation metrics.
  • Its diverse applications span machine learning evaluation, social welfare theory, and reinforcement learning, offering actionable insights across domains.

Ratio aggregation is a mathematical and logical principle central to various aggregation problems in statistics, machine learning, formal logic, and social choice. At its core, ratio aggregation involves forming proportion-based aggregates—means, fractions, or other ratio-derived quantities—instead of, or in addition to, absolute counts or sums. This distinction has theoretically significant implications for expressivity, statistical robustness, fairness, and practical application across domains ranging from graph neural networks and statistical estimation to evaluation metrics and welfare economics.

1. Mathematical Formulation and Variants

The fundamental question in ratio aggregation is how to combine per-unit, per-agent, or per-instance quantities into a summary statistic. The two most canonical forms are:

  • Ratio of Sums (RoS): The sum of numerators divided by the sum of denominators.

$p_{\mathrm{RoS}} = \frac{\sum_{i=1}^N a_i}{\sum_{i=1}^N b_i}$

  • Average of Ratios (AoR): The mean of per-instance ratios.

$p_{\mathrm{AoR}} = \frac{1}{N} \sum_{i=1}^N \frac{a_i}{b_i}$

In general, these two forms are not equal unless all $b_i$ are equal. The ratio of sums (e.g., corpus-level aggregation in BLEU, ratio-of-means in credit risk) weights each instance in proportion to $b_i$, while the average of ratios (e.g., sentence-level BLEU, mean-of-ratios) treats each instance equally regardless of scale (Formenti, 2014; Cavalin et al., 2024).
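The divergence between the two forms is easy to see numerically. The sketch below (illustrative values only) computes both aggregates for two instances with very different denominators:

```python
# Ratio of Sums (RoS) vs. Average of Ratios (AoR) on illustrative data.
# Unequal denominators b_i make the two aggregates diverge.
a = [1, 9]    # per-instance numerators (e.g., matched items)
b = [2, 100]  # per-instance denominators (e.g., totals); note b_1 != b_2

ros = sum(a) / sum(b)                                # weights each instance by b_i
aor = sum(ai / bi for ai, bi in zip(a, b)) / len(a)  # weights instances equally

print(ros)  # 10/102 ~= 0.098 -- dominated by the large-denominator instance
print(aor)  # (0.5 + 0.09)/2 = 0.295 -- each instance counts equally
```

The first instance achieves a ratio of 0.5 but contributes almost nothing to RoS, while it contributes fully half of AoR.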

2. Ratio Aggregation in Logic and Expressive Power

Ratio aggregation plays a central role in the logical expressivity of graph neural networks (GNNs). In message-passing GNNs, mean aggregation is a built-in ratio operator: for a node $v$, if the neighbor features are binary, $\mathrm{Mean}(X_v)$ returns the fraction of neighbors satisfying a given property. This operation is precisely formalized in ratio modal logic (RML), which extends classical modal logic with operators that quantify over the proportion of successors satisfying a property (Schönherr et al., 24 Jul 2025).

Formally, RML introduces modalities of the form $\Diamond^{\geq r}\varphi$, true at $v$ if at least a ratio $r$ of its successors satisfy $\varphi$. The main result states that, for fixed finite graphs, the expressive power of mean-aggregation GNNs coincides exactly with RML. This is strictly more expressive than GNNs with max aggregation (plain modal logic) and strictly less than with sum aggregation (graded modal logic) (Schönherr et al., 24 Jul 2025):

Aggregator | Logic Characterization | Expressivity Hierarchy
Max | Modal Logic (ML) | lowest
Mean | Ratio Modal Logic (RML) | intermediate
Sum | Graded Modal Logic (GML) | highest

In uniform settings, with continuous combination functions and threshold classification, mean aggregation instead captures exactly the alternation-free fragment of modal logic (AFML), which is strictly weaker than both ML and GML.
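The correspondence between mean aggregation and ratio modalities can be made concrete: over binary neighbor features, a mean-aggregation step computes exactly the proportion that $\Diamond^{\geq r}\varphi$ tests. A minimal sketch (toy graph and threshold chosen for illustration):

```python
# Mean aggregation over binary neighbor features acts as an RML ratio test:
# Diamond^{>= r} phi holds at v iff at least ratio r of v's successors satisfy phi.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}  # toy graph (adjacency lists)
phi = {0: False, 1: True, 2: True, 3: False}  # which nodes satisfy phi

def mean_agg(v):
    """Fraction of v's neighbors satisfying phi (mean of binary features)."""
    nbrs = adj[v]
    return sum(phi[u] for u in nbrs) / len(nbrs)

def ratio_modality(v, r):
    """RML-style check: do at least ratio r of v's successors satisfy phi?"""
    return mean_agg(v) >= r

print(mean_agg(0))             # 2/3 of node 0's neighbors satisfy phi
print(ratio_modality(0, 0.5))  # True: 2/3 >= 1/2
```

Max aggregation could only detect whether *some* neighbor satisfies phi; sum aggregation would additionally count *how many*; the mean sits in between, seeing only the proportion.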

3. Statistical Ratio Aggregation: Mean vs. Ratio of Means

In classical statistics, ratio aggregation is commonly encountered in the estimation of rates and probabilities. Two estimators for multiperiod probability of default (PD) exemplify the distinction:

  • Mean of Ratios (MoR): Averages default rates per cohort; equally weighted by cohort.
  • Ratio of Means (RoM): Pools all events and exposures before dividing; weights proportional to exposure size.

Mathematically:

  • MoR: $\hat{p}_{\mathrm{MoR}} = \frac{1}{T} \sum_{t=1}^T \frac{D_t}{N_t}$
  • RoM: $\hat{p}_{\mathrm{RoM}} = \frac{\sum_{t=1}^T D_t}{\sum_{t=1}^T N_t}$

The variance of the RoM estimator is strictly smaller than that of MoR, unless all $N_t$ are identical. As a consequence, RoM produces more stable, lower-variance estimates, particularly when exposure sizes $N_t$ are heterogeneous. This is confirmed both analytically and empirically, e.g., with an 11 basis point reduction in 5-year PD estimates observed in real mortgage data (Formenti, 2014).
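A quick Monte Carlo sketch illustrates the variance gap when exposures are heterogeneous. The cohort sizes and the true default rate below are made up for illustration:

```python
import random

random.seed(0)
p_true = 0.02       # assumed true default probability
N = [50, 5000]      # heterogeneous cohort sizes (illustrative)

mor_vals, rom_vals = [], []
for _ in range(500):
    # Simulate default counts D_t as Bernoulli sums per cohort.
    D = [sum(random.random() < p_true for _ in range(n)) for n in N]
    mor_vals.append(sum(d / n for d, n in zip(D, N)) / len(N))  # mean of ratios
    rom_vals.append(sum(D) / sum(N))                            # ratio of means

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# RoM pools exposures, so its variance is smaller when the N_t differ widely:
# the small cohort's noisy rate gets weight 1/2 in MoR but only ~1% in RoM.
print(var(mor_vals) > var(rom_vals))  # True
```

With these cohort sizes the MoR variance is roughly 25 times the RoM variance, since MoR gives the small, noisy cohort half the total weight.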

4. Aggregation in Machine Learning Evaluation Metrics

Ratio aggregation critically shapes the behavior of evaluation metrics in NLP and MT, particularly BLEU and chrF. Two aggregation forms are prevalent:

  • Corpus-level (Ratio of Sums): Precision or F-score is calculated by summing matched $n$-grams/characters and dividing by the total across the corpus.
  • Sentence-level (Average of Ratios): Mean of per-segment precision or F-score.

Empirical findings demonstrate that sentence-level (AoR) aggregation aligns much more closely with human judgments and is dramatically more statistically robust under resampling than corpus-level (RoS) aggregation. The corpus-level approach is highly sensitive to which segments are included and can inflate or deflate measured performance, especially when segment lengths vary substantially (Cavalin et al., 2024).

Aggregation | Expression | Statistical Property | Human Correlation (MT)
Corpus | $\frac{\sum_i m_i}{\sum_i w_i}$ | High variance, biased by segment length | BLEU r = 0.425 / –0.006
Sentence | $\frac{1}{N} \sum_i \frac{m_i}{w_i}$ | Low variance, robust estimator | m-BLEU r = 0.776 / 0.729

This result has practical consequences: for robust system comparison, segment-level aggregation is now recommended as standard for lexical metrics (Cavalin et al., 2024).
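The contrast can be sketched with hypothetical per-segment match counts $m_i$ and segment lengths $w_i$ (not a full BLEU implementation, only the precision aggregation step):

```python
# Hypothetical per-segment n-gram matches m_i and candidate lengths w_i.
# One long, poorly matched segment dominates the corpus-level (RoS) score.
matches = [9, 8, 10]     # matched n-grams per segment (illustrative)
lengths = [10, 10, 100]  # candidate n-grams per segment

corpus_prec = sum(matches) / sum(lengths)  # ratio of sums: length-weighted
sentence_prec = sum(m / w for m, w in zip(matches, lengths)) / len(matches)

print(corpus_prec)    # 27/120 = 0.225 -- dragged down by the long segment
print(sentence_prec)  # (0.9 + 0.8 + 0.1)/3 = 0.6 -- each segment equal weight
```

Dropping or adding a single long segment shifts the corpus-level score drastically, while the sentence-level mean moves by at most one equal share.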

5. Ratio Aggregation in Social Choice Theory

The ratio aggregation principle also arises in social welfare orderings to balance between pure aggregation and anti-aggregation requirements. Sakamoto formulates ratio aggregation ($\alpha$): a tiny loss to individual $i$ is acceptable if compensated by sufficiently large gains to at least a proportion $\alpha$ of the other agents. This is in contrast to quantitative aggregation, which requires gains to at least $m$ individuals for fixed $m$.

Formally: if $v_i = u_i - \varepsilon$ for one individual $i$, $v_j = u_j + y$ for a set $M$ of others with $|M| \geq \lceil \alpha n \rceil$ (where $y > \varepsilon$), and $v_h = u_h$ otherwise, then $v \succeq u$ (Sakamoto, 17 Jan 2025).

Ratio aggregation, together with standard axioms (anonymity, Pareto, Pigou–Dalton), is compatible with minimal non-aggregation requirements—i.e., large gains to the poorest should override small losses to the richest—unlike quantitative aggregation, which is inconsistent with non-aggregation even in its minimal form. By contrast, imposing both strong non-aggregation and replication invariance leads uniquely to the leximin social ordering.
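A small sketch of the ratio-aggregation premise as a mechanical check (utility vectors, $\alpha$, and $\varepsilon$ are chosen purely for illustration):

```python
import math

def ratio_aggregation_holds(u, v, alpha, eps):
    """Check the ratio-aggregation premise: exactly one agent loses eps,
    at least ceil(alpha * n) agents gain strictly more than eps, and the
    remaining agents are unchanged."""
    n = len(u)
    losers = [i for i in range(n) if math.isclose(v[i], u[i] - eps)]
    gainers = [i for i in range(n) if v[i] - u[i] > eps]
    unchanged = [i for i in range(n) if math.isclose(v[i], u[i])]
    return (len(losers) == 1
            and len(gainers) >= math.ceil(alpha * n)
            and len(losers) + len(gainers) + len(unchanged) == n)

u = [5.0, 3.0, 3.0, 3.0]
v = [4.9, 4.0, 4.0, 3.0]  # agent 0 loses 0.1; agents 1 and 2 gain 1.0
print(ratio_aggregation_holds(u, v, alpha=0.5, eps=0.1))  # True: 2 >= ceil(0.5*4)
```

When the premise holds, the ratio-aggregation axiom requires the social ordering to weakly prefer the post-change profile.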

6. Recursive Ratio Aggregation in Reinforcement Learning

Recursive ratio aggregation extends beyond simple arithmetic mean to more complex statistics, such as the Sharpe ratio in sequential decision-making. In this algebraic perspective on Markov decision processes (MDPs), the reward aggregation operator is defined recursively:

  • Sharpe ratio aggregator: Folds a sequence of rewards $(r_1, \dots, r_T)$ into running statistics (count, mean, variance) and computes the final value as $S(r_{1:T}) = \mathrm{mean}(r_{1:T}) / \sqrt{\mathrm{var}(r_{1:T})}$ (Tang et al., 11 Jul 2025).

This recursive reward aggregation is integrated into generalized Bellman equations and can be optimized via standard RL algorithms (Q-learning, actor-critic), replacing the standard discounted sum with the desired ratio-based aggregator. Experimental results in portfolio optimization show that direct recursive Sharpe-ratio aggregation achieves both higher returns and lower variance than reward-shaping approximations (Tang et al., 11 Jul 2025).
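The recursive fold can be sketched as running statistics updated one reward at a time. The sketch below uses a standard Welford-style online update; the paper's exact operator may differ in detail:

```python
import math

def sharpe_fold(rewards):
    """Fold rewards into running (count, mean, M2) and finish with mean/std.
    Welford's online update; a sketch of a recursive ratio aggregator."""
    n, mean, m2 = 0, 0.0, 0.0
    for r in rewards:
        n += 1
        delta = r - mean
        mean += delta / n
        m2 += delta * (r - mean)  # running sum of squared deviations
    var = m2 / n                  # population variance of the reward sequence
    return mean / math.sqrt(var) if var > 0 else float("inf")

print(sharpe_fold([0.01, 0.03, 0.02, 0.04]))  # 0.025 / sqrt(1.25e-4) ~= 2.236
```

Because the fold maintains a fixed-size state (count, mean, M2), it can be carried through a generalized Bellman backup in place of the usual discounted-sum accumulator.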

7. Ratio Parameters in Linker-Mediated Aggregation Kinetics

In physical aggregation processes such as linker-mediated irreversible aggregation of colloidal particles, dynamical evolution and final structures are governed by dimensionless ratio parameters:

  • $\Phi \equiv N_L / (f N_P)$, the number of linkers per particle binding site.
  • $\Lambda \equiv D_L / D_P$, the diffusion coefficient ratio (linker/particle).

These ratios determine the timescales ($\tau_L$, $\tau_P$), the efficiency of aggregation, and the phase boundaries between dispersed and clustered states. Analytic and simulation studies show that aggregation dynamics can be tuned or optimized by adjusting these key ratios, providing experimentally actionable design rules for DNA- or protein-mediated assemblies (Tavares et al., 2020).
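For concreteness, the two control ratios for a hypothetical system (all parameter values below are illustrative, not taken from the cited study):

```python
# Illustrative linker-mediated aggregation parameters (hypothetical values).
N_L = 3000  # number of linkers
N_P = 500   # number of particles
f = 6       # binding sites per particle
D_L = 1.0   # linker diffusion coefficient (arbitrary units)
D_P = 0.1   # particle diffusion coefficient (same units)

Phi = N_L / (f * N_P)  # linkers per particle binding site
Lam = D_L / D_P        # diffusion coefficient ratio (linker/particle)

print(Phi)  # 1.0: exactly one linker per binding site (stoichiometric point)
print(Lam)  # 10.0: linkers diffuse 10x faster than particles
```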

