
ACE/RCE: Continuous Error Metrics

Updated 26 October 2025
  • ACE/RCE are metrics that quantify model performance volatility by summing changes across continuous input levels, providing a clear measure of stability.
  • They are used in benchmarking to distinguish robust models that maintain consistent performance despite dynamic or systematic perturbations.
  • Their computation, based on absolute differences and mean normalization, captures raw volatility and also guides optimization for enhanced reliability.

Absolute/Relative Continuous Error (ACE/RCE) quantifies the volatility or stability of a model’s performance across a continuous parameter spectrum, typically between consecutive levels of input conditions such as resolution, time, or other controllable variables. In the context of modern evaluation frameworks, ACE and RCE are used to measure non-semantic aspects of model robustness—specifically, the magnitude and proportional significance of performance fluctuations under dynamic scenarios. These metrics generalize the concept of error from pointwise or aggregate summary measures to a continuous sequence, providing direct insight into a model’s resilience to environmental variation or systematic perturbation.

1. Formal Definition and Mathematical Properties

ACE (Absolute Continuous Error) is defined as the sum of the absolute differences in performance metrics (e.g., accuracy) between consecutive levels of a continuous input parameter. For a set of $n$ levels, where performance is measured at each level $i$ as $A_i$, ACE is given by:

$$\mathrm{ACE} = \sum_{i=1}^{n-1} |A_{i+1} - A_i|$$

RCE (Relative Continuous Error) normalizes ACE by the average performance $\bar{A}$ over all levels, yielding a percentage-based stability metric:

$$\bar{A} = \frac{1}{n} \sum_{i=1}^{n} A_i, \qquad \mathrm{RCE} = \frac{\mathrm{ACE}}{\bar{A}}$$

These definitions ensure that ACE captures the raw volatility, whereas RCE calibrates this volatility relative to the model’s overall competency, allowing for fair comparisons between models with differing average performance (Li et al., 19 Oct 2025).
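These definitions translate directly into code. A minimal sketch in plain Python; the toy accuracy curves are illustrative, not taken from any benchmark:

```python
def ace(accuracies):
    """Absolute Continuous Error: summed absolute change between
    consecutive performance levels."""
    return sum(abs(b - a) for a, b in zip(accuracies, accuracies[1:]))

def rce(accuracies):
    """Relative Continuous Error: ACE normalized by mean performance."""
    mean = sum(accuracies) / len(accuracies)
    return ace(accuracies) / mean

# A stable model vs. a volatile model over four input levels:
stable   = [0.80, 0.79, 0.81, 0.80]
volatile = [0.95, 0.65, 0.95, 0.65]

print(ace(stable), ace(volatile))   # low vs. high raw volatility
print(rce(stable), rce(volatile))
```

Both toy curves have the same mean accuracy (0.80), so aggregate metrics cannot tell them apart, while ACE/RCE separate them cleanly.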

2. Role in Model Robustness and Volatility Analysis

ACE/RCE are pivotal in distinguishing models with stable outputs from those whose performance is erratic under continuous changes. A model with low ACE and RCE demonstrates robust, consistent behavior as the input parameter (e.g., image resolution) varies, while large values indicate sensitivity or instability. In multimodal evaluation, these metrics complement traditional aggregate accuracy and trend-monotonicity measures (e.g., Spearman's $\rho$) by directly quantifying the non-monotonic, abrupt changes that may affect reliability in practical deployment (Li et al., 19 Oct 2025).

This approach provides a fine-grained analysis of robustness, as ACE and RCE can highlight deleterious "volatility spikes" overlooked by summary metrics.

3. Methodological Foundations and Computation

Computation of ACE and RCE requires:

  • Selection of a continuous parameter domain (such as consecutive resolution levels for input images).
  • Measurement of performance $A_i$ at each point or discrete interval.
  • Application of the formulae above to extract ACE and RCE.

The metrics are applied after model evaluation across all levels, thereby reflecting both smooth and abrupt variations without presuming monotonicity or linearity. RCE further facilitates model comparison by rescaling absolute volatility against mean performance, which is critical when candidate models operate at different performance regimes.

This structure positions ACE/RCE as a post hoc diagnostic tool for evaluating stability in dynamic input settings.
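The three steps above can be sketched as a single post hoc pass over per-level results. In the sketch below, `evaluate_at` is a hypothetical stand-in for the benchmark's per-level evaluation (e.g., accuracy at one image resolution), and the accuracy curve is invented for illustration:

```python
def continuous_error(levels, evaluate_at):
    """Evaluate performance at every level first, then derive
    ACE/RCE post hoc from the resulting curve."""
    accs = [evaluate_at(level) for level in levels]        # measure A_i
    ace = sum(abs(b - a) for a, b in zip(accs, accs[1:]))  # raw volatility
    rce = ace / (sum(accs) / len(accs))                    # mean-normalized
    return accs, ace, rce

# Hypothetical accuracy curve over consecutive resolution levels:
curve = {224: 0.55, 336: 0.70, 448: 0.68, 672: 0.74, 1024: 0.73}
accs, a, r = continuous_error(sorted(curve), curve.get)
```

Because the metrics are computed only after all levels are evaluated, they capture both smooth drift and abrupt jumps without assuming monotonicity.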

4. Applications in Benchmarking and Comparative Evaluation

Within the Res-Bench framework for dynamic resolution robustness in MLLMs (Li et al., 19 Oct 2025), ACE and RCE are used alongside mean accuracy and trend metrics to characterize model behavior:

| Metric | Definition | Interpretation |
|---|---|---|
| ACE | $\sum_{i=1}^{n-1}\lvert A_{i+1}-A_i\rvert$ | Magnitude of performance change per increment |
| RCE | $\mathrm{ACE}/\bar{A}$ | Stability relative to average accuracy |
| $\rho$ | Spearman's correlation of $(A_i, i)$ | Monotonicity of trend versus parameter level |

The metrics reveal trade-offs: methods processing images at native resolution often exhibit high absolute performance at the highest levels but can manifest large ACE/RCE values, denoting poor robustness as input quality degrades. Patch-based preprocessing techniques generally achieve more stable ACE/RCE profiles, indicating enhanced resilience to systematic variation (Li et al., 19 Oct 2025).
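To make the complementarity of the trend and volatility metrics concrete, Spearman's $\rho$ against the level index can be computed with the classic rank-difference formula; the sketch below assumes no tied accuracy values, which keeps the ranking trivial:

```python
def spearman_rho(values):
    """Spearman correlation between a performance curve and its
    level index (assumes no ties among the values)."""
    n = len(values)
    rank = {v: r for r, v in enumerate(sorted(values), start=1)}
    # rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), d_i = rank - position
    d2 = sum((rank[v] - (i + 1)) ** 2 for i, v in enumerate(values))
    return 1 - 6 * d2 / (n * (n * n - 1))

# A perfectly monotone curve can still be volatile: rho = 1 here,
# but ACE is dominated by the single large jump at the start.
monotone_but_volatile = [0.10, 0.85, 0.86, 0.87, 0.88]
```

On this toy curve $\rho = 1$ yet ACE is 0.78, almost all of it from one jump; this is exactly the case where ACE/RCE add information beyond trend monotonicity.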

5. Relationship to Classical Error Measures

ACE and RCE generalize pointwise absolute and relative errors to the continuous context. While classical absolute error quantifies deviation from a true value at a fixed point and relative error scales this against the magnitude of the true value, ACE/RCE extend these notions to aggregate, continuous volatility. Notably, ACE is aligned with the $L_1$-norm of discrete differences, and RCE is its mean-normalized analog, providing interpretable summary statistics of stability.

In dynamic benchmarking or time-series analysis, these metrics are particularly revealing for identifying sudden "performance breaks" and guiding preprocessing or fine-tuning efforts for robustness enhancement.

6. Implications, Limitations, and Practical Guidance

Using ACE/RCE carries several implications and considerations:

  • ACE/RCE isolate robustness from aggregate performance, providing a complementary perspective critical for risk-sensitive applications.
  • RCE normalization is essential when comparing disparate models, ensuring that volatility measurements are contextually relevant.
  • These metrics do not distinguish the direction of fluctuation; further analysis might separately quantify upward versus downward volatility.
  • Their practicality depends on the granularity of parameter steps: sufficiently dense sampling ensures meaningful stability analysis.

A plausible implication is that optimizing model training or preprocessing for low ACE/RCE—rather than solely targeting peak accuracy—can yield models better suited for deployment in environments with continuous or unpredictable input variation.
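The direction-blindness noted above can be addressed with a signed split of ACE; this decomposition is an illustrative extension, not part of the metric as defined:

```python
def ace_signed(accuracies):
    """Split ACE into upward (recovery) and downward (drop)
    components; up + down equals ACE by construction."""
    diffs = [b - a for a, b in zip(accuracies, accuracies[1:])]
    up = sum(d for d in diffs if d > 0)
    down = -sum(d for d in diffs if d < 0)
    return up, down

# Two drops and one recovery: down dominates, flagging the failure
# mode that matters most for risk-sensitive deployment.
up, down = ace_signed([0.9, 0.6, 0.8, 0.5])  # up ~= 0.2, down ~= 0.6
```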

7. Comparative Significance in Modern Robustness Research

The integration of ACE and RCE into advanced evaluation frameworks represents a methodological advance in robustness and volatility analysis for AI systems (Li et al., 19 Oct 2025). They provide actionable diagnostics for model designers and practitioners, informing strategies for preprocessing (e.g., padding, super-resolution), architecture selection, and finetuning across operational input domains.

These metrics are applicable beyond MLLMs: the underlying principles are directly transferable to time-series analysis, dynamic simulation (e.g., control systems, forecasting), and any domain where performance stability across a continuous spectrum is critical.


In summary, Absolute/Relative Continuous Error metrics systematically quantify the fluctuating stability of model outputs over dynamic input conditions. Their adoption in next-generation benchmarking enables rigorous, granular assessment of robustness, informing both theoretical analysis and practical model design.
