Concentric Incremental Rating Circles (CIRC)
- Concentric Incremental Rating Circles (CIRC) is a deterministic, Euclidean-based method that maps energy efficiency and accuracy onto a normalized 2D plane.
- It assigns 1–5 ratings by computing the Euclidean distance from an ideal point using fixed concentric circles, ensuring transparent and reproducible evaluations.
- CIRC enables practical trade-offs between energy sustainability and functional performance with applications in code generation and summarization.
Concentric Incremental Rating Circles (CIRC) is a deterministic, Euclidean-based scoring method for benchmarking code LLMs (CLMs) on the joint criteria of energy efficiency and functional accuracy. Designed to deliver robust, interpretable 1–5 ratings, CIRC situates models in a normalized two-dimensional plane and uses concentric circles centered at the ideal point—representing maximal efficiency and accuracy—to assign categorical scores that reward models closest to the optimum. Its key objective is to balance energy sustainability with functional correctness in a transparent, reproducible manner, providing an outlier-immune and computationally trivial alternative to adaptive rating frameworks (Mehditabar et al., 10 Nov 2025).
1. Foundational Principles and Conceptual Framework
CIRC addresses the challenge of simultaneously evaluating CLMs for both energy efficiency and accuracy. Each model is represented as a point on a 2D plane: the $x$-axis corresponds to normalized energy efficiency, while the $y$-axis represents normalized accuracy. The normalization scales both metrics to $[0, 1]$ across all evaluated models, with $(1, 1)$ denoting the ideal performer (highest efficiency, highest accuracy) and $(0, 0)$ the worst. The CIRC scheme overlays five concentric rings, spaced evenly in Euclidean distance, centered at this optimum.
This geometric construction interpolates between the extremes of arithmetic-mean (overly permissive) and harmonic-mean (overly unforgiving) objective fusion. By using the Euclidean norm, CIRC enables meaningful trade-offs between accuracy and efficiency, so models that slightly sacrifice one metric to significantly improve the other are not heavily penalized.
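For a concrete sense of where Euclidean fusion sits between these extremes, compare three points that share the same arithmetic mean; the coordinates are chosen purely for exposition and are not taken from the source, with $x$ denoting normalized efficiency and $y$ normalized accuracy:

$$
\begin{array}{c|ccc}
(x,\ y) & \text{arithmetic } \tfrac{x+y}{2} & \text{harmonic } \tfrac{2xy}{x+y} & \text{CIRC distance } \sqrt{(1-x)^2 + (1-y)^2} \\
\hline
(0.7,\ 0.7) & 0.70 & 0.70 & 0.42 \\
(0.9,\ 0.5) & 0.70 & 0.64 & 0.51 \\
(1.0,\ 0.4) & 0.70 & 0.57 & 0.60
\end{array}
$$

The arithmetic mean cannot separate the three points and the harmonic mean penalizes imbalance sharply, whereas the CIRC distance grows only moderately: under the five-ring scheme formalized below, the first two points share a rating of 4 while the most imbalanced one drops by a single ring to 3.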
2. Mathematical Formulation
CIRC’s formal operation comprises normalization, distance computation, and discrete score assignment as follows:
Normalization:
For model $m$ with raw energy $E_m$ and raw accuracy $A_m$, across all models,

$$
\mathrm{eff}_m = 1 - \frac{E_m - E_{\min}}{E_{\max} - E_{\min}}, \qquad \mathrm{acc}_m = \frac{A_m - A_{\min}}{A_{\max} - A_{\min}}.
$$

Each model thus maps to a point $(x_m, y_m) = (\mathrm{eff}_m, \mathrm{acc}_m) \in [0, 1]^2$.
Distance Calculation:
The Euclidean distance from the ideal point $(1, 1)$ is:

$$
d_m = \sqrt{(1 - x_m)^2 + (1 - y_m)^2},
$$

with $d_{\max} = \sqrt{2}$ for the extremal case $(x_m, y_m) = (0, 0)$.
Rating Assignment:
Divide $[0, \sqrt{2}]$ into five equal intervals of width $\Delta r = \sqrt{2}/5$. A model is assigned a score based on which ring it occupies:

$$
\mathrm{score}_m = 5 - j \quad \text{if } d_m \in \big(j\,\Delta r,\ (j+1)\,\Delta r\big], \quad j = 0, 1, \dots, 4,
$$

or equivalently, with the ring index $k_m = \max\!\big(1, \lceil d_m / \Delta r \rceil\big) \in \{1, \dots, 5\}$ (so that $d_m = 0$ receives the top score):

$$
\mathrm{score}_m = 6 - k_m.
$$
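As a brief worked example (the raw figures are hypothetical and serve only to exercise the formulas): suppose the evaluated pool spans $E_{\min} = 20\,\mathrm{J}$, $E_{\max} = 120\,\mathrm{J}$, $A_{\min} = 0.10$, $A_{\max} = 0.90$, and one model measures $E_m = 45\,\mathrm{J}$ with $A_m = 0.74$. Then

$$
x_m = 1 - \frac{45 - 20}{120 - 20} = 0.75, \qquad
y_m = \frac{0.74 - 0.10}{0.90 - 0.10} = 0.80,
$$

$$
d_m = \sqrt{(1 - 0.75)^2 + (1 - 0.80)^2} \approx 0.32, \qquad
k_m = \left\lceil \frac{0.32}{\sqrt{2}/5} \right\rceil = 2, \qquad
\mathrm{score}_m = 6 - 2 = 4,
$$

so this model lands in the second ring.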
3. Algorithmic Workflow and Computational Characteristics
The CIRC algorithm applies the above formulations in linear time relative to the number of models $|M|$; the listing below gives the complete procedure:
```python
import math

def circ_ratings(models):
    """CIRC: map raw (energy, accuracy) pairs to ratings in {1, ..., 5}.
    `models` maps each model name to (E_m, A_m); lower energy and higher accuracy are better.
    """
    # 1. Compute global minima/maxima across all models
    e_min = min(e for e, _ in models.values())
    e_max = max(e for e, _ in models.values())
    a_min = min(a for _, a in models.values())
    a_max = max(a for _, a in models.values())

    delta_r = math.sqrt(2) / 5  # uniform ring width
    ratings = {}
    for name, (e_raw, a_raw) in models.items():
        # 2.1 normalize: efficiency is the inverted, min-max-scaled energy
        x = 1 - (e_raw - e_min) / (e_max - e_min)
        y = (a_raw - a_min) / (a_max - a_min)
        # 2.2 map to the 2D plane and compute distance to the ideal point (1, 1)
        d = math.sqrt((1 - x) ** 2 + (1 - y) ** 2)
        # 2.3 assign rating: ring index in {1..5}; max() keeps d = 0 in ring 1
        k = max(1, math.ceil(d / delta_r))
        ratings[name] = 6 - k  # ring 1 -> 5, ..., ring 5 -> 1
    return ratings
```
This pipeline is computationally trivial, requiring only min–max normalization and basic arithmetic, with no data-driven hyperparameters or validation.
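As a usage sketch, the call below applies `circ_ratings` to four hypothetical models; the names, joule figures, and accuracies are invented for illustration:

```python
# Hypothetical raw measurements: (energy in joules, accuracy in [0, 1]).
models = {
    "clm-small":  (35.0, 0.42),
    "clm-medium": (60.0, 0.61),
    "clm-large":  (110.0, 0.78),
    "clm-xl":     (150.0, 0.80),
}
print(circ_ratings(models))
# -> {'clm-small': 2, 'clm-medium': 4, 'clm-large': 3, 'clm-xl': 2}
```

Note that the two extreme models (most efficient but least accurate, and most accurate but least efficient) both sit at $d = 1$ and share a rating of 2, illustrating the symmetric treatment of the two axes.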
4. Robustness to Outliers and Invariance Properties
CIRC is robust to outliers due to its geometric and parameter-independent definitions. Both the ideal reference point $(1, 1)$ and the outermost ring boundary at $d = \sqrt{2}$ are fixed for every candidate set. The boundaries between score classes (the concentric rings) are likewise fixed increments in Euclidean space, uninfluenced by the empirical distribution or the presence of extreme performers. As a result, adding a model whose metrics fall within the existing ranges leaves the relative standing of other models untouched, and no addition, however extreme, can stretch the ring boundaries; the absence of fitted parameters ensures invariance to small subpopulations of outlying models.
5. Illustrative Case Studies
The application of CIRC is exemplified in two code-related tasks:
a) Code Generation (LiveCodeBench):
A model with moderate normalized efficiency and accuracy (for instance around 0.5 on each axis) yields $d \approx 0.71$. With $\Delta r = \sqrt{2}/5 \approx 0.28$, this falls in the third concentric ring, so the score is 3.
b) Code Summarization (CodeXGLUE):
Models frequently map to high normalized accuracy (around 0.9) and moderate efficiency (around 0.6), placing them in the second ring; these receive a score of 4. Only models achieving both metrics above 0.9 fall within the innermost ring, meriting the top score of 5 (the sketch below re-checks these assignments).
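To make the arithmetic in these examples easy to re-check, the helper below works directly on already-normalized coordinates; the function name and the sample points are illustrative assumptions rather than benchmark outputs.

```python
import math

DELTA_R = math.sqrt(2) / 5  # ring width of the 1-5 scheme

def circ_rating_from_normalized(x, y):
    """Rating from already-normalized efficiency x and accuracy y."""
    d = math.sqrt((1 - x) ** 2 + (1 - y) ** 2)
    return 6 - max(1, math.ceil(d / DELTA_R))

print(circ_rating_from_normalized(0.5, 0.5))    # -> 3 (code-generation example)
print(circ_rating_from_normalized(0.6, 0.9))    # -> 4 (code-summarization example)
print(circ_rating_from_normalized(0.95, 0.95))  # -> 5 (innermost ring)
```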
6. Methodological Assumptions and Hyperparameterization
Core assumptions include:
- All normalization is performed using min–max scaling on the active model set. Adding out-of-range models requires re-normalization and re-scoring.
- The reference center is fixed at the “best case” point $(1, 1)$.
- Five rings correspond to a 1–5 rating, but $R$ rings and scores in $\{1, \dots, R\}$ are a straightforward generalization: $\Delta r = \sqrt{2}/R$ and $\mathrm{score}_m = R + 1 - \lceil d_m / \Delta r \rceil$ (see the sketch after this list).
- All increments are uniform in Euclidean space, embodying symmetric trade-off considerations.
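A minimal sketch of that $R$-ring generalization, assuming the same normalization as above; the function name, its `num_rings` parameter, and the sample point are illustrative, not taken from the source:

```python
import math

def circ_rating_generalized(x, y, num_rings=5):
    """CIRC score in {1, ..., num_rings} for normalized (x, y); higher is better."""
    delta_r = math.sqrt(2) / num_rings            # uniform ring width
    d = math.sqrt((1 - x) ** 2 + (1 - y) ** 2)    # distance to the ideal point (1, 1)
    ring = max(1, math.ceil(d / delta_r))         # 1 = innermost ring
    return num_rings + 1 - ring

# The default reproduces the 1-5 scheme; a larger num_rings refines the scale.
print(circ_rating_generalized(0.75, 0.80))                 # -> 4
print(circ_rating_generalized(0.75, 0.80, num_rings=10))   # -> 8
```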
7. Comparative Analysis: CIRC Versus Trend-Based Evaluation
CIRC and Observation to Expectation Rating (OTER) are proposed as complementary methods, each with distinct trade-offs:
| Feature | CIRC | OTER |
|---|---|---|
| Statistical Adaptivity | Static, geometric, data-independent | Trend-aware, curve-fitted |
| Outlier Sensitivity | Absolute, outlier-immune | Requires robust outlier removal |
| Rating Boundaries | Fixed concentric rings | Bins based on fitted expectation curve |
| Interpretability | Plug-and-play, transparent | Hyperparameter-dependent |
| Trade-off Handling | Symmetric, no frontier exploitation | Can reward frontier-breaking models |
CIRC’s core advantages are deterministic operation, transparency, and absolute reproducibility. Its limitations include insensitivity to the empirical energy–accuracy frontier and a lack of increased discrimination when models are highly clustered. OTER offers trend-sensitive rankings but requires additional machinery (robust curve-fitting, hyperparameter selection, and constrained optimization). Each method addresses combined energy–accuracy benchmarking from a distinct perspective, with CIRC prioritized when simplicity and reproducibility are paramount (Mehditabar et al., 10 Nov 2025).