Reassessed Labels (ReaL): Theory & Applications
- Reassessed Labels (ReaL) is a dual framework that refines classical rank theory in projective varieties and enhances policy optimization in reinforcement learning by using structured label decompositions.
- In algebraic geometry, ReaL labels each admissible decomposition by its number of real points and complex-conjugate pairs, giving tighter bounds and better uniqueness properties than real rank alone.
- In reinforcement learning, ReaL treats verifiable rewards as binary labels and uses an anchor-logit mechanism to produce bounded, monotonic gradient updates, with higher reported Pass@1 than the GRPO, DAPO, and GSPO baselines.
Reassessed Labels (ReaL) constitute a conceptual and computational framework that arises independently in two modern research contexts. The first is in the structure theory of real projective varieties, where labels quantify the composition of decompositions into real and complex-conjugate points, with deep relevance to admissible rank and typical rank phenomena. The second is in machine learning, especially reinforcement learning for LLMs, where “Rewards as Labels” (ReaL) reformulates policy optimization as a classification task, assigning discrete labels to rollouts based on verifiable reward signals. In both domains, ReaL provides refined instruments for distinguishing and improving upon classical approaches in rank theory and policy gradient methods, respectively.
1. Admissible Rank and Label Decompositions for Projective Varieties
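Let $X \subset \mathbb{P}^r$ be a nondegenerate irreducible complex projective variety defined over $\mathbb{R}$. For a real point $P \in \mathbb{P}^r(\mathbb{R})$, the classical $X(K)$-rank $r_{X(K)}(P)$ is the minimal cardinality of a set $S \subset X(K)$ (for $K = \mathbb{R}$ or $\mathbb{C}$) such that $P \in \langle S \rangle$. The admissible rank, $r_{X,\mathbb{R}}(P)$, is defined by restricting attention to finite subsets $S \subset X(\mathbb{C})$ globally stable under complex conjugation, $\sigma(S) = S$, and such that $P$ lies in their real span. This intermediate notion interpolates between real and complex rank, satisfying $r_{X(\mathbb{C})}(P) \le r_{X,\mathbb{R}}(P) \le r_{X(\mathbb{R})}(P)$ for all real $P$ (Ballico et al., 2019).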
2. Label Structure: Definition and Weight
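Given an admissible decomposition $S$, the label is the pair $(a, b)$, where $b$ is the number of real points and $a$ is the number of complex-conjugate pairs, with the total cardinality $\#S = 2a + b$. Formally, writing $X(\mathbb{R})$ for the real points of the variety,
$$\operatorname{label}(S) = (a, b), \qquad b = \#\bigl(S \cap X(\mathbb{R})\bigr), \qquad a = \tfrac{1}{2}\,\#\bigl(S \setminus X(\mathbb{R})\bigr).$$
Labels serve as refined invariants of decompositions and directly relate to the admissible rank through the constraint $2a + b \ge r$, where $r$ is the admissible rank of the point, with equality for decompositions of minimal cardinality. For given admissible rank $r$, the labels of minimal decompositions obey $0 \le a \le \lfloor r/2 \rfloor$ and $b = r - 2a$ (Ballico et al., 2019).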
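The label of a concrete decomposition can be computed mechanically. The following is a minimal sketch, not taken from the source: it assumes the points are distinct and are represented by affine complex parameters (for instance parameters on a rational curve), and it follows the convention that $a$ counts conjugate pairs and $b$ counts real points. The function name `label_of` and the tolerance `tol` are hypothetical.

```python
def label_of(params, tol=1e-9):
    """Return the label (a, b) of a finite set of distinct complex parameters.

    a = number of complex-conjugate pairs, b = number of real points.
    Raises ValueError if the set is not stable under complex conjugation.
    """
    points = [complex(t) for t in params]
    real = [t for t in points if abs(t.imag) <= tol]
    unmatched = [t for t in points if abs(t.imag) > tol]

    pairs = 0
    while unmatched:
        t = unmatched.pop()
        # Look for the conjugate partner of t among the remaining non-real points.
        partner = next(
            (u for u in unmatched if abs(u - t.conjugate()) <= tol), None
        )
        if partner is None:
            raise ValueError("set is not stable under complex conjugation")
        unmatched.remove(partner)
        pairs += 1

    return pairs, len(real)
```

For example, `label_of([1, -2, 1j, -1j])` has two real points and one conjugate pair, so its weight $2a + b$ is 4.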
3. Typical Labels and Generic Behaviour
A label $(a, b)$ is typical if there exists a full-dimensional Euclidean open set $U$ of real points such that each $P \in U$ admits an admissible decomposition with exactly that label and with minimal admissible rank. Notably, if $X(\mathbb{R})$ is Zariski dense in $X(\mathbb{C})$ and $g$ is the generic complex rank, every $(a, b)$ with $2a + b = g$ is typical. For specific curves, further structure arises; for example, linearly normal real elliptic curves of odd degree have typical labels of weight $g$ and of weight $g + 1$ (Ballico et al., 2019).
| Context | Condition | Typical Labels |
|---|---|---|
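| Rational normal curves | Degree $d$ | All $(a, b)$ with $2a + b = \lfloor d/2 \rfloor + 1$ |
| Generic projective variety | Zariski-dense real part | All $(a, b)$ with $2a + b = g$ ($g$ = generic complex rank) |
| Real rank | Only real points in support | Labels of form $(0, b)$ |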
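The weight condition for typical labels is easy to enumerate. The sketch below is illustrative only: it lists every pair $(a, b)$ of non-negative integers with $2a + b = g$, assuming $a$ counts conjugate pairs and $b$ counts real points. The function name `labels_of_weight` is hypothetical.

```python
def labels_of_weight(g: int) -> list[tuple[int, int]]:
    """List all labels (a, b) with 2*a + b == g, ordered by increasing a."""
    if g < 0:
        raise ValueError("weight must be non-negative")
    # a ranges over 0..floor(g/2); b is then determined by the weight.
    return [(a, g - 2 * a) for a in range(g // 2 + 1)]
```

A weight of 4 gives `[(0, 4), (1, 2), (2, 0)]`, so there are $\lfloor g/2 \rfloor + 1$ candidate labels; the entry with $a = 0$ is the only one that a real-rank decomposition can realize.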
4. Labels for Rational Normal Curves
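For the rational normal curve $X \subset \mathbb{P}^d$ of degree $d$ over $\mathbb{R}$, the admissible rank equals the complex rank for all real points $P$, i.e., $r_{X,\mathbb{R}}(P) = r_{X(\mathbb{C})}(P)$. All typical labels satisfy $2a + b = m + 1$, where $m = \lfloor d/2 \rfloor$ and $m + 1$ is the generic complex rank, and there are no labels of larger weight occurring generically (Ballico et al., 2019).
Table of typical labels for rational normal curves of degree $d$, with $m = \lfloor d/2 \rfloor$:
| $d$ (deg) | Generic complex rank | Typical labels |
|---|---|---|
| Odd ($d = 2m + 1$) | $m + 1$ | All $(a, b)$ with $2a + b = m+1$ |
| Even ($d = 2m$) | $m + 1$ | All $(a, b)$ with $2a + b = m+1$ |
The only extreme cases are $(0, m+1)$ and, for odd $m$, $\bigl(\tfrac{m+1}{2}, 0\bigr)$, representing decompositions entirely over the reals or entirely nonreal in conjugate pairs.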
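As a worked check of the parity remark, the labels of weight $m + 1$ can be listed for two small values of $m$. This is an illustration rather than a result quoted from the source, and it assumes the convention that $a$ counts conjugate pairs and $b$ counts real points.

```latex
\begin{align*}
m = 2:&\quad 2a + b = 3 \;\Longrightarrow\; (a,b) \in \{(0,3),\ (1,1)\}, \\
m = 3:&\quad 2a + b = 4 \;\Longrightarrow\; (a,b) \in \{(0,4),\ (1,2),\ (2,0)\}.
\end{align*}
```

For even $m$ the weight $m + 1$ is odd, so at least one real point must appear and no label of the form $(a, 0)$ exists; for odd $m$ the all-nonreal label $\bigl(\tfrac{m+1}{2}, 0\bigr)$ becomes available.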
5. Scheme-Theoretic Labels and Cactus Rank Analogues
The scheme-theoretic (cactus) version of admissible rank considers 0-dimensional schemes $Z \subset X(\mathbb{C})$ fixed by conjugation, i.e., $\sigma(Z) = Z$. The admissible cactus rank of $P$ is the minimal length of such a scheme whose span contains $P$. Scheme-labels are richer, recording lengths of conjugation-stable connected components. In the reduced case, the scheme-label reduces to the pair $(a, b)$. For points of cactus-admissible rank below the linear independence threshold, the decomposition is unique and the scheme-label is well-defined (Ballico et al., 2019).
6. Comparison With Real Rank Theory
Real rank admits only real support in decompositions, with labels restricted to $(0, b)$. Admissible rank, in contrast, allows decompositions with complex-conjugate summands, which permits lower or more flexible bounds. For rational normal curves of degree $d$, typical real ranks range from $\lfloor d/2 \rfloor + 1$ to $d$, whereas all typical admissible labels are concentrated at $2a + b = \lfloor d/2 \rfloor + 1$. Admissible rank maintains much of the good generic uniqueness and bound properties of complex rank, yet tracks real versus complex phenomena via the split of the label into conjugate pairs $a$ and real points $b$ (Ballico et al., 2019).
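A standard example, not taken from the source, makes the gap between real and admissible rank concrete. Consider the real binary cubic $f = x^3 - 3xy^2$, viewed as a real point of $\mathbb{P}^3$ with respect to the twisted cubic, and assume the convention that $a$ counts conjugate pairs and $b$ counts real points.

```latex
\begin{align*}
f(x,y) &= x^3 - 3xy^2 = \operatorname{Re}\,(x+iy)^3
        = \tfrac{1}{2}(x+iy)^3 + \tfrac{1}{2}(x-iy)^3, \\
S &= \bigl\{[(x+iy)^3],\ [(x-iy)^3]\bigr\}, \qquad \sigma(S) = S,
   \qquad \operatorname{label}(S) = (a,b) = (1,0), \\
\text{complex rank}(f) &= 2, \qquad \text{admissible rank}(f) = 2, \qquad \text{real rank}(f) = 3.
\end{align*}
```

The two summands are exchanged by conjugation, so the decomposition is admissible but not real. Since $f = x(x - \sqrt{3}\,y)(x + \sqrt{3}\,y)$ has three distinct real roots, no decomposition into two real cubes exists, and any real decomposition carries a label $(0, b)$ with $b \ge 3$.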
7. “Rewards as Labels” (ReaL) in Reinforcement Learning
In a distinct context, the ReaL framework for RLVR (Reinforcement Learning with Verifiable Rewards) treats verifiable reward signals as categorical binary labels, a conceptual shift from scalar-reward policy gradients to classification-style updates. Each rollout is labeled 1 (positive) if its verifiable reward marks the response as correct, and 0 otherwise (negative). The policy is optimized by minimizing a binary cross-entropy loss on per-rollout “relative log-probability” scores.
To enhance separation of positive and negative samples, a fixed anchor logit is introduced, leading to a softmax cross-entropy loss incorporating this anchor. The resulting loss yields monotonic and bounded gradient weighting, preventing hard negatives from dominating and properly prioritizing under-confident positives. Empirical results demonstrate gains in Pass@1 on mathematical reasoning benchmarks: at 1.5B parameters, ReaL outperforms DAPO by 6.7 percentage points; at 7B, ReaL surpasses DAPO and GSPO by 6.2 and 1.7 percentage points, respectively (Zhai et al., 5 Feb 2026).
Pass@1 (%) by model size and method:
| Model Size | GRPO | DAPO | GSPO | ReaL |
|---|---|---|---|---|
| 1.5B | 43.1 | 45.9 | 51.9 | 52.6 |
| 7B | 59.2 | 57.0 | 61.5 | 63.2 |
Key properties include monotonic, bounded gradient assignment, empirical stability, and generalization across tasks. The classification formulation and anchor-logit mechanism distinguish ReaL from prior reward-weighted policy gradient methods (Zhai et al., 5 Feb 2026).
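The bounded, monotonic gradient weighting described above can be illustrated with a generic anchored binary cross-entropy. This is a minimal sketch under stated assumptions, not the loss from Zhai et al.: the rollout score is taken as a given number, the anchor value `0.0` is an arbitrary choice, and the function name `anchored_bce` is hypothetical. The loss is a two-way softmax cross-entropy between the score and the fixed anchor logit.

```python
import math


def anchored_bce(score: float, label: int, anchor: float = 0.0) -> tuple[float, float]:
    """Return (loss, d loss / d score) for a score competing with a fixed anchor logit.

    Assumption: P(positive) = exp(score) / (exp(score) + exp(anchor)),
    i.e. a sigmoid of (score - anchor).
    """
    z = score - anchor
    # Numerically stable softplus: log(1 + exp(z)).
    softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
    prob_positive = math.exp(z - softplus)  # sigmoid(z)
    loss = softplus - label * z             # binary cross-entropy with logits
    grad = prob_positive - label            # always lies in [-1, 1]
    return loss, grad
```

In this sketch the gradient magnitude never exceeds 1, so a confidently wrong negative cannot contribute an unbounded update, and for positives the magnitude grows as the score falls, which gives under-confident positives more weight.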