
Reassessed Labels (ReaL): Theory & Applications

Updated 31 March 2026
  • Reassessed Labels (ReaL) is a dual framework that refines classical rank theory in projective varieties and enhances policy optimization in reinforcement learning by using structured label decompositions.
  • In algebraic geometry, ReaL defines admissible decompositions by quantifying real and complex-conjugate components, offering tighter bounds and improved uniqueness compared to traditional real rank methods.
  • In reinforcement learning, ReaL treats verifiable rewards as binary labels, employing an anchor-logit mechanism to produce bounded, monotonic gradient updates and superior Pass@1 performance.

Reassessed Labels (ReaL) constitute a conceptual and computational framework that arises independently in two modern research contexts. The first is in the structure theory of real projective varieties, where labels quantify the composition of decompositions into real and complex-conjugate points, with deep relevance to admissible rank and typical rank phenomena. The second is in machine learning, especially reinforcement learning for LLMs, where “Rewards as Labels” (ReaL) reformulates policy optimization as a classification task, assigning discrete labels to rollouts based on verifiable reward signals. In both domains, ReaL provides refined instruments for distinguishing and improving upon classical approaches in rank theory and policy gradient methods, respectively.

1. Admissible Rank and Label Decompositions for Projective Varieties

Let $X \subset \mathbb{P}^r(\mathbb{C})$ be a nondegenerate irreducible complex projective variety defined over $\mathbb{R}$. For a real point $q \in \mathbb{P}^r(\mathbb{R})$, the classical $X(K)$-rank is the minimal cardinality of a set $S \subset X(K)$ (for $K = \mathbb{R}$ or $K = \mathbb{C}$) such that $q \in \langle S \rangle_K$. The admissible rank, $\mathrm{r}_{X(\mathbb{C}),\mathrm{adm}}(q)$, is defined by restricting attention to finite subsets $S \subset X(\mathbb{C})$ that are globally stable under complex conjugation, $o(S) = S$, and whose real span contains $q$. This intermediate notion interpolates between real and complex rank, satisfying $\mathrm{r}_{X(\mathbb{C})}(q) \le \mathrm{r}_{X(\mathbb{C}),\mathrm{adm}}(q) \le 2\,\mathrm{r}_{X(\mathbb{C})}(q)$ for all real $q$ (Ballico et al., 2019).
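The two inequalities can be motivated by a short argument. The block below is a sketch of the usual reasoning, not a quotation from the cited paper; it assumes that the "real span" of a conjugation-stable set means the real points of its complex span, and the auxiliary set $S'$ is notation introduced here for illustration.

```latex
% Lower bound: an admissible set S is in particular a finite subset of
% X(\mathbb{C}) whose complex span contains q, so
\mathrm{r}_{X(\mathbb{C})}(q) \le \mathrm{r}_{X(\mathbb{C}),\mathrm{adm}}(q).

% Upper bound: take S \subset X(\mathbb{C}) with \#S = \mathrm{r}_{X(\mathbb{C})}(q)
% and q \in \langle S \rangle_{\mathbb{C}}, and symmetrize it (S' is hypothetical notation):
S' = S \cup o(S), \qquad o(S') = S', \qquad \#S' \le 2\,\#S .

% The span of S' is conjugation-invariant and contains the real point q, hence
q \in \langle S \rangle_{\mathbb{C}} \subseteq \langle S' \rangle_{\mathbb{C}}
\quad\Longrightarrow\quad
\mathrm{r}_{X(\mathbb{C}),\mathrm{adm}}(q) \le \#S' \le 2\,\mathrm{r}_{X(\mathbb{C})}(q).
```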

2. Label Structure: Definition and Weight

Given an admissible decomposition $S$, the label is the pair $(a, b) \in \mathbb{N}^2$, where $b$ is the number of real points and $a$ is the number of complex-conjugate pairs. The weight of the label is the total cardinality $w(a,b) = 2a + b = \#S$. Formally,

$$b = \#(S \cap X(\mathbb{R})), \quad a = \frac{\#S - b}{2}.$$

Labels serve as refined invariants of decompositions. For a decomposition of minimal cardinality, the label is tied to the admissible rank by the constraint $2a + b = \mathrm{r}_{X(\mathbb{C}),\mathrm{adm}}(q)$. For a given admissible rank $k$, labels obey $0 \le b \le k$ and $0 \le a \le \lfloor k/2 \rfloor$ (Ballico et al., 2019).
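The definition of the label can be illustrated with a toy computation. The sketch below assumes that points of $S$ are modelled as distinct complex parameters (for example, affine coordinates on a curve); the function name `label_of` and the tolerance `tol` are hypothetical choices made for this example, not part of the source.

```python
from typing import Iterable, Tuple


def label_of(points: Iterable[complex], tol: float = 1e-9) -> Tuple[int, int]:
    """Return the label (a, b) of a finite conjugation-stable set.

    Assumption: each point is represented by one complex parameter and
    the points are pairwise distinct.
    """
    pts = [complex(p) for p in points]
    real = [p for p in pts if abs(p.imag) <= tol]
    nonreal = [p for p in pts if abs(p.imag) > tol]

    # Admissibility check: every non-real point needs its conjugate in the set.
    for p in nonreal:
        if not any(abs(p.conjugate() - q) <= tol for q in nonreal):
            raise ValueError("set is not stable under complex conjugation")

    b = len(real)              # number of real points
    a = (len(pts) - b) // 2    # number of complex-conjugate pairs
    return a, b


def weight(a: int, b: int) -> int:
    """Total cardinality w(a, b) = 2a + b."""
    return 2 * a + b
```

For instance, `label_of([1, -2, 1j, -1j])` returns `(1, 2)`, a label of weight 4: two real points and one conjugate pair.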

3. Typical Labels and Generic Behaviour

A label $(a, b)$ is typical if there exists a full-dimensional Euclidean open set $U \subset \mathbb{P}^r(\mathbb{R})$ such that each $q \in U$ admits an admissible decomposition with exactly that label and with minimal admissible rank. Notably, if $X(\mathbb{R})$ is Zariski dense in $X(\mathbb{C})$ and $g = r_{\mathrm{gen}}$ is the generic complex rank, every $(a, b)$ with $2a + b = g$ is typical. For specific curves, further structure arises; for example, linearly normal real elliptic curves of odd degree admit typical labels of weight $g$ as well as of weight $g + 1$ (Ballico et al., 2019).

| Context | Condition | Typical Labels |
| --- | --- | --- |
| Rational normal curves | Degree $d$ | $2a + b = \lceil (d+1)/2 \rceil$ |
| Generic projective variety | Zariski-dense real part | $2a + b = r_{\mathrm{gen}}$ |
| Real rank | Only real points in support | Labels of form $(0, b)$ |

4. Labels for Rational Normal Curves

For the rational normal curve $X_d \subset \mathbb{P}^d$ over $\mathbb{R}$, the admissible rank equals the complex rank for all real points, i.e., $\mathrm{r}_{X_d(\mathbb{C}),\mathrm{adm}}(q) = \mathrm{r}_{X_d(\mathbb{C})}(q)$. All typical labels $(a, b)$ satisfy $2a + b = \lceil (d+1)/2 \rceil$, and no labels of larger weight occur generically (Ballico et al., 2019).

Table of typical labels for XdX_d:

| Degree $d$ | Generic complex rank $r_{\mathrm{gen}}$ | Typical labels $(a, b)$ |
| --- | --- | --- |
| Odd, $d = 2m+1$ | $m+1$ | All with $2a + b = m+1$ |
| Even, $d = 2m$ | $m+1$ | All with $2a + b = m+1$ |

For odd $d$, the typical weight is $(d+1)/2$, so the extreme cases are $(0, \frac{d+1}{2})$, a decomposition entirely over the reals, and, when $\frac{d+1}{2}$ is even, $(\frac{d+1}{4}, 0)$, a decomposition consisting entirely of nonreal points in conjugate pairs.
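The weight constraint $2a + b = \lceil (d+1)/2 \rceil$ makes the typical labels of $X_d$ easy to enumerate. The following is a minimal sketch under that constraint alone; the function name `typical_labels` is hypothetical, and the code only lists integer solutions rather than verifying typicality geometrically.

```python
from typing import List, Tuple


def typical_labels(d: int) -> List[Tuple[int, int]]:
    """List all labels (a, b) with 2a + b = ceil((d + 1) / 2).

    Assumption: d is the degree of the rational normal curve X_d, d >= 1.
    """
    if d < 1:
        raise ValueError("degree must be a positive integer")
    g = (d + 2) // 2  # integer form of ceil((d + 1) / 2)
    # a counts conjugate pairs, so it ranges over 0..floor(g / 2).
    return [(a, g - 2 * a) for a in range(g // 2 + 1)]
```

For $d = 7$ the weight is 4 and `typical_labels(7)` returns `[(0, 4), (1, 2), (2, 0)]`; for $d = 5$ the weight is 3 and the result is `[(0, 3), (1, 1)]`, so an entirely nonreal label exists only when the weight is even.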

5. Scheme-Theoretic Labels and Cactus Rank Analogues

The scheme-theoretic (cactus) version of admissible rank considers 0-dimensional schemes $Z \subset X(\mathbb{C})$ fixed by conjugation, i.e., $o(Z) = Z$. The admissible cactus rank of $q$ is the minimal length of such a scheme whose span contains $q$. Scheme-labels are richer, recording lengths of conjugation-stable connected components. In the reduced case, the scheme-label reduces to the pair $(a, b)$. For points of cactus-admissible rank below the linear independence threshold, the decomposition is unique and the scheme-label is well-defined (Ballico et al., 2019).

6. Comparison With Real Rank Theory

Real rank admits only real support in decompositions, with labels restricted to $(0, b)$. Admissible rank, in contrast, allows decompositions with complex-conjugate summands, which can give lower ranks and more flexible bounds. For rational normal curves $X_d$, typical real ranks range from $\lceil (d+1)/2 \rceil$ to $d$, whereas all typical admissible labels have the single weight $\lceil (d+1)/2 \rceil$. Admissible rank retains much of the generic uniqueness and bound behaviour of complex rank, yet tracks real versus complex phenomena via the integer $a$ in the label (Ballico et al., 2019).

7. “Rewards as Labels” (ReaL) in Reinforcement Learning

In a distinct context, the ReaL framework for RLVR (Reinforcement Learning with Verifiable Rewards) treats verifiable reward signals as categorical binary labels, a conceptual shift from scalar-reward policy gradients to classification-style updates. Each rollout $o_k$ is labeled $y_k = 1$ if the reward $r(o_k \mid q) = 1$ (positive), and $y_k = 0$ otherwise (negative). The policy is optimized by minimizing a binary cross-entropy loss on "relative log-probability" scores $s_k$, computed as

$$s_k = \frac{1}{|o_k|} \sum_{t=1}^{|o_k|} \left[ \log \pi_\theta(o_{k,t} \mid q, o_{k,<t}) - \log \pi_{\text{old}}(o_{k,t} \mid q, o_{k,<t}) \right].$$
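The score $s_k$ is a length-normalized sum of per-token log-probability differences. A minimal sketch of that computation follows, assuming the per-token log-probabilities under the current and old policies have already been gathered into two equal-length sequences; the function and argument names are hypothetical.

```python
from typing import Sequence


def relative_logprob_score(
    logp_new: Sequence[float], logp_old: Sequence[float]
) -> float:
    """Mean per-token difference log pi_theta - log pi_old for one rollout.

    Assumption: logp_new[t] and logp_old[t] are the log-probabilities of
    token t of the same rollout under the current and old policy.
    """
    if len(logp_new) != len(logp_old):
        raise ValueError("both sequences must cover the same tokens")
    if len(logp_new) == 0:
        raise ValueError("rollout must contain at least one token")
    total = sum(new - old for new, old in zip(logp_new, logp_old))
    return total / len(logp_new)  # divide by |o_k|
```

When the two policies agree on every token the score is 0, and it becomes positive when the current policy assigns the rollout higher probability than the old one.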

To enhance separation of positive and negative samples, a fixed anchor logit is introduced, giving a softmax cross-entropy loss that includes this anchor. The resulting loss yields monotonic and bounded gradient weights, which prevents hard negatives from dominating the update and prioritizes under-confident positives. Reported Pass@1 results on mathematical reasoning benchmarks show gains in absolute percentage points: at 1.5B parameters, ReaL outperforms DAPO by 6.7 points; at 7B, ReaL surpasses DAPO and GSPO by 6.2 and 1.7 points, respectively (Zhai et al., 5 Feb 2026).
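The bounded, monotonic gradient behaviour can be seen already in plain binary cross-entropy. The sketch below is not the anchor-logit loss of the paper, whose exact form is not given here; it assumes, for illustration only, that $s_k$ is used directly as the logit of a sigmoid classifier, and the function names are hypothetical.

```python
import math


def sigmoid(s: float) -> float:
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def bce_loss(s: float, y: int) -> float:
    """Binary cross-entropy with logit s and label y in {0, 1}.

    Uses the identity loss = softplus(s) - y * s.
    """
    softplus = max(s, 0.0) + math.log1p(math.exp(-abs(s)))
    return softplus - y * s


def bce_grad(s: float, y: int) -> float:
    """Derivative of bce_loss with respect to the logit s."""
    return sigmoid(s) - y
```

The gradient magnitude is `1 - sigmoid(s)` for positives and `sigmoid(s)` for negatives, so it never exceeds 1 and it grows as a sample becomes more misclassified: a positive with a low score receives a larger weight than a confident one, while a hard negative saturates at 1 instead of growing without bound.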

Pass@1 (%) by model size:

| Model Size | GRPO | DAPO | GSPO | ReaL |
| --- | --- | --- | --- | --- |
| 1.5B | 43.1 | 45.9 | 51.9 | 52.6 |
| 7B | 59.2 | 57.0 | 61.5 | 63.2 |
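The reported gains can be checked directly against the Pass@1 figures above. The snippet below simply subtracts baseline scores from the ReaL scores; the dictionary layout and the helper name `gain` are choices made for this example.

```python
# Pass@1 (%) values as listed in the table above.
PASS_AT_1 = {
    "1.5B": {"GRPO": 43.1, "DAPO": 45.9, "GSPO": 51.9, "ReaL": 52.6},
    "7B": {"GRPO": 59.2, "DAPO": 57.0, "GSPO": 61.5, "ReaL": 63.2},
}


def gain(model_size: str, baseline: str) -> float:
    """Absolute Pass@1 gain of ReaL over a baseline, in percentage points."""
    row = PASS_AT_1[model_size]
    # Round to one decimal to remove floating-point noise from subtraction.
    return round(row["ReaL"] - row[baseline], 1)
```

Here `gain("1.5B", "DAPO")` gives 6.7, `gain("7B", "DAPO")` gives 6.2, and `gain("7B", "GSPO")` gives 1.7, which confirms that the quoted improvements are differences in percentage points rather than relative percentages.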

Key properties include monotonic, bounded gradient assignment, empirical stability, and generalization across tasks. The classification formulation and anchor-logit mechanism distinguish ReaL from prior reward-weighted policy gradient methods (Zhai et al., 5 Feb 2026).
