Ray Regression Predictor: Convergence Insights

Updated 24 April 2026
  • Ray Regression Predictor (RRP) is a geometric and analytic framework that defines a unique affine ray detailing parameter convergence in logistic regression.
  • It partitions data into strongly convex and separable subsets to compute a maximum-margin direction and an optimal offset that control implicit bias.
  • RRP offers provable risk and parameter convergence rates, with directional convergence at O(ln ln t/ln t) and offset convergence at O((ln t)²/√t) under gradient descent.

The Ray Regression Predictor (RRP) is a geometric and analytic framework characterizing the asymptotic trajectory of parameter iterates when logistic regression is trained with first-order methods, particularly gradient descent, on arbitrary data. RRP formalizes the exact subspace, maximum-margin direction, and strongly convex offset controlling the implicit bias and parameter convergence rates under logistic (or exponential) risk minimization. The RRP is defined as a unique affine ray in parameter space determined by a data-dependent decomposition and possesses provable parameter and risk convergence properties that hold for general linearly separable and nonseparable regimes (Ji et al., 2018).

1. Geometric Structure and Definition

RRP arises from a structural decomposition of the design matrix $A$, defined for $n$ labeled instances $(x_i, y_i)$ with $\|x_i y_i\|_2 \leq 1$, as $A_i = -y_i x_i^\top$. The rows of $A$ are partitioned into:

  • Strongly Convex Part, $A_S$: corresponds to the maximal subset where the risk function $\mathcal{R}_S$ is strongly convex.
  • Separable Part, $A_c$: consists of all rows $A_i$ for which there exists $u$ with $Au \le 0$ (coordinate-wise) and $A_i u < 0$.

Define $S := \operatorname{span}(A_S^\top)$ and its orthogonal complement $S^\perp$, yielding an orthogonal decomposition of parameter space. $A_c$ is proven to be linearly separable within $S^\perp$. The corresponding strict margin is

$$\gamma := -\min_{u \in S^\perp,\ \|u\|_2 = 1}\ \max_i\, (A_\perp u)_i > 0,$$

where $A_\perp$ is $A_c$ projected onto $S^\perp$. The unique maximum-margin separator within this subspace is

$$\bar u = -\frac{A_\perp^\top \bar q}{\gamma}$$

for any dual optimum $\bar q$. On $S$, the unique strongly convex risk minimizer is

$$\bar v = \arg\min_{v \in S} \mathcal{R}_S(v).$$

Ray Regression Predictor (RRP): The RRP is defined as

$$\{\bar v + r\,\bar u : r \ge 0\}.$$

Gradient descent iterates $w_t$ satisfy, for large $t$,

$$w_t \approx \bar v + \bar u \cdot \frac{\ln t}{\gamma},$$

thus tracking the RRP ray in direction and offset.
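As a rough numerical illustration of this tracking behavior, the sketch below runs gradient descent on a hand-built 2D dataset. Everything in it is an assumption made for illustration and is not taken from Ji et al.: the four rows in `Z` (each storing $y_i x_i$, so the design-matrix row is its negation), the constant step size `eta=1.0`, the start at the origin, and the helper names. The first three rows lie on the second axis with mixed signs, so they cannot be separated; a short calculation shows their logistic risk is minimized at second coordinate $2\ln 2$. The fourth row is strictly classified by the direction $(1, 0)$.

```python
import math

# Hypothetical rows z_i = y_i * x_i (so A_i = -z_i); each has Euclidean norm <= 1.
Z = [(0.0, 0.5), (0.0, 0.5), (0.0, -0.5),  # nonseparable rows on the second axis
     (0.5, 0.3)]                           # row strictly classified by u = (1, 0)

def risk_gradient(w):
    """Gradient of (1/n) * sum_i ln(1 + exp(-z_i . w))."""
    g = [0.0, 0.0]
    for z in Z:
        margin = z[0] * w[0] + z[1] * w[1]
        coeff = -1.0 / (1.0 + math.exp(margin)) / len(Z)
        g[0] += coeff * z[0]
        g[1] += coeff * z[1]
    return g

def run_gd(steps, eta=1.0):
    """Plain gradient descent from the origin with a constant step (assumed)."""
    w = [0.0, 0.0]
    for _ in range(steps):
        g = risk_gradient(w)
        w = [w[0] - eta * g[0], w[1] - eta * g[1]]
    return w

w = run_gd(20000)
print("first coordinate (keeps growing):", w[0])
print("second coordinate vs 2 ln 2:", w[1], 2 * math.log(2))
```

On this toy problem the second coordinate should settle near $2\ln 2$ while the first keeps growing slowly, which is the qualitative picture of a bounded offset plus an unbounded ray direction.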

2. Risk Convergence Foundations

For empirical logistic or exponential risk

$$\mathcal{R}(w) = \frac{1}{n}\sum_{i=1}^{n} \ell(A_i w)$$

with $\ell(z) = \ln(1 + e^{z})$ or $\ell(z) = e^{z}$ and gradient descent $w_{j+1} = w_j - \eta_j \nabla\mathcal{R}(w_j)$, risk convergence is established using:

  • Magic Smooth-Descent Lemma: For ∥xiyi∥2≤1\|x_i y_i\|_2\leq 12-smoothness, if ∥xiyi∥2≤1\|x_i y_i\|_2\leq 13, then for any ∥xiyi∥2≤1\|x_i y_i\|_2\leq 14,

∥xiyi∥2≤1\|x_i y_i\|_2\leq 15

  • Fixed-Direction Rate Lemma: Setting ∥xiyi∥2≤1\|x_i y_i\|_2\leq 16,

∥xiyi∥2≤1\|x_i y_i\|_2\leq 17

Combining these, risk convergence for ∥xiyi∥2≤1\|x_i y_i\|_2\leq 18 yields

∥xiyi∥2≤1\|x_i y_i\|_2\leq 19
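The following sketch does not reproduce the lemmas or the stated rate; it only shows, on a small example, the risk gap shrinking under gradient descent and shrinking more slowly when the steps decay. The dataset `Z` (rows $y_i x_i$), the two step-size schedules, the start at the origin, and the closed-form infimum `RISK_INF` (worked out by hand for this particular toy data) are all assumptions made for illustration.

```python
import math

# Hypothetical rows z_i = y_i * x_i; three nonseparable rows and one separable row.
Z = [(0.0, 0.5), (0.0, 0.5), (0.0, -0.5), (0.5, 0.3)]

def risk(w):
    """Empirical logistic risk (1/n) * sum_i ln(1 + exp(-z_i . w))."""
    return sum(math.log1p(math.exp(-(z[0] * w[0] + z[1] * w[1]))) for z in Z) / len(Z)

def gradient(w):
    g = [0.0, 0.0]
    for z in Z:
        margin = z[0] * w[0] + z[1] * w[1]
        coeff = -1.0 / (1.0 + math.exp(margin)) / len(Z)
        g[0] += coeff * z[0]
        g[1] += coeff * z[1]
    return g

def risk_after(steps, step_size):
    """Risk reached after `steps` gradient steps; step_size(j) gives the j-th step."""
    w = [0.0, 0.0]
    for j in range(steps):
        eta = step_size(j)
        g = gradient(w)
        w = [w[0] - eta * g[0], w[1] - eta * g[1]]
    return risk(w)

# Infimum of the risk for this toy data: approached along a ray, never attained.
RISK_INF = (2 * math.log(1.5) + math.log(3.0)) / 4

def constant(j):
    return 1.0

def inverse_root(j):
    return 1.0 / math.sqrt(j + 1)

for t in (100, 1000, 10000):
    print(t, risk_after(t, constant) - RISK_INF, risk_after(t, inverse_root) - RISK_INF)
```

The gap stays positive because the infimum is only reached in the limit, and the inverse-root schedule lags because its accumulated step mass grows like $\sqrt{t}$ rather than $t$.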

3. Parameter Convergence Theorems

3.1 Offset Convergence on $S$

For the Ai=−yixi⊤A_i = -y_i x_i^\top1-component, let Ai=−yixi⊤A_i = -y_i x_i^\top2 be the modulus of strong convexity of Ai=−yixi⊤A_i = -y_i x_i^\top3. It follows that for Ai=−yixi⊤A_i = -y_i x_i^\top4 and arbitrary step-size sequence,

Ai=−yixi⊤A_i = -y_i x_i^\top5

For Ai=−yixi⊤A_i = -y_i x_i^\top6, this yields the offset-convergence theorem:

Ai=−yixi⊤A_i = -y_i x_i^\top7

3.2 Directional Convergence on $S^\perp$

For the Ai=−yixi⊤A_i = -y_i x_i^\top9 component, norm growth is established: AA0, with AA1 and AA2. Using a Fenchel–Young argument, for fully separable AA3 (i.e., AA4) and AA5,

AA6

i.e.,

AA7
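To make the slow directional convergence concrete, the sketch below measures one minus the cosine between the normalized gradient descent iterate and the maximum-margin direction on a tiny fully separable dataset. The two rows in `Z`, the constant step size, the start at the origin, and the names `U_BAR` and `direction_gap` are assumptions for illustration; the maximum-margin direction is worked out by hand for this data (both rows are support vectors, so their margins are equalized). The sketch does not verify the stated rate, only that the gap decays slowly.

```python
import math

# Hypothetical fully separable rows z_i = y_i * x_i.
Z = [(0.6, 0.0), (0.0, 0.3)]

# Maximum-margin unit direction for these rows: 0.6 * u1 = 0.3 * u2 with |u| = 1.
U_BAR = (1 / math.sqrt(5), 2 / math.sqrt(5))

def gradient(w):
    """Gradient of the empirical logistic risk (1/n) * sum_i ln(1 + exp(-z_i . w))."""
    g = [0.0, 0.0]
    for z in Z:
        margin = z[0] * w[0] + z[1] * w[1]
        coeff = -1.0 / (1.0 + math.exp(margin)) / len(Z)
        g[0] += coeff * z[0]
        g[1] += coeff * z[1]
    return g

def direction_gap(steps, eta=1.0):
    """Return 1 - <w_t / |w_t|, U_BAR> after `steps` gradient steps from the origin."""
    w = [0.0, 0.0]
    for _ in range(steps):
        g = gradient(w)
        w = [w[0] - eta * g[0], w[1] - eta * g[1]]
    norm = math.hypot(w[0], w[1])
    return 1.0 - (w[0] * U_BAR[0] + w[1] * U_BAR[1]) / norm

for t in (10, 100, 1000, 10000):
    print(t, direction_gap(t))
```

Even after ten thousand steps the iterate direction is close to, but still visibly short of, the maximum-margin direction, in line with a rate that is only logarithmic in $t$.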

4. Practical Construction of the RRP

The construction of the Ray Regression Predictor in practice may be summarized by the following workflow:

  • Separable Subset Identification: Employ a greedy separability test on each example to construct the separable subset $A_c$, corresponding to the partition $(A_S, A_c)$.
  • Offset Computation: Compute $\bar v = \arg\min_{v \in S} \mathcal{R}_S(v)$ using any standard solver for convex minimization.
  • Maximum-Margin Direction: Compute

$$\bar u = -\frac{A_\perp^\top \bar q}{\gamma}$$

for the maximum-margin separator within $S^\perp$, with $A_\perp$, $\gamma$, and $\bar q$ as in Section 1.

  • Final RRP: The RRP is then the ray $\{\bar v + r\,\bar u : r \ge 0\}$. Gradient descent (with constant or decaying steps) on the empirical risk automatically yields iterates tracking the RRP: the offset $\bar v$ is recovered first at rate $O((\ln t)^2/\sqrt{t})$, with directional convergence to $\bar u$ at rate $O(\ln\ln t/\ln t)$.
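The workflow above can be sketched end to end on a 2D toy problem. This is a minimal illustration under stated assumptions, not the construction used by Ji et al.: the dataset `Z` (rows $y_i x_i$), the tolerance `TOL`, and all helper names are hypothetical; the separability test enumerates candidate directions exactly, which works only in two dimensions (a linear program would be the usual substitute in higher dimensions); the nonseparable rows are assumed to be parallel, so the strongly convex subspace is a line and the offset reduces to a one-dimensional bisection.

```python
import math

Z = [(0.0, 0.5), (0.0, 0.5), (0.0, -0.5), (0.5, 0.3)]  # hypothetical rows y_i * x_i
TOL = 1e-9

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def separable_rows(rows):
    """Indices i with some u such that z_j.u >= 0 for all j and z_i.u > 0 (2D only)."""
    candidates = []
    for z in rows:
        candidates += [(-z[1], z[0]), (z[1], -z[0]), z]  # generators of the feasible cone
    feasible = [u for u in candidates if all(dot(z, u) >= -TOL for z in rows)]
    return [i for i, z in enumerate(rows) if any(dot(z, u) > TOL for u in feasible)]

def minimize_on_line(rows, s, n):
    """Bisection on the derivative of v -> (1/n) * sum ln(1 + exp(-v * z.s)).

    Assumes the minimizer lies in [-50, 50], which holds for this toy data.
    """
    def deriv(v):
        return sum(-dot(z, s) / (1.0 + math.exp(v * dot(z, s))) for z in rows) / n
    lo, hi = -50.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if deriv(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Step 1: partition the rows.
c_idx = separable_rows(Z)
strongly_convex_rows = [z for i, z in enumerate(Z) if i not in c_idx]
separable = [Z[i] for i in c_idx]

# Step 2: offset, minimizing the risk of the nonseparable rows along their common line.
z0 = strongly_convex_rows[0]
norm0 = math.hypot(z0[0], z0[1])
s = (z0[0] / norm0, z0[1] / norm0)
v_coeff = minimize_on_line(strongly_convex_rows, s, len(Z))
v_bar = (v_coeff * s[0], v_coeff * s[1])

# Step 3: direction in the orthogonal line, oriented to give the separable rows a positive margin.
u_bar = (s[1], -s[0])
if dot(separable[0], u_bar) < 0:
    u_bar = (-u_bar[0], -u_bar[1])
gamma = min(dot(z, u_bar) for z in separable)

# Step 4: the ray itself.
def rrp_point(r):
    return (v_bar[0] + r * u_bar[0], v_bar[1] + r * u_bar[1])

print("separable row indices:", c_idx)
print("offset:", v_bar, "direction:", u_bar, "margin:", gamma)
```

For this data the sketch should report the last row as the only separable one, an offset on the second axis at $2\ln 2$, and the first axis as the ray direction with margin $0.5$.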

5. Summary of Key Rates and Theoretical Guarantees

The table summarizes the principal rates for risk and parameter convergence:

Quantity Convergence Rate Conditions
ASA_S8 ASA_S9 RS\mathcal{R}_S0
RS\mathcal{R}_S1 RS\mathcal{R}_S2 Strongly convex RS\mathcal{R}_S3
RS\mathcal{R}_S4 RS\mathcal{R}_S5 Fully separable RS\mathcal{R}_S6
RS\mathcal{R}_S7 RS\mathcal{R}_S8 Asymptotic, under above

These results hold for gradient descent initialized at $w_0 = 0$, with either constant or inverse-root step size as specified, and for the empirical logistic or exponential loss. The RRP fully explicates the implicit bias and convergence path of iterates in high-dimensional, possibly partially or fully separable logistic regression tasks (Ji et al., 2018).
