Ray Regression Predictor: Convergence Insights

Updated 24 April 2026

Ray Regression Predictor (RRP) is a geometric and analytic framework that defines a unique affine ray detailing parameter convergence in logistic regression.
It partitions data into strongly convex and separable subsets to compute a maximum-margin direction and an optimal offset that control implicit bias.
RRP offers provable risk and parameter convergence rates, with directional convergence at O(ln ln t/ln t) and offset convergence at O((ln t)²/√t) under gradient descent.

The Ray Regression Predictor (RRP) is a geometric and analytic framework characterizing the asymptotic trajectory of parameter iterates when logistic regression is trained with first-order methods, particularly gradient descent, on arbitrary data. RRP formalizes the exact subspace, maximum-margin direction, and strongly convex offset controlling the implicit bias and parameter convergence rates under logistic (or exponential) risk minimization. The RRP is defined as a unique affine ray in parameter space determined by a data-dependent decomposition and possesses provable parameter and risk convergence properties that hold for general linearly separable and nonseparable regimes (Ji et al., 2018).

1. Geometric Structure and Definition

RRP arises from a structural decomposition of the matrix $A$ built from $n$ labeled instances $(x_i, y_i)$ with $\|x_i y_i\|_2\leq 1$, whose $i$-th row is $A_i = -y_i x_i^\top$. The rows of $A$ are partitioned into:

Strongly Convex Part, $A_S$: corresponds to the maximal subset where the risk function $\mathcal{R}_S$ is strongly convex.
Separable Part, $A_c$: consists of all rows $A_i$ for which there exists $u$ with $Au \leq 0$ and $A_i u < 0$.

Define $S = \operatorname{span}(A_S^\top)$ and let $S^\perp$ be its orthogonal complement, yielding an orthogonal decomposition of parameter space. $A_c$ is proven to be linearly separable within $S^\perp$. The corresponding strict margin is

$$\gamma = -\min_{u \in S^\perp,\ \|u\|_2 = 1}\ \max_i\, (A_\perp u)_i > 0,$$

where $A_\perp$ is $A_c$ projected onto $S^\perp$. The unique maximum-margin separator within this subspace is

$$\bar u = -\frac{A_\perp^\top \bar q}{\gamma}$$

for any dual optimum $\bar q$, that is, any minimizer of $\|A_\perp^\top q\|_2$ over the probability simplex. On $S$, the unique strongly convex risk minimizer is

$$\bar v = \arg\min_{v \in S} \mathcal{R}_S(v).$$

Ray Regression Predictor (RRP): The RRP is defined as

$$\{\bar v + r\,\bar u : r \geq 0\}.$$

Gradient descent iterates $w_t$ satisfy, for large $t$,

$$w_t \approx \bar v + \bar u \cdot \frac{\ln t}{\gamma},$$

thus tracking the RRP ray in direction and offset.
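The ray-tracking behaviour can be illustrated on a tiny two-dimensional example. This is a minimal sketch, not the construction from the cited paper: the four rows, the helper names `grad` and `gradient_descent`, the logistic loss, and the constant step size 1 are all assumptions chosen for illustration. For this data the first three rows are non-separable and span $S = \operatorname{span}\{e_1\}$, the last row is separable, and solving the one-dimensional first-order condition by hand gives $\bar v = (\ln 2, 0)$ and $\bar u = (0, 1)$.

```python
import math

# Hypothetical toy rows A_i = -y_i * x_i, each with norm <= 1.
# Rows 0-2 are non-separable and span S = span{e1};
# row 3 is the separable part, whose projection onto S-perp is (0, -0.8).
A = [(-1.0, 0.0), (-1.0, 0.0), (1.0, 0.0), (-0.6, -0.8)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad(w):
    """Gradient of R(w) = (1/n) * sum_i ln(1 + exp(A_i . w))."""
    g = [0.0, 0.0]
    for a in A:
        s = sigmoid(a[0] * w[0] + a[1] * w[1])
        g[0] += s * a[0] / len(A)
        g[1] += s * a[1] / len(A)
    return g

def gradient_descent(steps, eta=1.0):
    """Plain gradient descent from w_0 = 0 with a constant step size."""
    w = [0.0, 0.0]
    for _ in range(steps):
        g = grad(w)
        w = [w[0] - eta * g[0], w[1] - eta * g[1]]
    return w

# Hand-computed ray for this data: minimizing 2*ln(1+e^-v) + ln(1+e^v)
# over v gives sigmoid(v) = 2/3, i.e. v = ln 2.
v_bar = (math.log(2.0), 0.0)
u_bar = (0.0, 1.0)

w = gradient_descent(20000)
offset_error = abs(w[0] - v_bar[0])   # distance of the S-component from v_bar
ray_coordinate = w[1]                 # position along u_bar, keeps growing
print(offset_error, ray_coordinate)
```

In this sketch the component along $S$ settles near $\bar v$ while the component along $\bar u$ keeps increasing, which is the qualitative picture of iterates following the ray.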

2. Risk Convergence Foundations

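For empirical logistic or exponential risk

$$\mathcal{R}(w) = \frac{1}{n}\sum_{i=1}^n \ell(A_i w),$$

with $\ell(z) = \ln(1+e^z)$ or $\ell(z) = e^z$ and gradient descent $w_{j+1} = w_j - \eta_j \nabla\mathcal{R}(w_j)$, risk convergence is established using:

Magic Smooth-Descent Lemma: For $\beta$-smoothness along the iterates, if $\eta_j \beta \leq 1$, then for any $z$,

$$2\Big(\sum_{j<t}\eta_j\Big)\big(\mathcal{R}(w_t) - \mathcal{R}(z)\big) \leq \|w_0 - z\|_2^2 - \|w_t - z\|_2^2.$$

Fixed-Direction Rate Lemma: Setting $z = \bar v + \bar u \cdot \frac{\ln t}{\gamma}$,

$$\mathcal{R}(z) \leq \inf_w \mathcal{R}(w) + \frac{\exp(\|\bar v\|_2)}{t}.$$

Combining these, risk convergence for $w_0 = 0$ and $\eta_j \leq 1$ yields

$$\mathcal{R}(w_t) - \inf_w \mathcal{R}(w) \leq \frac{\exp(\|\bar v\|_2)}{t} + \frac{\|\bar v\|_2^2 + \ln(t)^2/\gamma^2}{2\sum_{j<t}\eta_j}.$$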
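A numerical sanity check can make the combined risk bound concrete. The sketch below assumes the bound takes the form $\exp(\|\bar v\|_2)/t + (\|\bar v\|_2^2 + \ln(t)^2/\gamma^2)/(2\sum_{j<t}\eta_j)$, which is what the two lemmas give when $w_0 = 0$. The toy rows, the helper names, and the hand-computed values of $\|\bar v\|_2$, $\gamma$, and the infimal risk are assumptions specific to this example.

```python
import math

# Hypothetical toy rows A_i = -y_i * x_i (norm <= 1): three non-separable
# rows along e1 plus one separable row.
A = [(-1.0, 0.0), (-1.0, 0.0), (1.0, 0.0), (-0.6, -0.8)]
n = len(A)

def risk(w):
    """Empirical logistic risk (1/n) * sum_i ln(1 + exp(A_i . w))."""
    return sum(math.log1p(math.exp(a[0] * w[0] + a[1] * w[1])) for a in A) / n

def grad(w):
    g0 = g1 = 0.0
    for a in A:
        s = 1.0 / (1.0 + math.exp(-(a[0] * w[0] + a[1] * w[1])))
        g0 += s * a[0] / n
        g1 += s * a[1] / n
    return g0, g1

# Quantities worked out by hand for this data (assumptions of the sketch):
v_norm = math.log(2.0)                                 # ||v_bar||, v_bar = (ln 2, 0)
gamma = 0.8                                            # margin of the projected row (0, -0.8)
inf_risk = (2.0 * math.log(1.5) + math.log(3.0)) / n   # risk of rows 0-2 at v_bar

def bound(t, step_sum):
    """Right-hand side of the combined risk bound with w_0 = 0."""
    return math.exp(v_norm) / t + (v_norm ** 2 + (math.log(t) / gamma) ** 2) / (2.0 * step_sum)

w, step_sum, checks = (0.0, 0.0), 0.0, []
for j in range(1000):
    eta = 1.0 / math.sqrt(j + 1)       # inverse-root step size, eta_j <= 1
    g0, g1 = grad(w)
    w = (w[0] - eta * g0, w[1] - eta * g1)
    step_sum += eta                    # equals sum_{j<t} eta_j with t = j + 1
    t = j + 1
    if t in (10, 100, 1000):
        checks.append((t, risk(w) - inf_risk, bound(t, step_sum)))

for t, gap, b in checks:
    print(t, gap, b)
```

Each recorded risk gap should sit below the corresponding bound; the bound is loose on this example, which is expected for a worst-case guarantee.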

3. Parameter Convergence Theorems

3.1 Offset Convergence on $S$

For the $S$-component, let $\lambda$ be the modulus of strong convexity of $\mathcal{R}_S$ over the region of $S$ containing the iterates, and let $\Pi_S$ denote orthogonal projection onto $S$. It follows that for $t \geq 1$ and an arbitrary step-size sequence with $\eta_j \leq 1$,

$$\|\Pi_S w_t - \bar v\|_2^2 \leq \frac{2}{\lambda}\Big(\mathcal{R}(w_t) - \inf_w \mathcal{R}(w)\Big).$$

For $\eta_j = 1/\sqrt{j+1}$, this yields the offset-convergence theorem:

$$\|\Pi_S w_t - \bar v\|_2^2 = O\!\left(\frac{\ln(t)^2}{\sqrt{t}}\right).$$
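The step from risk convergence to offset convergence can be spelled out as a short derivation. This is a sketch under stated assumptions: $\mathcal{R}_S$ and $\mathcal{R}_c$ denote the parts of the empirical risk summed over the rows of $A_S$ and $A_c$ (with the same $1/n$ normalization), $\Pi_S$ is orthogonal projection onto $S$, and $\lambda$ is a strong-convexity modulus of $\mathcal{R}_S$ valid on a subset of $S$ containing both $\Pi_S w_t$ and the minimizer $\bar v$.

$$
\begin{aligned}
\mathcal{R}(w_t) &= \mathcal{R}_S(\Pi_S w_t) + \mathcal{R}_c(w_t) \;\geq\; \mathcal{R}_S(\Pi_S w_t), \\
\inf_w \mathcal{R}(w) &= \mathcal{R}_S(\bar v), \\
\frac{\lambda}{2}\,\|\Pi_S w_t - \bar v\|_2^2 &\leq \mathcal{R}_S(\Pi_S w_t) - \mathcal{R}_S(\bar v) \;\leq\; \mathcal{R}(w_t) - \inf_w \mathcal{R}(w).
\end{aligned}
$$

The first line uses that the rows of $A_S$ lie in $S$ and that the loss is nonnegative. The second uses that the separable part of the risk can be driven to zero along the maximum-margin direction in $S^\perp$ without changing $\mathcal{R}_S$. The third is strong convexity around the minimizer, followed by the first two lines. Any rate for the risk gap therefore transfers to the squared offset error up to the factor $2/\lambda$.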

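3.2 Directional Convergence on $S^\perp$

For the $S^\perp$ component, norm growth is established: the projection of $w_t$ onto $S^\perp$ grows without bound, on the order of $\ln t$. Using a Fenchel–Young argument, for fully separable $A$ (i.e., $A_c = A$) and $\eta_j = 1$,

$$\left\langle \frac{w_t}{\|w_t\|_2},\, \bar u \right\rangle \geq 1 - O\!\left(\frac{\ln\ln t}{\ln t}\right),$$

i.e.,

$$\left\| \frac{w_t}{\|w_t\|_2} - \bar u \right\|_2^2 = O\!\left(\frac{\ln\ln t}{\ln t}\right).$$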
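The slow directional convergence in the fully separable case can be observed numerically. The following is a minimal sketch on hypothetical data: three points with label $+1$, chosen so that the maximum-margin direction is $(0, 1)$ with margin $0.8$ (the first two points are mirror images and the third has the larger margin $0.95$). The names `X`, `u_bar`, `grad`, and the checkpoint schedule are assumptions for illustration.

```python
import math

# Hypothetical fully separable data, all labels +1, so A_i = -x_i.
X = [(0.6, 0.8), (-0.6, 0.8), (0.3, 0.95)]
u_bar = (0.0, 1.0)   # max-margin direction for this data (margin 0.8)

def grad(w):
    """Gradient of the logistic risk (1/n) * sum_i ln(1 + exp(-<x_i, w>))."""
    g = [0.0, 0.0]
    for x in X:
        # sigmoid(-<x, w>) written as 1 / (1 + exp(<x, w>))
        s = 1.0 / (1.0 + math.exp(x[0] * w[0] + x[1] * w[1]))
        g[0] -= s * x[0] / len(X)
        g[1] -= s * x[1] / len(X)
    return g

w = [0.0, 0.0]
checkpoints = (10, 100, 1000, 10000)
norms, gaps = [], []
for t in range(1, checkpoints[-1] + 1):
    g = grad(w)
    w = [w[0] - g[0], w[1] - g[1]]          # constant step size eta_j = 1
    if t in checkpoints:
        norm = math.hypot(w[0], w[1])
        norms.append(norm)
        # 1 - cosine of the angle between w_t and u_bar
        gaps.append(1.0 - (w[0] * u_bar[0] + w[1] * u_bar[1]) / norm)

for t, norm, gap in zip(checkpoints, norms, gaps):
    print(t, norm, gap)
```

The norm keeps growing while the alignment gap shrinks only slowly as $t$ increases by factors of ten, in line with a rate that is logarithmic rather than polynomial in $t$.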

4. Practical Construction of the RRP

The construction of the Ray Regression Predictor in practice may be summarized by the following workflow:

Separable Subset Identification: Employ a greedy separability test on each example to construct the separable subset $A_c$, corresponding to the partition of the rows of $A$ into $A_S$ and $A_c$.
Offset Computation: Compute $\bar v = \arg\min_{v \in S} \mathcal{R}_S(v)$ using any standard solver for convex minimization.
Maximum-Margin Direction: Compute

$$\bar u = \arg\min_{u \in S^\perp,\ \|u\|_2 = 1}\ \max_i\, (A_\perp u)_i$$

for the maximum-margin separator within $S^\perp$.

Final RRP: The RRP is then the ray $\{\bar v + r\,\bar u : r \geq 0\}$. Gradient descent (with constant or decaying steps) on the empirical risk automatically yields iterates tracking the RRP: the offset $\bar v$ is recovered first at rate $O(\ln(t)^2/\sqrt{t})$, with directional convergence to $\bar u$ at rate $O(\ln\ln t/\ln t)$.
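The first step of the workflow above can be sketched with a linear-programming feasibility check per row. This is one possible realization, not necessarily the procedure used in the cited paper: it tests the definition directly (a row $A_i$ is separable if some $u$ satisfies $Au \leq 0$ and $A_i u < 0$) using `scipy.optimize.linprog`, then projects the separable rows onto $S^\perp$ with an SVD. The function names `separable_rows` and `project_off_span`, the tolerance, and the toy matrix are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def separable_rows(A):
    """Boolean mask of rows i admitting some u with A u <= 0 and A_i u < 0."""
    n, d = A.shape
    mask = np.zeros(n, dtype=bool)
    for i in range(n):
        # Rescaling u turns the strict inequality A_i u < 0 into A_i u <= -1.
        b = np.zeros(n)
        b[i] = -1.0
        res = linprog(c=np.zeros(d), A_ub=A, b_ub=b,
                      bounds=[(None, None)] * d)
        mask[i] = (res.status == 0)   # status 0: a feasible point was found
    return mask

def project_off_span(A_S, A_c, tol=1e-10):
    """Project the rows of A_c onto the orthogonal complement of span(A_S^T)."""
    if A_S.shape[0] == 0:
        return A_c.copy()             # S = {0}: nothing to remove
    _, s, Vt = np.linalg.svd(A_S, full_matrices=False)
    basis = Vt[s > tol * s.max()]     # orthonormal basis of S
    return A_c - (A_c @ basis.T) @ basis

# Hypothetical toy rows A_i = -y_i * x_i.
A = np.array([[-1.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [-0.6, -0.8]])
mask = separable_rows(A)
A_S, A_c = A[~mask], A[mask]
A_perp = project_off_span(A_S, A_c)
print(mask, A_perp)
```

With $A_\perp$ in hand, the offset and the maximum-margin direction can be computed by any convex solver; with a single projected row, as in this toy matrix, the direction is simply the negated, normalized row.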

5. Summary of Key Rates and Theoretical Guarantees

The table summarizes the principal rates for risk and parameter convergence:

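Quantity	Convergence Rate	Conditions
$\mathcal{R}(w_t) - \inf_w \mathcal{R}(w)$	$O\big(1/t + \ln(t)^2/\sum_{j<t}\eta_j\big)$	$\eta_j \leq 1$
$\|\Pi_S w_t - \bar v\|_2^2$	$O(\ln(t)^2/\sqrt{t})$	Strongly convex $\mathcal{R}_S$, $\eta_j = 1/\sqrt{j+1}$
$\|w_t/\|w_t\|_2 - \bar u\|_2^2$	$O(\ln\ln t/\ln t)$	Fully separable $A$, $\eta_j = 1$
$w_t$	$\approx \bar v + \bar u \cdot \ln(t)/\gamma$	Asymptotic, under above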

These results hold for gradient descent initialized at $w_0 = 0$, with either constant or inverse-root step size as specified, and for the empirical logistic or exponential loss. The RRP describes the implicit bias and convergence path of iterates in high-dimensional, possibly partially or fully separable logistic regression tasks (Ji et al., 2018).


References (1)

Risk and parameter convergence of logistic regression (2018)
