Expected Gradient Outer Product (EGOP)
- Expected Gradient Outer Product (EGOP) is a matrix quantifying average squared directional derivatives, highlighting the principal directions along which a function varies.
- Estimation techniques such as finite differences, local regression, and surrogate modeling enable consistent and efficient computation of EGOP in high-dimensional data.
- EGOP underpins applications in metric learning, optimization reparameterization, and neural feature extraction, with both theoretical guarantees and empirical performance improvements.
The Expected Gradient Outer Product (EGOP) is a central object in modern dimension reduction, adaptive optimization, and feature learning. It quantifies the average squared directional derivative of a function, encoding in a single positive semidefinite matrix the principal input directions along which a target function varies. EGOP and its generalizations, such as the Expected Jacobian Outer Product (EJOP) for vector- or multiclass-valued functions, underpin a range of methodologies in sufficient dimension reduction, data preconditioning, metric learning, kernel adaptation, and analysis of neural feature learning. This article systematically presents the mathematical foundation, estimation strategies, theoretical properties, and algorithmic applications of EGOP, highlighting key results and current lines of research.
1. Mathematical Definition and Core Properties
Let $f:\mathbb{R}^d \to \mathbb{R}$ be a differentiable function and $\rho$ a measure on $\mathbb{R}^d$ (often the data or parameter distribution). The Expected Gradient Outer Product is the positive semidefinite matrix
$$\mathrm{EGOP}(f) = \mathbb{E}_{x\sim\rho}\!\left[\nabla f(x)\,\nabla f(x)^\top\right] \in \mathbb{R}^{d\times d}.$$
For vector-valued outputs $f:\mathbb{R}^d \to \mathbb{R}^C$ with Jacobian $J(x) \in \mathbb{R}^{C\times d}$, the generalization is the Expected Jacobian Outer Product (EJOP),
$$\mathrm{EJOP}(f) = \mathbb{E}_{x\sim\rho}\!\left[J(x)^\top J(x)\right] = \mathbb{E}_{x\sim\rho}\Big[\sum_{c=1}^{C} \nabla f_c(x)\,\nabla f_c(x)^\top\Big].$$
For any unit direction $v \in \mathbb{R}^d$, $v^\top\, \mathrm{EGOP}(f)\, v = \mathbb{E}_{x\sim\rho}\big[(\partial_v f(x))^2\big]$ gives the average squared directional derivative, making the top eigenvectors of EGOP the axes of largest functional variation (Rauniyar, 9 Dec 2025, Trivedi et al., 2020, DePavia et al., 3 Feb 2025).
In multi-index regression, if $f(x) = g(Ax)$ for $A \in \mathbb{R}^{k\times d}$ with $k < d$, then $\nabla f(x) = A^\top \nabla g(Ax)$, so
$$\mathrm{EGOP}(f) = A^\top\, \mathbb{E}_{x\sim\rho}\!\big[\nabla g(Ax)\,\nabla g(Ax)^\top\big]\, A,$$
giving $\mathrm{rank}(\mathrm{EGOP}(f)) \le k$, with column space contained in the row space of $A$. Thus, EGOP recovers the relevant subspace through its leading eigenvectors (Trivedi et al., 2020, Baptista et al., 2024).
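To make the definition concrete, the following NumPy sketch Monte-Carlo-estimates the EGOP of a hypothetical two-index target $f(x) = \sin(a_1 \cdot x) + (a_2 \cdot x)^2$ (an illustrative choice, not from any cited paper) and checks that the spectrum is effectively rank 2, so the leading eigenvectors span the relevant subspace.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 10, 2, 20000

# Hypothetical multi-index target f(x) = g(Ax) with a random k x d index matrix A.
A = rng.standard_normal((k, d))

def grad_f(x):
    # f(x) = sin(a1.x) + (a2.x)^2  =>  grad f = cos(a1.x) a1 + 2 (a2.x) a2
    return np.cos(A[0] @ x) * A[0] + 2.0 * (A[1] @ x) * A[1]

# Monte Carlo estimate of EGOP = E[grad f(x) grad f(x)^T] under rho = N(0, I).
X = rng.standard_normal((n, d))
G = np.stack([grad_f(x) for x in X])      # n x d matrix of sampled gradients
egop = G.T @ G / n

eigvals = np.linalg.eigvalsh(egop)[::-1]  # eigenvalues, descending
# Gradients lie in span{a1, a2}, so only the top-k eigenvalues are nonzero.
print(eigvals[:3])
```

Since every sampled gradient lies exactly in the row space of `A`, the estimated EGOP is rank-$k$ up to floating-point error, illustrating the subspace-recovery property stated above.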
2. Estimation Techniques
Finite Difference and Local Regression
The canonical estimator of EGOP employs finite-difference approximations or local (kernel) polynomial fits to estimate gradients $\hat\nabla f(x_i)$ at a sample of locations $x_1,\dots,x_n$, forming the empirical matrix
$$\widehat{\mathrm{EGOP}} = \frac{1}{n}\sum_{i=1}^{n} \hat\nabla f(x_i)\,\hat\nabla f(x_i)^\top,$$
with local linear regression or kernel smoothing producing consistent gradient estimates. For vector-valued $f$, the per-class gradients are assembled into a Jacobian estimate $\hat J(x_i)$, yielding
$$\widehat{\mathrm{EJOP}} = \frac{1}{n}\sum_{i=1}^{n} \hat J(x_i)^\top \hat J(x_i)$$
(Trivedi et al., 2020, Baptista et al., 2024, Rauniyar, 9 Dec 2025).
Surrogate Modeling
When $f$ is unknown and only noisy samples $(x_i, y_i)$ are available, a smooth surrogate $\hat f$ is fit (e.g., random forest, kernel smoother, neural network), and finite differences of $\hat f$ are computed with respect to each coordinate and output class. This approach is effective in both regression and classification, provided the surrogate converges to the population function $f$.
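As an illustration of the surrogate route, the sketch below uses a Nadaraya–Watson smoother as the surrogate (one possible choice; the bandwidth, anchor count, and one-index target are assumptions of this example), differentiates it by central finite differences, and checks that the top eigenvector of the resulting EGOP estimate aligns with the true relevant direction.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 3000
X = rng.uniform(-1, 1, (n, d))
# Noisy samples of a one-index target: only coordinate 0 matters.
y = np.sin(2 * X[:, 0]) + 0.05 * rng.standard_normal(n)

def surrogate(x, h=0.35):
    # Nadaraya-Watson kernel smoother fit to the noisy samples.
    w = np.exp(-np.sum((X - x) ** 2, axis=1) / (2 * h * h))
    return w @ y / w.sum()

def fd_grad(x, eps=1e-2):
    # Central finite differences of the surrogate, one coordinate at a time.
    g = np.zeros(d)
    for j in range(d):
        e = np.zeros(d)
        e[j] = eps
        g[j] = (surrogate(x + e) - surrogate(x - e)) / (2 * eps)
    return g

# Empirical EGOP over a subsample of anchor points.
anchors = X[rng.choice(n, 100, replace=False)]
G = np.stack([fd_grad(x) for x in anchors])
egop_hat = G.T @ G / len(anchors)

# The dominant eigenvector should align with e_0, the true relevant direction.
w = np.linalg.eigh(egop_hat)[1][:, -1]
print(abs(w[0]))
```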
Compressive Sensing for Sparse Gradients
In high-dimensional settings with sparse gradients, EGOP estimation can be dramatically accelerated by simultaneous perturbation and $\ell_1$-minimization: on the order of $s \log d$ random linear probes suffice per location for $s$-sparse gradients in dimension $d$ (Borkar et al., 2015). Stacking the recovered gradients yields an accurate EGOP estimator at reduced cost.
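A toy version of this pipeline is sketched below, with ISTA (iterative soft thresholding) standing in for the $\ell_1$ solver; the cited method's exact solver and probe design may differ, and the 2-sparse target is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
d, m, delta = 50, 20, 1e-4

# Toy target whose gradient is 2-sparse: it depends only on coordinates 3 and 10.
f = lambda x: np.sin(x[3]) + x[10] ** 2
x0 = np.full(d, 0.5)
g_true = np.zeros(d)
g_true[3], g_true[10] = np.cos(0.5), 1.0

# m << d random probes; each finite difference measures one linear projection
# of the gradient: y_i ~ Phi_i . grad f(x0).
Phi = rng.standard_normal((m, d)) / np.sqrt(m)
y = np.array([(f(x0 + delta * v) - f(x0 - delta * v)) / (2 * delta) for v in Phi])

# ISTA for l1-regularized least squares (a simple stand-in for basis pursuit).
g, t, lam = np.zeros(d), 0.1, 1e-3
for _ in range(5000):
    z = g - t * Phi.T @ (Phi @ g - y)
    g = np.sign(z) * np.maximum(np.abs(z) - t * lam, 0.0)

print(np.max(np.abs(g - g_true)))  # small: the sparse gradient is recovered
```

With only 20 probes in dimension 50, the $\ell_1$ step recovers the 2-sparse gradient to within the finite-difference and regularization error; repeating this at many locations and stacking the recovered gradients gives the accelerated EGOP estimator described above.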
Smoothed and Weighted Estimation
For dimension reduction in nonparametric settings, smoothed gradient estimation via weighted local linear regression (using Gaussian or kernel weights) supports parametric convergence rates of subspace estimation with favorable dimension dependence, even under heavy-tailed or non-Gaussian covariate distributions (Yuan et al., 2023).
Algorithmic Outline
| Method | Core Step | Cost / Sample Complexity |
|---|---|---|
| Finite differences | Local differences/smoothing | $O(d)$ function evaluations per location |
| Surrogate model gradients | Model fit + finite differences | Depends on surrogate model class |
| Simultaneous perturbation + $\ell_1$ | Random probes, $\ell_1$-solve | $O(s \log d)$ probes per location |
| Weighted local regression | Importance-weighted local linear fit | Parametric subspace rates (Yuan et al., 2023) |
3. Theoretical Guarantees and Spectral Characterization
Consistency and Convergence
Under classical regularity (bounded higher-order derivatives, noise control), empirical EGOP estimators converge in operator or Frobenius norm at near-parametric $n^{-1/2}$-type rates (possibly with mild logarithmic factors), and their eigenvalues/eigenvectors enjoy Weyl/Davis–Kahan type perturbation bounds (Trivedi et al., 2020, Yuan et al., 2023, Borkar et al., 2015). For weighted or smoothed estimators, these rates are preserved with careful choice of bandwidth and weights (Yuan et al., 2023).
In ridge-structured/multi-index regression with $f(x) = g(Ax)$, EGOP is low-rank and its leading eigenvectors recover the central mean subspace. This underpins the application of EGOP as a sufficient dimension reduction tool.
Spectral Decay and Subspace Recovery
The spectral properties of EGOP drive its effectiveness in both optimization and dimension reduction. When the spectrum decays rapidly (low stable rank), reparameterizing or projecting onto the leading eigenvectors concentrates the relevant variation, accelerates first-order optimization (e.g., Adagrad, Adam), and yields efficient low-dimensional regression (DePavia et al., 3 Feb 2025, Baptista et al., 2024).
In high dimensions, Gaussian smoothing and probe splitting enable near-parametric subspace recovery with constants depending only polynomially on the dimension for polynomial link functions under Gaussian design (Yuan et al., 2023).
4. Applications in Learning and Optimization
Preconditioning Decision Trees and Random Forests
The empirical EGOP (or, more generally, EJOP) provides a data-driven global linear preconditioner for axis-aligned tree ensembles (e.g., JARF): by rotating the data into the principal components of EGOP, axis-aligned splits in the transformed coordinates implement oblique splits that maximize impurity gain, efficiently capturing interaction effects without the computational burden of oblique forests (Rauniyar, 9 Dec 2025).
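A minimal sketch of the preconditioning idea follows, using a smooth stand-in target whose gradients are known in closed form rather than a fitted forest (the target, dimension, and decision rule are assumptions of this example): rotating into the EGOP eigenbasis turns an oblique decision rule into an axis-aligned one.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 4000, 6
X = rng.standard_normal((n, d))
beta = np.array([1.0, -1.0, 0.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
labels = X @ beta > 0                          # oblique decision rule

# Gradients of a smooth stand-in f(x) = tanh(5 beta.x): grad f = 5 sech^2(.) beta.
G = (5.0 / np.cosh(5.0 * (X @ beta)) ** 2)[:, None] * beta[None, :]
egop = G.T @ G / n
V = np.linalg.eigh(egop)[1]
Z = X @ V[:, ::-1]                             # rotate into the EGOP eigenbasis

# An axis-aligned threshold on the first rotated coordinate matches the oblique rule.
split_acc = np.mean((Z[:, 0] > 0) == labels)
split_acc = max(split_acc, 1.0 - split_acc)    # eigenvector sign is arbitrary
# For contrast: the best threshold on a single raw coordinate is much weaker.
raw_acc = max(np.mean((X[:, 0] > 0) == labels), np.mean((X[:, 0] <= 0) == labels))
print(split_acc, raw_acc)
```

In the rotated coordinates a single axis-aligned split reproduces the oblique boundary almost exactly, while no single raw coordinate can, which is precisely the gain a tree ensemble inherits from the preconditioner.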
Adaptive Optimization (EGOP Reparameterization)
EGOP-based orthonormal reparameterization of adaptive optimizers aligns parameter updates with descent directions of greatest expected functional variation, accelerating methods such as Adagrad and Adam when the EGOP spectrum decays. The analysis quantifies convergence speedups proportional to the ratio of stable rank to ambient dimension, confirmed empirically across convex and nonconvex deep learning problems (DePavia et al., 3 Feb 2025).
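The following sketch illustrates the reparameterization on a rotated quadratic, where the EGOP under a standard normal weight distribution is available in closed form ($\mathbb{E}[Hww^\top H] = H^2$, sharing the Hessian's eigenbasis); diagonal Adagrad in the EGOP eigenbasis converges far faster than in the original coordinates. The quadratic, spectrum, and learning rate are illustrative choices, not the cited paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 20
# Quadratic objective f(w) = 0.5 w'Hw with a rotated, rapidly decaying spectrum.
Q = np.linalg.qr(rng.standard_normal((d, d)))[0]
H = Q @ np.diag(np.logspace(0, -3, d)) @ Q.T
grad = lambda w: H @ w

# Under w ~ N(0, I), EGOP = E[H w w' H] = H^2, so its eigenbasis is the Hessian's.
V = np.linalg.eigh(H @ H)[1]

def adagrad(T, steps=500, lr=1.0):
    # Diagonal Adagrad on the reparameterized objective f(Tw).
    w, s = np.ones(d), np.zeros(d)
    for _ in range(steps):
        g = T.T @ grad(T @ w)        # gradient in the reparameterized coordinates
        s += g * g
        w -= lr * g / (np.sqrt(s) + 1e-12)
    v = T @ w
    return 0.5 * v @ H @ v

loss_plain = adagrad(np.eye(d))      # original coordinates
loss_egop = adagrad(V)               # EGOP eigenbasis coordinates
print(loss_plain, loss_egop)
```

In the eigenbasis the objective is separable, so Adagrad's per-coordinate scaling neutralizes the conditioning; in the original coordinates the same optimizer stalls on the small-curvature directions.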
Kernel Smoothing and Intrinsic Dimension Learning
In adaptive kernel regression, local EGOPs define Mahalanobis metrics that align smoothing neighborhoods with the function's intrinsic variability, yielding minimax rates that depend on the function's local intrinsic dimension rather than the ambient dimension. The Local EGOP learning algorithm recursively adapts smoothing metrics to the local function geometry, achieving intrinsic-dimension-dependent rates in noisy manifold settings and outperforming multilayer networks in continuous-index tasks (Kokot et al., 11 Jan 2026).
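A simplified, global (not recursively localized) version of this idea is sketched below: use the EGOP as a Mahalanobis metric inside a Nadaraya–Watson smoother. The one-index target, closed-form EGOP, bandwidth, and regularization constant are all assumptions of this sketch, not the published algorithm.

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 8, 3000
X = rng.uniform(-1, 1, (n, d))
f_true = lambda Z: np.sin(3 * Z[:, 0])
y = f_true(X) + 0.1 * rng.standard_normal(n)

# EGOP of the target (gradient 3 cos(3 x_0) e_0), computed from the known gradient.
egop = np.zeros((d, d))
egop[0, 0] = 9 * np.mean(np.cos(3 * X[:, 0]) ** 2)

def nw(x, M, h):
    # Nadaraya-Watson smoother with Mahalanobis distance (x - xi)' M (x - xi).
    diff = X - x
    dist2 = np.einsum('ij,jk,ik->i', diff, M, diff)
    w = np.exp(-dist2 / (2 * h * h))
    return w @ y / w.sum()

Xte = rng.uniform(-0.8, 0.8, (400, d))
yte = f_true(Xte)
err_euc = np.mean([(nw(x, np.eye(d), 0.3) - t) ** 2 for x, t in zip(Xte, yte)])
M = egop + 1e-3 * np.eye(d)          # regularize so the metric stays nondegenerate
err_egop = np.mean([(nw(x, M, 0.3) - t) ** 2 for x, t in zip(Xte, yte)])
print(err_euc, err_egop)
```

The EGOP metric concentrates smoothing along the single relevant coordinate, so the estimator behaves like a one-dimensional smoother and its error is governed by the intrinsic, not ambient, dimension.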
Sufficient Dimension Reduction
EGOP/OPG-based estimators, including mean- and mode-based variants, are widely used to recover the central mean subspace in multi-index models. The modal version (LMOPG) corrects for situations where mean-based gradients miss central directions, attaining consistency and asymptotic normality even under heavy-tailed or skewed errors (Li et al., 2024).
Feature Learning in Neural and Non-Neural Models
EGOP (or its empirical variant AGOP) has emerged as a key mechanism for feature learning in kernel machines, non-neural recursive feature machines (RFM), and deep learning. AGOP-guided updates generate task-relevant features, explain emergence phenomena such as "grokking" in non-neural models, and provide a unified account of deep neural collapse by aligning layerwise Gram matrices with AGOP subspaces (Beaglehole et al., 2024, Mallinar et al., 2024).
5. Advanced Topics and Extensions
Multiclass and Structured Outputs: EJOP
The Expected Jacobian Outer Product (EJOP) generalizes EGOP to vector-valued or multiclass settings, stacking per-class gradients and summing their outer products. EJOP estimators support consistent metric and subspace recovery for nonparametric classification and kernel-based metric learning, providing initialization for full metric learning algorithms (Trivedi et al., 2020, Rauniyar, 9 Dec 2025).
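A small numerical check of the EJOP construction, using a softmax model (an arbitrary choice, made so the Jacobian is available in closed form): the per-sample contribution $J^\top J$ coincides with the stacked sum of per-class gradient outer products, and the resulting matrix is low-rank.

```python
import numpy as np

rng = np.random.default_rng(6)
d, C, n = 6, 3, 500
W = rng.standard_normal((C, d))

def jacobian(x):
    # Jacobian of p = softmax(Wx); row c is the gradient of the class-c output.
    z = W @ x
    p = np.exp(z - z.max())
    p /= p.sum()
    return (np.diag(p) - np.outer(p, p)) @ W   # C x d

X = rng.standard_normal((n, d))
ejop = sum(jacobian(x).T @ jacobian(x) for x in X) / n

# Per sample, J'J equals the sum of per-class gradient outer products.
J0 = jacobian(X[0])
stacked = sum(np.outer(J0[c], J0[c]) for c in range(C))
print(np.allclose(J0.T @ J0, stacked))
```

Because every per-class gradient lies in the row space of `W`, the EJOP here has rank at most `C`, mirroring the subspace-recovery use described above.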
Algorithmic Structures: Recursive and Iterative Use
Iterative algorithms such as Recursive Feature Machines and Deep RFM employ the empirical EGOP/AGOP at each iteration to define the next layer's data embedding, recursively denoising and concentrating information in low-rank principal subspaces. In these models, the projection with AGOP matrices is solely responsible for phenomena such as deep neural collapse—random features alone cannot induce such collapse (Beaglehole et al., 2024, Mallinar et al., 2024).
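A compact RFM-style loop can illustrate the alternation (Gaussian kernel ridge regression alternating with AGOP re-estimation; the bandwidth, ridge parameter, normalization, and single-index target are illustrative choices, not the published algorithm's): the learned metric progressively concentrates on the relevant coordinate.

```python
import numpy as np

rng = np.random.default_rng(7)
d, n, h, lam = 10, 400, 2.0, 1e-3
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0])                    # single-index target: only x_0 matters

D = X[:, None, :] - X[None, :, :]      # pairwise differences, n x n x d
M = np.eye(d)
for _ in range(3):                     # alternate: fit, then re-estimate AGOP
    q = np.einsum('ijk,kl,ijl->ij', D, M, D)          # Mahalanobis distances
    K = np.exp(-q / (2 * h * h))
    alpha = np.linalg.solve(K + lam * np.eye(n), y)   # kernel ridge fit
    # Gradient of the fitted function at each training point.
    G = np.einsum('ij,ijk->ik', K * alpha[None, :], -(D @ M) / (h * h))
    M = G.T @ G / n                                   # AGOP becomes the new metric
    M *= d / np.trace(M)                              # keep the scale comparable to I

share = np.diag(M)[0] / np.trace(M)
print(share)   # fraction of the metric's mass on the relevant coordinate
```

After a few rounds most of the metric's trace sits on the single relevant coordinate, which is the low-rank concentration effect the RFM literature attributes to AGOP updates.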
Compression and Sample-Efficient Estimation
The compressive-sensing–simultaneous-perturbation methodology for EGOP estimation is effective when gradients are sparse: it achieves linear scaling in the sparsity level and logarithmic in ambient dimension, controlled by the number of probes and error bounds (Borkar et al., 2015).
6. Empirical Validation and Benchmarks
EGOP-powered approaches have been extensively validated:
- Mondrian-forest-based EGOP estimators achieve consistent subspace recovery and accelerate high-dimensional regression, approaching oracle performance (Baptista et al., 2024).
- EGOP metrics outperform Euclidean and conventional metrics in nearest-neighbor classification across real-world datasets, and closely match specialized metric-learning methods (Trivedi et al., 2020).
- Local EGOP learning recovers intrinsic dimension and achieves near-optimal rates in synthetic and molecular dynamics benchmarks, outperforming deep neural nets in continuous-index tasks (Kokot et al., 11 Jan 2026).
- Deep RFM and its AGOP projections induce neural collapse and explain the geometry of trained DNN feature spaces quantitatively (Beaglehole et al., 2024).
- In optimization, EGOP-based coordinate changes accelerate Adagrad/Adam by factors of 2–5 in empirical studies (DePavia et al., 3 Feb 2025).
7. Limitations, Extensions, and Open Directions
While EGOP-based methods provide powerful, theory-backed tools for structured learning and dimension reduction, open areas remain:
- Online and blockwise EGOP estimation for scalability in large models (DePavia et al., 3 Feb 2025).
- Generalizations beyond the mean or mode regression function to robust or conditional quantile-based versions (Li et al., 2024).
- Extensions to semi-supervised, multi-view, or structured-output tasks (Trivedi et al., 2020).
- Theoretical analysis of EGOP in overparameterized and highly nonconvex regimes, including deep learning with architectural biases (Beaglehole et al., 2024, Mallinar et al., 2024).
- Empirically, full eigendecomposition is costly in high dimensions; fast randomized or low-rank approximations are important for practical deployment (DePavia et al., 3 Feb 2025).
A plausible implication is that as models and data scale further, EGOP/EJOP-based analyses will remain pivotal in understanding and exploiting structure for learning, feature compression, and optimization. Ongoing research investigates streaming, online updating, deep networks with modular blocks, and principled feature learning through the lens of EGOP statistics.
Principal Representative Papers:
- Multiclass generalization, consistency, and metric learning: "The Expected Jacobian Outerproduct: Theory and Empirics" (Trivedi et al., 2020)
- Tree ensemble preconditioning: "Jacobian Aligned Random Forests" (Rauniyar, 9 Dec 2025)
- Adaptive kernel smoothing and local learning: "Local EGOP for Continuous Index Learning" (Kokot et al., 11 Jan 2026)
- Fast parametric subspace estimation: "Efficient Estimation of the Central Mean Subspace via Smoothed Gradient Outer Products" (Yuan et al., 2023)
- High-dimensional regression via Mondrian forests: "TrIM: Transformed Iterative Mondrian Forests" (Baptista et al., 2024)
- Adaptive optimization reparameterization: "Faster Adaptive Optimization via Expected Gradient Outer Product Reparameterization" (DePavia et al., 3 Feb 2025)
- High-dimensional gradient estimation: "Gradient Estimation with Simultaneous Perturbation and Compressive Sensing" (Borkar et al., 2015)
- Deep neural feature collapse: "Average gradient outer product as a mechanism for deep neural collapse" (Beaglehole et al., 2024)
- "Grokking" and emergence phenomena: "Emergence in non-neural models: grokking modular arithmetic via average gradient outer product" (Mallinar et al., 2024)
- Mode-based dimension reduction: "A Local Modal Outer-Product-Gradient Estimator for Dimension Reduction" (Li et al., 2024)