Soft-Absolute Hessian Metric

Updated 15 November 2025
  • Soft-Absolute Hessian Metric is a framework that replaces non-differentiable absolute functions with smooth surrogates to regulate second-order derivatives in optimization.
  • It leverages a Riemannian manifold structure and a metric-DNN to compute the divergence of policy gradients, ensuring efficient and differentiable backpropagation.
  • Empirical evaluations demonstrate significant reductions in gradient divergence and improved performance in reinforcement learning and imaging applications.

A "Soft-Absolute Hessian Metric" refers to a regularization and similarity-measure framework for controlling or exploiting Hessian-derived quantities (specifically, the trace/divergence of a Hessian in a geometric manifold setting) in which the non-differentiable absolute value is replaced by a smooth surrogate. This construction enables stable optimization and gradient-based learning, particularly in reinforcement learning policy optimization and, analogously, in image similarity modeling. Existing research details its role in policy gradient methods for deep reinforcement learning, where it regularizes the divergence of the policy gradient vector field lifted to a Riemannian manifold via a learned metric tensor, with a smooth absolute surrogate at the core of the regularizer (Chen et al., 2023). In image similarity, Hessian-based metrics have likewise been constructed to be everywhere differentiable, though the term "soft-absolute" is not used explicitly (Eskandari et al., 2023).

1. Theoretical Motivation and Geometric Framework

The design of the soft-absolute Hessian metric arises from the need to control higher-order (second derivative) properties in parameterized models. In the policy gradient context, the Euclidean parameter space $\theta \in \mathbb{R}^n$ is promoted to a Riemannian manifold with a metric tensor $g_{ab}(\theta)$ parameterized as

$$g_{ab}(\theta) = \delta_{ab} + u_a(\theta)\,u_b(\theta),$$

with $u(\theta) \in \mathbb{R}^n$ output by a compact neural network (referred to as a "metric-DNN"). The inverse metric is available in closed form via the Sherman–Morrison formula, enabling efficient computation. This geometric lifting enables computation of the divergence of the policy gradient vector field $J^a = g^{ab}\partial_b L(\theta)$ with respect to the manifold as

$$\mathrm{div}_g J = \nabla_a J^a = \frac{1}{\sqrt{\det G_\theta}}\,\partial_a\!\left(\sqrt{\det G_\theta}\, J^a\right),$$

where $L(\theta)$ is the policy objective. Explicit use of the manifold structure is key to enabling novel forms of regularization and optimization via higher-order derivatives.
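
Because $g$ is an identity-plus-rank-one matrix, the closed-form inverse and determinant mentioned above can be checked directly. The following NumPy sketch uses a fixed placeholder vector $u$ in place of the metric-DNN output:

```python
import numpy as np

# Rank-one metric g_ab = delta_ab + u_a u_b with closed-form inverse
# (Sherman-Morrison) and determinant (matrix determinant lemma).
# The vector u is a fixed placeholder standing in for the metric-DNN output.

def metric(u):
    """g = I + u u^T."""
    return np.eye(len(u)) + np.outer(u, u)

def inverse_metric(u):
    """Sherman-Morrison: (I + u u^T)^{-1} = I - u u^T / (1 + u.u)."""
    return np.eye(len(u)) - np.outer(u, u) / (1.0 + u @ u)

def sqrt_det_metric(u):
    """Matrix determinant lemma: det(I + u u^T) = 1 + u.u."""
    return np.sqrt(1.0 + u @ u)

u = np.array([0.3, -0.5, 0.2])
assert np.allclose(metric(u) @ inverse_metric(u), np.eye(3))
assert np.isclose(sqrt_det_metric(u) ** 2, np.linalg.det(metric(u)))
```

Both identities hold for any $u$, which is what makes the inverse metric and $\sqrt{\det G_\theta}$ cheap to evaluate inside the divergence formula.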

2. Soft-Absolute Regularization: Definition and Mathematical Formulation

A central regularizer is the expectation of the absolute divergence of the policy gradient field, cast as

$$R(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[\,\left|\mathrm{div}_g \nabla_\theta \log \pi_\theta(\tau)\right|\,\right],$$

where $\tau$ denotes a trajectory sampled from policy $\pi_\theta$. The absolute value, being non-differentiable at zero, is replaced in practice by a smooth function $S(x)$, typically the softplus $S(x) = \mathrm{softplus}(\alpha x)$ or $S(x) = \sqrt{x^2 + \epsilon}$, giving a differentiable surrogate:

$$R(\theta) \approx \mathbb{E}_\tau\left[S\left(\mathrm{div}_g J(\theta; \tau)\right)\right].$$

This construction ensures that automatic differentiation can be applied end-to-end for optimizing both policy and metric parameters. The gradient of the regularizer is

$$\nabla_\theta R = \mathbb{E}_\tau\left[S'\left(\mathrm{div}_g J\right)\,\nabla_\theta\left(\mathrm{div}_g J\right)\right],$$

propagating through all dependencies, including the metric-DNN $u(\theta)$.
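
The $\sqrt{x^2 + \epsilon}$ surrogate and its derivative $S'$ (which appears in the gradient formula above) take only a few lines; the $\epsilon = 10^{-3}$ value follows the hyperparameters reported later in this article:

```python
import numpy as np

# Smooth "soft-absolute" surrogate S(x) = sqrt(x^2 + eps) and its derivative.
# eps = 1e-3 follows the hyperparameters quoted in the experiments section;
# the softplus-based variant mentioned in the text could be substituted.

def soft_abs(x, eps=1e-3):
    """Differentiable surrogate for |x|; approaches |x| as eps -> 0."""
    return np.sqrt(x * x + eps)

def soft_abs_grad(x, eps=1e-3):
    """S'(x) = x / sqrt(x^2 + eps), a smooth stand-in for sign(x)."""
    return x / np.sqrt(x * x + eps)

xs = np.array([-2.0, 0.0, 2.0])
print(soft_abs(xs))        # close to |x| away from zero, smooth at zero
print(soft_abs_grad(0.0))  # exactly 0.0; |x| itself has no derivative there
```

Note that $S'$ saturates toward $\pm 1$ away from zero, so the regularizer gradient behaves like the true absolute-value subgradient except in a small neighborhood of zero divergence.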

3. Algorithmic Integration in Policy Gradient Methods

The soft-absolute Hessian regularizer is incorporated in the objective for policy gradient optimization:

$$J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[\sum_{t=0}^{T-1} \log \pi_\theta(a_t \mid s_t)\, G(\tau)\right] - \lambda R(\theta),$$

leading to a gradient update of the form

$$\nabla_\theta J(\theta) = \mathbb{E}_\tau\left[\sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G(\tau)\right] - \lambda \nabla_\theta R(\theta).$$

Standard actor-critic or REINFORCE implementations can be modified by adding the $R(\theta)$ term to the loss function and using automatic differentiation to propagate through both the policy and metric networks. The metric network is trained to minimize the squared divergence, actively driving the divergence term toward zero.

A representative pseudocode block from (Chen et al., 2023):

while not converged:
    # Collect trajectory batch under π_θ
    # Compute policy loss (L_pg)
    # Compute divergence D_i = div_g[∇_θ ln π_θ(τ_i)]
    # Compute regularizer R using soft-absolute surrogate S(D_i)
    θ ← θ − α_θ ∇_θ (L_pg + λ·R)
    φ ← φ − α_φ ∇_φ (mean D_i²)  # Train metric DNN to minimize divergence
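
The two-parameter update cycle above can be exercised on a deliberately tiny stand-in problem. Everything in this sketch is an assumption for illustration: a scalar "policy" parameter theta, a scalar metric parameter phi, a synthetic divergence D = phi·theta, a quadratic stand-in for the policy loss, and central finite differences in place of autodiff:

```python
import numpy as np

# Toy version of the alternating update cycle. All ingredients are stand-ins:
# the real method uses trajectory batches, a policy network, and a metric-DNN.

def soft_abs(x, eps=1e-3):
    return np.sqrt(x * x + eps)          # smooth surrogate for |x|

def policy_loss(theta):
    return (theta - 1.0) ** 2            # stand-in for the policy-gradient loss

def divergence(theta, phi):
    return phi * theta                   # stand-in for div_g J

def grad(f, x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2.0 * h)   # central finite difference

theta, phi = 2.0, 1.5
lam, lr_theta, lr_phi = 0.1, 0.05, 0.05
for _ in range(500):
    total = lambda th: policy_loss(th) + lam * soft_abs(divergence(th, phi))
    theta -= lr_theta * grad(total, theta)                            # θ step
    phi -= lr_phi * grad(lambda p: divergence(theta, p) ** 2, phi)    # φ step
```

The phi update minimizes the squared divergence (as in the pseudocode's last line), so the divergence term is driven toward zero while theta converges on the policy objective.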

4. Empirical Evaluation and Quantitative Impact

In experiments using benchmark continuous control environments (Hopper-v3, Walker2D-v3, LunarLanderContinuous-v2, and PyBullet variants), the soft-absolute Hessian regularizer (integrated into SAC and TD3) yields the following outcomes:

  • Regularized variants reduce $|\mathrm{div}_g J|$ by 60–80%.
  • Net gains in final episodic return are 10–50% compared to unregularized policy gradient.
  • Typical regularizer hyperparameters are $\lambda \in \{10^{-2}, 10^{-1}\}$, $\epsilon = 10^{-3}$, and softplus $\alpha = 10$.

Ablations indicate the metric loss (training the metric-DNN) and the regularizer are both essential for the observed reduction in divergence and the associated performance gains.

5. Implementation Considerations and Practical Guidance

The metric tensor $g_{ab}(\theta)$ is constructed from $u(\theta)$, with $u(\theta)$ formed as $S(\theta; \phi_1)\,R(\theta; \phi_2)\,\theta$ from small Fourier-based sub-networks, making the computation lightweight. All requisite formulas (inverse metric, divergence, and their gradients) admit closed forms in terms of $u(\theta)$ and can be implemented efficiently in any autodiff-compatible machine learning framework. The soft-absolute surrogate keeps every operation smooth, avoiding points of non-differentiability and facilitating stable optimization. The regularizer is compatible with both vanilla and off-policy actor-critic methods; only the loss function and the metric-DNN components need to be integrated.
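
As a concrete numeric check of the divergence formula $\mathrm{div}_g J = \frac{1}{\sqrt{\det g}}\,\partial_a(\sqrt{\det g}\, J^a)$, the sketch below evaluates it by finite differences for a toy $u(\theta)$ and the toy objective $L = \|\theta\|^2/2$; the actual method would obtain these quantities through the metric-DNN and autodiff:

```python
import numpy as np

# Finite-difference check of div_g J = (1/sqrt(det g)) d_a (sqrt(det g) J^a).
# u_fn and grad_L are placeholders, not the paper's networks or objective.

def u_fn(theta):
    return 0.1 * np.sin(theta)       # stand-in for the metric-DNN output

def grad_L(theta):
    return theta                     # gradient of the toy L = |theta|^2 / 2

def J(theta):
    """J^a = g^{ab} d_b L, with g^{-1} from Sherman-Morrison."""
    u = u_fn(theta)
    g_inv = np.eye(len(theta)) - np.outer(u, u) / (1.0 + u @ u)
    return g_inv @ grad_L(theta)

def div_g_J(theta, h=1e-5):
    def weighted(th, a):             # sqrt(det g) * J^a at th
        u = u_fn(th)
        return np.sqrt(1.0 + u @ u) * J(th)[a]
    n = len(theta)
    s = sum((weighted(theta + h * e, a) - weighted(theta - h * e, a)) / (2 * h)
            for a, e in enumerate(np.eye(n)))
    u = u_fn(theta)
    return s / np.sqrt(1.0 + u @ u)

theta = np.array([0.4, -0.7])
print(div_g_J(theta))  # close to 2, the flat-space Laplacian of L, since u is small
```

Because $u$ is small here, the result stays near the Euclidean value (the trace of the Hessian of $L$); a larger $u$ would produce visible curvature corrections.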

Pseudocode for a single actor-critic update cycle is provided in (Chen et al., 2023). The critical elements are:

  • Sampling batches of trajectories for on-policy losses and divergence terms.
  • Backpropagation through the combined policy and regularization loss.
  • Parallel updates to policy parameters and metric DNN weights.

6. Relation to Other Hessian-Based Metrics

No explicit “soft-absolute” Hessian metric is discussed in the domain of multimodal medical image registration (Eskandari et al., 2023). The Hessian-based similarity metric proposed therein is closed-form, smooth, and differentiable but replaces any hard absolute values with sums of squares, yielding an everywhere differentiable objective. Its design leverages vectorized Hessian projections into appropriate subspaces, interpreted geometrically via cosine similarity, and achieves state-of-the-art performance on MR–US registration with strong robustness to intensity bias fields.
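
As a rough, hedged illustration of the squares-instead-of-absolute-values idea (not the actual registration metric of Eskandari et al., 2023, which projects Hessians into specific subspaces first), a cosine similarity between vectorized Hessian-like matrices is smooth everywhere once a small constant is added to the norms:

```python
import numpy as np

# Illustration only: cosine similarity between vectorized Hessian-like
# matrices, smoothed with eps so it is differentiable even at the zero matrix.
# The published metric uses subspace projections; this shows only the
# squares-based, everywhere-differentiable construction.

def hessian_cosine(h1, h2, eps=1e-8):
    v1, v2 = h1.ravel(), h2.ravel()
    return (v1 @ v2) / (np.sqrt(v1 @ v1 + eps) * np.sqrt(v2 @ v2 + eps))

h = np.array([[2.0, 0.3], [0.3, 1.0]])
print(hessian_cosine(h, 3.0 * h))  # near +1: invariant to positive scaling
print(hessian_cosine(h, -h))       # near -1: opposite curvature
```

The scale invariance is the property that makes such a similarity robust to intensity bias, since only the direction of the curvature information matters.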

This suggests that the concept of “soft-absolute” metrics is broadly instantiated as making use of second-derivative-based quantities in a manner that maintains differentiability (usually via quadratic or softplus-like surrogates), whether for regularization in learning algorithms or as similarity measures in image analysis.

7. Significance and Broader Implications

The soft-absolute Hessian-based regularization framework demonstrates that explicit, smooth control over second-order geometric properties in parameterized models yields significant improvements in both stability and performance—evidenced in reinforcement learning as well as imaging contexts. By leveraging closed-form, differentiable surrogates for otherwise non-smooth objectives, such approaches enable more robust optimization, better exploitation of higher-order information, and integration with modern autodiff tooling. The method generalizes beyond Euclidean settings, unifying geometric regularization and policy optimization within a principled manifold-learning perspective.

A plausible implication is that similar soft-absolute Hessian metrics could be extended to other contexts where the absolute value of higher-order derivatives encodes critical regularity or similarity structure, provided that differentiability and computational tractability are preserved.
