Hybrid Quasi-Newton Backpropagation

Updated 29 March 2026
  • The paper demonstrates that hybrid quasi-Newton backpropagation improves MLP training by integrating BFGS updates to effectively approximate second-order curvature.
  • It incorporates trust-region methods and Wolfe condition-based line searches to ensure robust step size selection and faster convergence compared to gradient descent.
  • Empirical results indicate lower training/test MSE and reduced convergence times, highlighting the method’s practical benefits over standard backpropagation.

Hybrid Quasi-Newton Backpropagation is a supervised learning algorithm for training multi-layer perceptrons (MLPs) that integrates quasi-Newton optimization (specifically BFGS matrix updates, trust-region methods, and Wolfe condition-based line search) within the backpropagation framework. It is designed to address shortcomings of standard gradient-descent backpropagation, such as poor minimization of the error objective over the weights, slow learning, and general instability, by leveraging second-order (curvature) information to improve convergence and robustness (Chakraborty et al., 2012).

1. Problem Formulation and Error Objective

Let $W$ denote the vector of all adjustable weights, including biases, in an MLP with $o$ outputs, $h$ hidden neurons, and $n$ inputs. Given a supervised training set $\{(x^p, T^p)\}_{p=1}^P$ with $x^p \in \mathbb{R}^n$ and $T^p \in \mathbb{R}^o$, network predictions are $O^p(W)$. The learning objective is to minimize the mean-square error (MSE):

$$E(W) = \frac{1}{2P} \sum_{p=1}^P \|O^p(W) - T^p\|^2,$$

yielding the minimization problem $W^* = \arg\min_W E(W)$. The hybrid algorithm uses a quadratic model:

$$m_k(s) = E(W_k) + g_k^T s + \frac{1}{2} s^T B_k s,$$

where $g_k = \nabla E(W_k)$ and $B_k$ is a positive-definite approximation to the Hessian.
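As a minimal sketch of the two formulas above, the following dependency-free Python evaluates the MSE objective and the quadratic model on hand-picked numbers. The function names `mse` and `quadratic_model` and the toy values are illustrative assumptions, not taken from the source.

```python
def mse(outputs, targets):
    """E = 1/(2P) * sum_p ||O^p - T^p||^2 for lists of output/target vectors."""
    P = len(outputs)
    total = 0.0
    for o, t in zip(outputs, targets):
        total += sum((oi - ti) ** 2 for oi, ti in zip(o, t))
    return total / (2 * P)


def quadratic_model(E_k, g, B, s):
    """m_k(s) = E(W_k) + g^T s + 0.5 * s^T B s."""
    gTs = sum(gi * si for gi, si in zip(g, s))
    Bs = [sum(bij * sj for bij, sj in zip(row, s)) for row in B]
    sBs = sum(si * bsi for si, bsi in zip(s, Bs))
    return E_k + gTs + 0.5 * sBs


# Two patterns with one output each: the errors are 0.5 and -1.0,
# so E = (0.25 + 1.0) / (2 * 2) = 0.3125.
E = mse([[1.5], [2.0]], [[1.0], [3.0]])

# With B = I the model reduces to E + g.s + 0.5 * ||s||^2.
m = quadratic_model(E, g=[1.0, -2.0], B=[[1.0, 0.0], [0.0, 1.0]], s=[-0.5, 1.0])
```

Here `m` is smaller than `E`, so the model predicts that the step `s` reduces the error.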

2. Quasi-Newton Updates and BFGS Formula

At the core of this method is the Broyden–Fletcher–Goldfarb–Shanno (BFGS) quasi-Newton update. Beginning with $B_0 = I$, the weight update steps $s_k = W_{k+1} - W_k$ and gradient differences $y_k = g_{k+1} - g_k$ yield:

$$B_{k+1} = B_k + \frac{y_k y_k^T}{y_k^T s_k} - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k}.$$

An equivalent recursion maintains the inverse Hessian approximation $H_k \approx B_k^{-1}$:

$$H_{k+1} = (I - \rho_k s_k y_k^T) H_k (I - \rho_k y_k s_k^T) + \rho_k s_k s_k^T,$$

where $\rho_k = 1 / (y_k^T s_k)$. These updates efficiently approximate local curvature, improving search directions and circumventing explicit Hessian computation.
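The direct update can be sketched in a few lines of plain Python. The helper names are hypothetical, and skipping the update when $y_k^T s_k \leq 0$ is a common safeguard assumed here rather than something the source specifies. The example checks the secant condition $B_{k+1} s_k = y_k$, which any BFGS update must satisfy.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def bfgs_update(B, s, y):
    """Direct BFGS update of the (symmetric) Hessian approximation B."""
    ys = dot(y, s)
    if ys <= 0.0:
        # Assumed safeguard: skip the update so B stays positive definite.
        return B
    Bs = matvec(B, s)          # B s; since B is symmetric, B s s^T B = (Bs)(Bs)^T
    sBs = dot(s, Bs)
    n = len(s)
    return [[B[i][j] + y[i] * y[j] / ys - Bs[i] * Bs[j] / sBs
             for j in range(n)] for i in range(n)]


B0 = [[1.0, 0.0], [0.0, 1.0]]   # B_0 = I
s = [1.0, 0.0]                  # weight step
y = [2.0, 1.0]                  # gradient difference
B1 = bfgs_update(B0, s, y)      # [[2.0, 1.0], [1.0, 1.5]]
secant = matvec(B1, s)          # equals y
```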

3. Trust-Region and Line Search Mechanisms

The optimization step either restricts candidate steps $s$ to a trust region ($\|s\| \leq \Delta_k$) or seeks a step $\alpha_k p_k$ along the search direction $p_k = -B_k^{-1} g_k$. In both cases, step acceptability is governed by the agreement between actual and predicted reductions:

$$\rho = \frac{E(W_k) - E(W_k + s_k)}{m_k(0) - m_k(s_k)}.$$

If $\rho$ is large (the model predicts well), the trust region is expanded; if it is small, the region is contracted. The hybrid algorithm as presented employs an augmented line search, enforcing the strong Wolfe conditions for $\alpha_k$:

  • Sufficient decrease (Armijo):

$$E(W_k + \alpha p_k) \leq E(W_k) + c_1 \alpha g_k^T p_k, \quad 0 < c_1 < \frac{1}{2}$$

  • Curvature:

$$|\nabla E(W_k + \alpha p_k)^T p_k| \leq c_2 |g_k^T p_k|, \quad c_1 < c_2 < 1$$

A bracketing (zoom) approach iteratively refines $\alpha_k$ until both conditions are met.
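The acceptance tests above can be sketched as two small functions. The defaults `c1=1e-4` and `c2=0.9` are commonly used values assumed here (the source only gives the admissible ranges), and the one-dimensional slice `phi` is a toy stand-in for $E(W_k + \alpha p_k)$.

```python
def strong_wolfe(phi, dphi, alpha, c1=1e-4, c2=0.9):
    """Check the strong Wolfe conditions for phi(a) = E(W_k + a * p_k).

    dphi(a) is the directional derivative grad E(W_k + a * p_k)^T p_k;
    dphi(0) must be negative, i.e. p_k is a descent direction.
    Returns (armijo_ok, curvature_ok).
    """
    phi0, dphi0 = phi(0.0), dphi(0.0)
    armijo = phi(alpha) <= phi0 + c1 * alpha * dphi0
    curvature = abs(dphi(alpha)) <= c2 * abs(dphi0)
    return armijo, curvature


def reduction_ratio(E_old, E_new, m_old, m_new):
    """Trust-region ratio: actual reduction over predicted reduction."""
    return (E_old - E_new) / (m_old - m_new)


# Toy slice phi(a) = (a - 1)^2, minimised at a = 1.
phi = lambda a: (a - 1.0) ** 2
dphi = lambda a: 2.0 * (a - 1.0)

good = strong_wolfe(phi, dphi, 1.0)        # both conditions hold
too_short = strong_wolfe(phi, dphi, 0.01)  # decrease holds, slope still steep
too_long = strong_wolfe(phi, dphi, 2.5)    # overshoots: both conditions fail
```

A bracketing search would shrink the interval when the Armijo test fails and extend it when the curvature test fails, until a step like `good` is found.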

4. Algorithmic Workflow

The hybrid backpropagation procedure iterates as follows (batch or pattern-by-pattern):

  1. Initialization: $W_0$ drawn from $\mathcal{U}(-0.1, 0.1)$; $B_0$ is the identity.
  2. Forward pass: Compute $O^p(W_k)$ for all inputs.
  3. Gradient computation: Backpropagation yields $g_k = \nabla E(W_k)$. Explicitly:
    • For each output neuron $k$: $\delta_k = O_k (1 - O_k)(T_k - O_k)$.
    • For each hidden neuron $j$: $\delta_j = O_j (1 - O_j) \sum_k w_{jk} \delta_k$.
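    • Gradient for weight $w_{ij}$: $g_{w_{ij}} = -\frac{1}{P} \sum_p \delta_{\text{node}} \cdot \text{input}$, where $\delta_{\text{node}}$ is the delta of the neuron the weight feeds into and $\text{input}$ is the activation entering that weight. The $\frac{1}{P}$ factor follows from the normalization of $E(W)$ in Section 1.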
  4. Search direction: Solve $B_k p_k = -g_k$.
  5. Line search: Find $\alpha_k$ satisfying Wolfe conditions.
  6. Weight update: $s_k = \alpha_k p_k$, $W_{k+1} = W_k + s_k$.
  7. BFGS update: Form $y_k = g_{k+1} - g_k$, then update $B_{k+1}$.
  8. Stopping check: Halt if $\|g_k\| < \epsilon$ or $k \geq K_{\max}$.
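The gradient computation in step 3 can be sketched for a $2 \to h \to 1$ sigmoid network as follows. The weight layout (one row per hidden neuron with the bias last, and an output weight vector with the bias last) and all names are assumptions made for illustration. Dividing by $P$ matches the $\frac{1}{2P}$ normalization of $E(W)$, and a central finite difference is used to sanity-check one gradient entry.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(W1, W2, x):
    """Return the hidden activations and the single network output."""
    xb = list(x) + [1.0]                       # append the bias input
    hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in W1]
    out = sigmoid(sum(w * v for w, v in zip(W2, hidden + [1.0])))
    return hidden, out


def error(W1, W2, data):
    """E(W) = 1/(2P) * sum_p (O^p - T^p)^2."""
    return sum((forward(W1, W2, x)[1] - t) ** 2 for x, t in data) / (2 * len(data))


def gradient(W1, W2, data):
    """Backpropagated gradient of E with respect to W1 and W2."""
    P = len(data)
    g1 = [[0.0] * len(row) for row in W1]
    g2 = [0.0] * len(W2)
    for x, t in data:
        hidden, out = forward(W1, W2, x)
        xb, hb = list(x) + [1.0], hidden + [1.0]
        delta_out = out * (1.0 - out) * (t - out)          # output delta
        for j, hj in enumerate(hb):
            g2[j] -= delta_out * hj / P
        for j, hj in enumerate(hidden):
            delta_j = hj * (1.0 - hj) * W2[j] * delta_out  # hidden delta
            for i, xi in enumerate(xb):
                g1[j][i] -= delta_j * xi / P
    return g1, g2


W1 = [[0.1, -0.2, 0.05], [0.3, 0.1, -0.1]]   # h = 2 hidden neurons
W2 = [0.2, -0.3, 0.1]
data = [((0.0, 1.0), 0.2), ((1.0, 0.5), 0.9)]
g1, g2 = gradient(W1, W2, data)

# Central finite difference on the first output weight.
eps = 1e-6
fd = (error(W1, [W2[0] + eps] + W2[1:], data)
      - error(W1, [W2[0] - eps] + W2[1:], data)) / (2 * eps)
```

The value `fd` should agree with `g2[0]` to within finite-difference accuracy.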

Pseudocode for full-batch hybrid quasi-Newton backpropagation:

Input: training set {(x^p, T^p)}, network, f(·), ε, K_max, c₁, c₂
Initialize: W₀ ← U(−0.1,0.1), B₀ ← I, k ← 0
repeat
  Forward pass for all patterns → O^p(W_k)
  Compute g_k = ∇E(W_k) via backprop
  Solve B_k p_k = −g_k
  Line search for α_k (Wolfe conditions)
  Set s_k = α_k p_k, W_{k+1} = W_k + s_k
  Compute g_{k+1} = ∇E(W_{k+1})
  Set y_k = g_{k+1} − g_k
  Update B_{k+1}
  k ← k + 1
until (‖g_k‖ < ε) or (k ≥ K_max)
Output: W_k
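The loop above can be sketched as runnable Python under a few stated simplifications. It maintains the inverse approximation $H_k$ instead of solving $B_k p_k = -g_k$, and it swaps in a simple bisection search for the weak Wolfe conditions in place of a strong-Wolfe zoom procedure. A toy quadratic stands in for the MLP error. All function names and default parameters are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def wolfe_step(E, grad, W, p, c1=1e-4, c2=0.9, max_iter=50):
    """Bisection search for a step satisfying the weak Wolfe conditions."""
    E0, d0 = E(W), dot(grad(W), p)
    lo, hi, alpha = 0.0, math.inf, 1.0
    for _ in range(max_iter):
        W_new = [w + alpha * pi for w, pi in zip(W, p)]
        if E(W_new) > E0 + c1 * alpha * d0:      # Armijo fails: step too long
            hi = alpha
        elif dot(grad(W_new), p) < c2 * d0:      # curvature fails: step too short
            lo = alpha
        else:
            break
        alpha = 2.0 * lo if hi == math.inf else 0.5 * (lo + hi)
    return alpha


def inverse_bfgs_update(H, s, y):
    """H_{k+1} = (I - rho s y^T) H (I - rho y s^T) + rho s s^T, expanded."""
    rho = 1.0 / dot(y, s)
    Hy = matvec(H, y)
    coeff = rho * rho * dot(y, Hy) + rho
    n = len(s)
    return [[H[i][j] - rho * (s[i] * Hy[j] + Hy[i] * s[j]) + coeff * s[i] * s[j]
             for j in range(n)] for i in range(n)]


def minimize_bfgs(E, grad, W0, eps=1e-8, k_max=100):
    W, n = list(W0), len(W0)
    H = [[float(i == j) for j in range(n)] for i in range(n)]   # H_0 = I
    g, k = grad(W), 0
    while math.sqrt(dot(g, g)) >= eps and k < k_max:
        p = [-v for v in matvec(H, g)]           # search direction
        alpha = wolfe_step(E, grad, W, p)
        s = [alpha * pi for pi in p]
        W = [w + si for w, si in zip(W, s)]
        g_new = grad(W)
        y = [a - b for a, b in zip(g_new, g)]
        if dot(y, s) > 0.0:                      # keep H positive definite
            H = inverse_bfgs_update(H, s, y)
        g, k = g_new, k + 1
    return W, k


# Toy objective with minimum at (1, -2); in the MLP setting E and grad
# would be the network error and its backpropagated gradient.
E = lambda w: (w[0] - 1.0) ** 2 + 10.0 * (w[1] + 2.0) ** 2
grad = lambda w: [2.0 * (w[0] - 1.0), 20.0 * (w[1] + 2.0)]
W_star, iterations = minimize_bfgs(E, grad, [0.0, 0.0])
```

Unlike the pseudocode, this sketch reuses `g_new` as the next iteration's gradient, which avoids one gradient evaluation per iteration.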

5. Theoretical Convergence Properties

The source reports global convergence under standard assumptions:

  • $E(W)$ is twice continuously differentiable and bounded below, with compact level sets.
  • BFGS updates combined with a line search satisfying the strong Wolfe conditions preserve positive definiteness of $B_k$, because the curvature condition guarantees $y_k^T s_k > 0$.
  • Under these conditions $\|g_k\| \rightarrow 0$, i.e., the iterates converge to a stationary point from any starting point.

These properties rest on theory established in Dennis & Schnabel (1983) and Nocedal & Wright (1999), as cited in the source.
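The positive-definiteness claim rests on the sign of $y_k^T s_k$, which can be checked in one short derivation. For a descent direction, $g_k^T p_k < 0$, so the strong Wolfe curvature condition $|g_{k+1}^T p_k| \leq c_2 |g_k^T p_k|$ implies $g_{k+1}^T p_k \geq c_2\, g_k^T p_k$. With $s_k = \alpha_k p_k$ and $\alpha_k > 0$:

```latex
\begin{align*}
y_k^T s_k &= \alpha_k \,(g_{k+1} - g_k)^T p_k \\
          &\geq \alpha_k \,(c_2 - 1)\, g_k^T p_k \\
          &> 0,
\end{align*}
```

since $c_2 < 1$ and $g_k^T p_k < 0$. A positive $y_k^T s_k$ is the standard condition under which the BFGS update maps a positive-definite $B_k$ to a positive-definite $B_{k+1}$.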

6. Empirical Evaluation and Results

The algorithm was evaluated on MLPs with a single hidden layer and architecture $2 \to h \to 1$ (with $h$ typically between 5 and 20), using standard benchmark problems:

| Task | Training MSE | Test MSE | CPU time (s) |
|---|---|---|---|
| Beale function | 0.0010709 (0.107%) | 0.013954 (1.40%) | 69.37 |
| Booth function | 0.00009874 (0.01%) | 0.0144 (1.44%) | 70.25 |
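For reference, the two benchmark targets have standard closed forms, sketched below together with a hypothetical grid sampler for building a regression set. The sampling domain and grid size are assumptions, since the source does not state how the training and test patterns were generated. Targets would also likely need rescaling into the output neuron's range if a sigmoid output is used.

```python
def booth(x, y):
    """Booth function; global minimum 0 at (1, 3)."""
    return (x + 2 * y - 7) ** 2 + (2 * x + y - 5) ** 2


def beale(x, y):
    """Beale function; global minimum 0 at (3, 0.5)."""
    return ((1.5 - x + x * y) ** 2
            + (2.25 - x + x * y ** 2) ** 2
            + (2.625 - x + x * y ** 3) ** 2)


def make_dataset(f, lo, hi, n):
    """Sample f on an n-by-n grid over [lo, hi]^2 as ((x, y), target) pairs."""
    step = (hi - lo) / (n - 1)
    grid = [lo + i * step for i in range(n)]
    return [((x, y), f(x, y)) for x in grid for y in grid]


# Assumed domain [-10, 10]^2 with a 21-by-21 grid (441 patterns).
train = make_dataset(booth, -10.0, 10.0, 21)
```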

A comparison was made with standard gradient-descent backpropagation (hand-tuned learning rate):

| Algorithm | Booth error | Beale error |
|---|---|---|
| Quasi-Newton (proposed) | 1.44% | 1.3954% |
| Gradient descent | 13.59% | 16.77% |

  • The hybrid quasi-Newton method consistently achieved lower training and test MSE.
  • It converged roughly an order of magnitude faster, in both training time and number of epochs.
  • Empirical regression plots indicated a near-linear fit ($R \approx 1$).

A plausible implication is that quasi-Newton refinement of curvature avoids the need for learning-rate tuning and increases robustness for non-linear MLP optimization. This suggests significant advantages for moderate-dimensional networks where fully second-order information is intractable but first-order methods are insufficiently stable or too slow.

Hybrid Quasi-Newton Backpropagation as detailed by Ghosh & Chakraborty (Chakraborty et al., 2012) demonstrates robust convergence and efficiency improvements over plain gradient-based backpropagation for MLP training, especially on structured low-dimensional tasks. Its reliance on batch-mode curvature estimation and matrix updates scales less favorably with very high-dimensional weight spaces, limiting applicability for large-scale modern deep architectures without further adaptation.

The trust-region and line search concepts are foundational in classical unconstrained optimization, bridging first-order neural learning with robust numerical methods. While the approach predates recent advances in adaptive first-order optimizers, a plausible implication is that such hybrid quasi-Newton enhancements remain relevant for domains where convergence reliability and hand-tuning avoidance are critical.

This method connects directly to established theory on BFGS and trust-region optimization in machine learning and serves as an explicit illustration of second-order optimization within the backpropagation paradigm.
