- The paper introduces difference target propagation as an alternative to back-propagation for credit assignment in deep neural networks.
- It uses auto-encoders to compute layer-wise target values and matches back-propagation's performance on MNIST across several network types, including discrete and stochastic ones.
- The approach improves biological plausibility and could suit low-power hardware, opening directions for future research on efficient neural training.
Difference Target Propagation: An Insightful Overview
The paper "Difference Target Propagation" by Dong-Hyun Lee, Saizheng Zhang, Asja Fischer, and Yoshua Bengio offers a fresh perspective on credit assignment in deep neural networks, inspired by the limitations of back-propagation (BP). It explores the use of target propagation (TP), where instead of relying on infinitesimal gradients, targets are computed and propagated backward through the network. This method promises improved biological plausibility and enhanced robustness in handling networks with discrete and stochastic units, certain strong non-linearities, and the potential for lower computational power demands.
Motivation and Context
Back-propagation has underpinned the recent successes of deep learning, largely because it solves the credit-assignment problem by using the chain rule to propagate loss gradients to every parameter. However, as architectures grow deeper and more complex, BP runs into significant hurdles, such as vanishing or exploding gradients, and it faces long-standing criticism for being biologically implausible. Its dependence on smooth, differentiable functions also limits its applicability, particularly for the strongly non-linear and discrete computations found in biological neurons. The authors address these issues with target propagation, which aims to assign credit in neural networks more effectively.
Conceptual Framework
Instead of BP's gradient-based updates, target propagation assigns each layer's activations a target value; reaching those targets should lower the network's loss. The targets are propagated backward through feedback mappings trained like auto-encoders, and because those mappings are only approximate inverses of the forward layers, the authors introduce a correction scheme they call "difference target propagation."
The paper formalizes TP with layer-local loss functions derived from target values rather than from the global loss gradient. The target of the last hidden layer is obtained by taking a small gradient step on the output loss; lower-layer targets are then propagated downward through feedback mappings g_i trained as auto-encoders. Because these mappings invert the forward layers only approximately, difference target propagation adds a correction term, setting the target of layer i-1 to h_{i-1} + g_i(h_hat_i) - g_i(h_i). Each layer is updated by stochastic gradient descent on the distance between its output and its target, which keeps the learning signal usable even in deep networks with pronounced non-linearities.
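To make this concrete, below is a minimal sketch of one difference-TP training step for a small two-hidden-layer classifier. The PyTorch framing, layer sizes, learning rates, and noise level are illustrative assumptions rather than the authors' exact configuration; what the sketch takes from the paper is the recipe itself: a gradient-step target for the top hidden layer, the difference correction h_hat_{i-1} = h_{i-1} + g_i(h_hat_i) - g_i(h_i), layer-local target losses, and an auto-encoder-style inverse loss for the feedback mapping.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes/hyper-parameters (MNIST-like input), not the paper's exact setup.
f1  = nn.Sequential(nn.Linear(784, 256), nn.Tanh())   # forward mapping f_1
f2  = nn.Sequential(nn.Linear(256, 256), nn.Tanh())   # forward mapping f_2
g2  = nn.Sequential(nn.Linear(256, 256), nn.Tanh())   # feedback mapping g_2 ~ approximate inverse of f_2
top = nn.Linear(256, 10)                               # output classifier

opt_fwd = torch.optim.SGD(
    list(f1.parameters()) + list(f2.parameters()) + list(top.parameters()), lr=0.01)
opt_fb = torch.optim.SGD(g2.parameters(), lr=0.01)

def dtp_step(x, y, eta=0.1, sigma=0.1):
    """One training step on a batch x of shape (N, 784) with integer labels y of shape (N,)."""
    # Forward pass through the feed-forward stack.
    h1 = f1(x)
    h2 = f2(h1)
    loss = F.cross_entropy(top(h2), y)

    # Target for the top hidden layer: a small step against the global-loss
    # gradient, h2_hat = h2 - eta * dL/dh2 (the only place a gradient is used).
    grad_h2, = torch.autograd.grad(loss, h2)
    h2_hat = (h2 - eta * grad_h2).detach()

    # Difference target propagation: the correction (g2(h2_hat) - g2(h2))
    # compensates for g2 being only an approximate inverse of f2.
    with torch.no_grad():
        h1_hat = h1 + g2(h2_hat) - g2(h2)

    # Layer-local losses: each layer pulls its output toward its own target;
    # only the output classifier sees the global loss directly.
    opt_fwd.zero_grad()
    total = (F.cross_entropy(top(h2.detach()), y)           # trains `top` only
             + ((f2(h1.detach()) - h2_hat) ** 2).mean()     # trains f2 toward h2_hat
             + ((f1(x) - h1_hat) ** 2).mean())              # trains f1 toward h1_hat
    total.backward()
    opt_fwd.step()

    # Inverse (auto-encoder) loss: g2 learns to reconstruct a noise-corrupted
    # h1 from the corresponding forward activation, keeping it close to an inverse of f2.
    with torch.no_grad():
        h1_noisy = h1 + sigma * torch.randn_like(h1)
        h2_noisy = f2(h1_noisy)
    opt_fb.zero_grad()
    inv_loss = ((g2(h2_noisy) - h1_noisy) ** 2).mean()
    inv_loss.backward()
    opt_fb.step()
    return loss.item()
```

In the full algorithm every hidden layer above the first has its own feedback mapping and receives its target through the same difference rule; only one feedback mapping is shown here to keep the sketch short.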
Methodological Approach and Results
Difference TP trains networks to performance comparable to BP combined with RMSprop. Notably, it achieves state-of-the-art results when training stochastic neural networks on MNIST. The numerical results show parity with BP across several architectures, including deterministic feedforward networks and networks with discrete transmission between units.
Key experiments involve:
- Deterministic Networks: Achieving classification accuracy comparable to BP on MNIST with marginal differences in error rates.
- Networks with Discretized Transmission: Training layers that communicate through discretized signals, a setting in which BP's gradients vanish (see the sketch after this list).
- Stochastic Networks: Surpassing established BP-based training methods for networks with stochastic units, underscoring TP's potential in this setting.
- Auto-encoders: Training auto-encoders with TP yields useful representations efficiently, pointing to TP's utility in unsupervised learning.
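Why discretization matters is easy to show in a toy example. The snippet below is an illustrative sketch, not the paper's experimental setup: the layer sizes, the constant target h_hat, and the hard sign() non-linearity are all made up for the demonstration. It shows that back-propagating through the discretization yields a zero gradient for the layer's weights, whereas a target for the layer's continuous activation, of the kind target propagation supplies, still yields a usable update.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lin = nn.Linear(20, 10)
x = torch.randn(4, 20)

# 1) Back-propagation through a hard sign(): its derivative is zero almost
#    everywhere, so no learning signal reaches the layer's weights.
h = torch.sign(lin(x))                       # discrete transmission to the next layer
bp_loss = ((h - torch.ones_like(h)) ** 2).mean()
bp_loss.backward()
print(lin.weight.grad.abs().max())           # tensor(0.): the gradient is wiped out

# 2) Target propagation: a target for the continuous pre-discretization
#    activation (here a hypothetical constant target) gives a non-zero update.
lin.zero_grad()
h_pre = lin(x)                               # continuous activation before sign()
h_hat = torch.full_like(h_pre, 0.5)          # hypothetical target from the layer above
tp_loss = ((h_pre - h_hat) ** 2).mean()
tp_loss.backward()
print(lin.weight.grad.abs().max())           # non-zero: the layer can still learn
```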
Implications and Speculations
Target propagation has both theoretical and practical implications. Theoretically, it broadens the space of training procedures with more biologically plausible mechanisms, potentially bringing artificial credit assignment closer to what biological neurons could implement. Practically, TP could influence hardware development: because units can communicate through discretized, even single-bit, signals, it lends itself to low-power deep learning devices.
Looking forward, TP offers fertile ground for further research, including more sophisticated inverse (feedback) functions and refinements that broaden the algorithm's applicability across architectures and tasks. Hybrid schemes that combine TP with conventional BP could also shed light on improving training efficiency and convergence speed in complex networks.
In summary, "Difference Target Propagation" redefines how neural networks can compute and assign credit, promising an array of future developments in AI model training, robustness, and potential biological integration. The paper stands as a testament to the evolving landscape of efficient network training paradigms, challenging the long-standing dominance of gradient-based methods.