Training a Hopfield Variational Autoencoder with Equilibrium Propagation (2311.15047v1)

Published 25 Nov 2023 in cs.LG and cs.NE

Abstract: On dedicated analog hardware, equilibrium propagation is an energy-efficient alternative to backpropagation. In spite of its theoretical guarantees, its application in the AI domain remains limited to the discriminative setting. Meanwhile, despite its high computational demands, generative AI is on the rise. In this paper, we demonstrate the application of Equilibrium Propagation in training a variational autoencoder (VAE) for generative modeling. Leveraging the symmetric nature of Hopfield networks, we propose using a single model to serve as both the encoder and decoder which could effectively halve the required chip size for VAE implementations, paving the way for more efficient analog hardware configurations.

Summary

  • The paper demonstrates that training Hopfield VAEs with Equilibrium Propagation is a viable alternative to backpropagation for energy-efficient neural computation.
  • It introduces a two-phase training process using Continuous Hopfield Networks where free and weakly clamped states optimize the encoder and decoder.
  • Experimental results indicate that dense architectures outperform layered models and a unified encoder-decoder design simplifies network implementation.

Introduction to Variational Autoencoder Training

Variational Autoencoders (VAEs) are a class of generative models that learn to encode data and to generate new data samples. They are typically trained with backpropagation (BP), the mainstay of deep-learning optimization, but BP's computational cost can hinder deployment on compact, energy-efficient hardware. Equilibrium Propagation (EP) is an alternative training method with theoretical guarantees that maps more naturally onto analog hardware. This paper explores EP as a training method for VAEs built from a type of neural network known as Continuous Hopfield Networks (CHNs).

Preliminaries of the Concept

The paper begins by summarizing the essential foundations that underpin the development of the Hopfield VAE:

  • Continuous Hopfield Networks: A form of recurrent neural network with symmetric weight matrices and a specific non-linearity that together define an energy function; the network relaxes iteratively toward a state of minimum energy (an equilibrium).
  • Equilibrium Propagation: A training algorithm for energy-based models such as CHNs. Instead of BP's separate backward pass, EP contrasts two equilibria of the same dynamics: a free phase, with no external influence on the outputs, and a weakly clamped phase, in which the outputs are nudged toward a desired target. Its local updates make it attractive for energy-efficient analog hardware. A minimal sketch of the energy function and the two-phase update follows this list.
  • Variational Autoencoders: VAEs learn to encode data into a latent space and to decode, or generate, data from that space. Training maximizes the Evidence Lower Bound (ELBO), which balances reconstruction quality against the cost of diverging from a prior distribution over the latent space.
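
For concreteness, here is a minimal NumPy sketch of the quantities involved: a Hopfield energy with symmetric weights, the relaxation dynamics that find its minimum, and the two-phase EP weight update that contrasts the free and weakly clamped equilibria. The function names, the hard-sigmoid non-linearity, and the hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def rho(s):
    # Hard-sigmoid non-linearity commonly paired with Continuous Hopfield Networks.
    return np.clip(s, 0.0, 1.0)

def energy(s, W, b):
    # E(s) = 1/2 ||s||^2 - 1/2 rho(s)^T W rho(s) - b^T rho(s),
    # with W symmetric and zero on the diagonal.
    r = rho(s)
    return 0.5 * s @ s - 0.5 * r @ W @ r - b @ r

def relax(s0, W, b, clamp_idx=None, clamp_val=None, n_steps=100, dt=0.1,
          beta=0.0, target=None, cost_grad=None):
    # Gradient descent on the (optionally nudged) energy. Units listed in
    # clamp_idx are held fixed at clamp_val; beta > 0 adds the weak-clamping term.
    s = s0.copy()
    for _ in range(n_steps):
        if clamp_idx is not None:
            s[clamp_idx] = clamp_val
        r = rho(s)
        grad = s - (W @ r + b) * ((s > 0) & (s < 1))    # dE/ds for the hard sigmoid
        if beta != 0.0:
            grad = grad + beta * cost_grad(s, target)   # nudge toward the target
        s = s - dt * grad
    if clamp_idx is not None:
        s[clamp_idx] = clamp_val
    return s

def ep_weight_update(W, s_free, s_nudged, beta, lr=0.01):
    # EP gradient estimate for the weights: the energy term is -1/2 rho^T W rho,
    # so dLoss/dW ~ -(1/beta) * (outer(rho(s^beta)) - outer(rho(s^0))) and plain
    # gradient descent *adds* the contrastive outer-product difference.
    r0, rb = rho(s_free), rho(s_nudged)
    dW = (np.outer(rb, rb) - np.outer(r0, r0)) / beta
    np.fill_diagonal(dW, 0.0)                           # keep zero self-connections
    return W + lr * dW
```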

The Training Process with Equilibrium Propagation

To adapt EP to VAE training, the approach uses separate Hopfield networks for the encoder and the decoder, each with its own energy function. The process can be distilled into a few steps, sketched in code after the list:

  1. Free Phase: The encoder receives an input image and relaxes to an equilibrium state of its energy function; the latent representation is read off at equilibrium.
  2. Weakly Clamped Phase (Decoder): The decoder's equilibrium state is gently nudged toward lower reconstruction loss, measured as the mean squared error between its output and the input image, and its weights are updated from the two equilibria.
  3. Weakly Clamped Phase (Encoder): The encoder's parameters are adjusted to reduce the same reconstruction loss; the EP framework supplies the mechanisms needed to approximate the required gradients.
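
The sketch below strings these three steps into a single training step, reusing the rho, relax, and ep_weight_update helpers from the earlier snippet. It is a rough illustration under several assumptions: enc and dec are plain dictionaries holding each network's unit count n, symmetric weight matrix W, and bias vector b; each state vector stores the clamped input units first and the output units last; the KL/prior term of the ELBO and the stochastic latent sampling are omitted; and the latent-displacement trick in step 3 is only a stand-in for the paper's own mechanism for routing the reconstruction error back to the encoder.

```python
import numpy as np

def mse_grad(s, target):
    # d/ds of 0.5 * ||output_units(s) - target||^2, acting on the output units only.
    g = np.zeros_like(s)
    g[-len(target):] = s[-len(target):] - target
    return g

def train_step(x, enc, dec, latent_dim, beta=0.1, lr=0.01):
    in_idx = np.arange(len(x))

    # 1. Free phase (encoder): clamp the image, relax to equilibrium, read z.
    s_enc_free = relax(np.zeros(enc["n"]), enc["W"], enc["b"],
                       clamp_idx=in_idx, clamp_val=x)
    z = s_enc_free[-latent_dim:]        # deterministic read-out; VAE sampling omitted
    z_idx = np.arange(latent_dim)

    # 2. Weakly clamped phase (decoder): free phase with z clamped, then a nudged
    #    phase pulling the output units toward x (MSE), then the EP weight update.
    s_dec_free = relax(np.zeros(dec["n"]), dec["W"], dec["b"],
                       clamp_idx=z_idx, clamp_val=z)
    s_dec_nudged = relax(s_dec_free, dec["W"], dec["b"], clamp_idx=z_idx, clamp_val=z,
                         beta=beta, target=x, cost_grad=mse_grad)
    dec["W"] = ep_weight_update(dec["W"], s_dec_free, s_dec_nudged, beta, lr)

    # 3. Weakly clamped phase (encoder): estimate where z should move to lower the
    #    reconstruction loss by repeating the decoder nudge with z left free (a
    #    stand-in for the paper's mechanism), then nudge the encoder's latent
    #    units toward that target and update its weights.
    s_dec_zfree = relax(s_dec_free, dec["W"], dec["b"],
                        beta=beta, target=x, cost_grad=mse_grad)
    z_target = s_dec_zfree[:latent_dim]
    s_enc_nudged = relax(s_enc_free, enc["W"], enc["b"], clamp_idx=in_idx, clamp_val=x,
                         beta=beta, target=z_target, cost_grad=mse_grad)
    enc["W"] = ep_weight_update(enc["W"], s_enc_free, s_enc_nudged, beta, lr)
    return enc, dec
```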

Experimental Insights

The paper reports on comprehensive experiments, including comparisons with baseline models and variants of the Hopfield VAE:

  • Feasibility: It is shown that a Hopfield VAE can indeed be trained with EP. While the F-VAE baseline yielded the lowest Fréchet Inception Distance (FID), indicating better sample fidelity and diversity, the Hopfield VAE still achieved reasonable results.
  • Dense versus Layered Models: Models with dense (fully connected) architectures outperformed layered models, suggesting that direct communication between neurons may be beneficial, a hypothesis that invites further research.
  • Unified Encoder-Decoder: The paper also explores using a single network for both encoding and decoding, exploiting the symmetric nature of CHNs. This dual-use approach simplifies the model and matters for analog hardware, where it could effectively halve the required chip size; a minimal sketch of the idea follows this list.
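
Under the same assumptions as the earlier sketches (the relax helper and a layout with image units first and latent units last), the dual use amounts to clamping different subsets of units of one and the same network; this is an illustration of the idea, not the paper's implementation.

```python
import numpy as np

def encode(x, W, b, n_units, latent_dim):
    # Clamp the image units, relax, and read the latent code at equilibrium.
    img_idx = np.arange(len(x))
    s = relax(np.zeros(n_units), W, b, clamp_idx=img_idx, clamp_val=x)
    return s[-latent_dim:]

def decode(z, W, b, n_units, img_dim):
    # Clamp the latent units, relax the *same* symmetric weights, and read the
    # image units; sampling z from the prior turns the network into a generator.
    lat_idx = np.arange(n_units - len(z), n_units)
    s = relax(np.zeros(n_units), W, b, clamp_idx=lat_idx, clamp_val=z)
    return s[:img_dim]
```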

Conclusion and Outlook

In summary, training VAEs with EP is a step toward harnessing energy-efficient analog hardware for generative AI. Although the results call for further investigation into larger model scales and into strategies for alleviating the vanishing-gradient issue, the paper substantiates the viability of EP for generative modeling and sets the stage for future research into energetically favorable computing in AI.