Convolutional Bipartite Attractor Networks
- CBANs are energy-based neural networks that generalize Hopfield attractor dynamics using modern convolutional architectures for perceptual inference.
- They employ a bipartite topology with alternating visible and hidden layers connected via symmetric convolutions to ensure monotonic energy descent and convergence.
- Empirical evaluations demonstrate that CBANs perform well on image inpainting, super-resolution, and completion tasks, achieving the best PSNR/SSIM on Omniglot inpainting and the best SSIM on several super-resolution benchmarks.
Convolutional Bipartite Attractor Networks (CBANs) are a class of energy-based neural networks that generalize Hopfield-style attractor dynamics to high-dimensional perceptual inference tasks using modern convolutional architectures. Designed for imputation, denoising, and image completion, CBANs employ block-parallel recurrent dynamics across alternately stacked visible and hidden layers, leveraging bidirectional convolutional connectivity subject to symmetry constraints. This enables scalable, convergent energy minimization for a range of supervised and unsupervised tasks, including image inpainting and super-resolution (Iuzzolino et al., 2019).
1. Model Architecture
CBANs utilize a bipartite recurrent topology with alternating visible and hidden layers. Each visible layer connects bidirectionally to adjacent hidden layers via convolutional kernels, with no intralayer (horizontal) connections. This bipartite structure ensures that when entire layers are updated in parallel, the global energy function monotonically decreases, precluding limit cycles of length greater than one.
The network connections between each adjacent pair of layers $\ell$ and $\ell+1$ are realized as a bank of convolutional filters $W^{(\ell)}_{ij}(\delta)$, where $i$ and $j$ index the source and destination channels, respectively, and $\delta$ is the spatial offset. Feedback filters (from layer $\ell+1$ to layer $\ell$) are forced to be transposed and flipped versions of the feedforward filters:

$$\widetilde{W}^{(\ell)}_{ji}(\delta) = W^{(\ell)}_{ij}(-\delta).$$

Thus, the backward kernel is implemented by transposing the channel indices and spatially flipping the feedforward kernel.
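As an illustration of this constraint, the following sketch (in PyTorch, with an assumed weight layout; it is not the authors' code) confirms numerically that applying `conv_transpose2d` with a shared forward kernel is identical to convolving with the channel-transposed, spatially flipped kernel:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
W = torch.randn(8, 3, 5, 5)        # forward kernel: (dest channels, source channels, kH, kW)
h = torch.randn(1, 8, 16, 16)      # activity in the destination (upper) layer

# Feedback computed with conv_transpose2d and the shared forward kernel ...
top_down_a = F.conv_transpose2d(h, W, padding=2)

# ... equals an ordinary convolution with the channel-transposed, flipped kernel.
W_back = W.flip((-1, -2)).transpose(0, 1)    # shape (source, dest, kH, kW)
top_down_b = F.conv2d(h, W_back, padding=2)

print(torch.allclose(top_down_a, top_down_b, atol=1e-5))  # expected: True
```

Sharing a single weight tensor between the forward and feedback passes therefore enforces the symmetry constraint by construction, with no extra parameters.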
For a given layer $\ell$, with activation $a^{(\ell)}_{j}(x)$ for channel $j$ at spatial location $x$, the layer-parallel update rule is:

$$a^{(\ell)}_{j}(x) \leftarrow f\!\left(b^{(\ell)}_{j} + \sum_{i}\big(W^{(\ell-1)}_{ij} * a^{(\ell-1)}_{i}\big)(x) + \sum_{k}\big(\widetilde{W}^{(\ell)}_{kj} * a^{(\ell+1)}_{k}\big)(x)\right),$$

where $*$ denotes convolution with appropriate padding and stride, $b^{(\ell)}_{j}$ is a per-channel bias, and $f$ is the activation function.
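A minimal sketch of this update for a single hidden layer is given below; the helper names, leak slope, and tensor shapes are illustrative assumptions rather than the paper's implementation. The bottom-up term uses `conv2d` with the forward kernel from the layer below, and the top-down term reuses the forward kernel of the layer above via `conv_transpose2d`:

```python
import torch
import torch.nn.functional as F

def bounded_leaky(z, alpha=0.1):
    """Piecewise-linear activation: identity on [0, 1], slope alpha outside.
    The leak slope 0.1 is an arbitrary choice for this sketch."""
    return torch.clamp(z, 0.0, 1.0) + alpha * (z - torch.clamp(z, 0.0, 1.0))

def update_layer(a_below, a_above, W_in, W_out, bias, alpha=0.1):
    """Block-parallel update of every unit in one layer.

    W_in  : forward kernel from the layer below into this layer,
            shape (C_this, C_below, k, k)
    W_out : forward kernel from this layer into the layer above,
            shape (C_above, C_this, k, k); reused transposed for feedback.
    """
    pad_in = W_in.shape[-1] // 2
    pad_out = W_out.shape[-1] // 2
    bottom_up = F.conv2d(a_below, W_in, padding=pad_in)
    top_down = F.conv_transpose2d(a_above, W_out, padding=pad_out)
    net_input = bottom_up + top_down + bias.view(1, -1, 1, 1)
    return bounded_leaky(net_input, alpha)

# Toy shapes: visible layer 3 channels, this hidden layer 16, layer above 32.
a_vis, a_top = torch.rand(1, 3, 32, 32), torch.rand(1, 32, 32, 32)
W_in, W_out = torch.randn(16, 3, 3, 3) * 0.1, torch.randn(32, 16, 3, 3) * 0.1
bias = torch.zeros(16)
a_hidden = update_layer(a_vis, a_top, W_in, W_out, bias)
print(a_hidden.shape)  # torch.Size([1, 16, 32, 32])
```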
The global energy function for the network is:

$$E = -\sum_{\ell}\Big\langle a^{(\ell+1)},\, W^{(\ell)} * a^{(\ell)}\Big\rangle \;-\; \sum_{\ell}\sum_{j,x} b^{(\ell)}_{j}\, a^{(\ell)}_{j}(x) \;+\; \sum_{\ell}\sum_{j,x} B\!\big(a^{(\ell)}_{j}(x)\big),$$

where $\langle\cdot,\cdot\rangle$ denotes the sum of elementwise products and $B$ is a barrier function. The symmetry constraint on the convolutional kernels and the monotonicity of $f$ ensure energy descent during block updates.
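The sketch below computes this energy for a stack of layer activations; it assumes the bounded leaky activation above with an illustrative leak slope, and (for simplicity) that all layers share the same spatial resolution, unlike the pooled architectures used in the experiments:

```python
import torch
import torch.nn.functional as F

def barrier(a, alpha=0.1):
    """Piecewise-quadratic barrier with derivative B'(a) = f^{-1}(a) for the
    bounded leaky activation; quadratic penalty grows steeply outside [0, 1]."""
    inside = 0.5 * a ** 2
    below = 0.5 * a ** 2 / alpha
    above = 0.5 * (a - 1.0) ** 2 / alpha + 0.5 + (a - 1.0)
    return torch.where(a < 0, below, torch.where(a > 1, above, inside))

def energy(activations, kernels, biases, alpha=0.1):
    """E = -sum_l <a[l+1], W[l]*a[l]> - sum_l <b[l], a[l]> + sum of barriers.

    activations : list of layer tensors a[0..L], shape (1, C_l, H, W)
    kernels     : list of forward kernels W[l] mapping layer l -> l+1
    biases      : list of per-channel bias vectors, one per layer
    """
    E = 0.0
    for l, W in enumerate(kernels):
        pad = W.shape[-1] // 2
        E = E - (activations[l + 1] * F.conv2d(activations[l], W, padding=pad)).sum()
    for a, b in zip(activations, biases):
        E = E - (b.view(1, -1, 1, 1) * a).sum()
        E = E + barrier(a, alpha).sum()
    return E
```

Under the symmetry constraint, a block-parallel update of any single layer cannot increase this quantity, which is the basis of the convergence argument in Section 4.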
2. Learning Principles and Training Losses
Training in CBANs is structured around "evidence clamping": observed inputs (e.g., a subset of visible pixels) are fixed in the visible layer while the remaining units are left free to settle. After a fixed number of settling iterations, the free visible units (together with the settled hidden activations) yield a reconstruction, which is compared to the ground truth.
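A minimal sketch of evidence clamping follows; the mask convention and names are assumptions for illustration, not the authors' API:

```python
import torch

def clamp_evidence(visible, observed, mask):
    """mask == 1 where evidence is observed; those units are overwritten."""
    return torch.where(mask.bool(), observed, visible)

observed = torch.rand(1, 1, 28, 28)               # ground-truth pixels (where known)
mask = (torch.rand(1, 1, 28, 28) > 0.3).float()   # roughly 70% of pixels observed
visible = torch.zeros_like(observed)              # free units initialized to zero

visible = clamp_evidence(visible, observed, mask) # re-applied after every update sweep
reconstruction = visible                          # free units are read out at the end
```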
Three principal training losses are supported:
- Difference-of-energies loss ($\mathcal{L}_{\Delta E}$):

  $$\mathcal{L}_{\Delta E} = E(\mathbf{s}^{+}) - E(\mathbf{s}^{-}),$$

  where $\mathbf{s}^{+}$ is the clamped state (free visible units set to the ground truth), $\mathbf{s}^{-}$ is the free state obtained by settling, and $\mathbf{y}$ is the ground truth.
- Soft-hinge conditional likelihood loss ($\mathcal{L}_{\mathrm{hinge}}$):

  $$\mathcal{L}_{\mathrm{hinge}} = \log\!\Big(1 + \exp\big(E(\mathbf{s}^{+}) - E(\mathbf{s}^{-})\big)\Big),$$

  interpreted as the negative log-probability of the clamped state under a two-state Boltzmann distribution over $\{\mathbf{s}^{+}, \mathbf{s}^{-}\}$.
- Mean-squared error ("free vs. target") loss ($\mathcal{L}_{\mathrm{MSE}}$):

  $$\mathcal{L}_{\mathrm{MSE}} = \big\lVert \mathbf{v}^{-} - \mathbf{y} \big\rVert^{2},$$

  comparing the settled free visible units $\mathbf{v}^{-}$ to the ground-truth targets $\mathbf{y}$ (a combined sketch of the three losses follows this list).
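The sketch below expresses the three losses in terms of assumed quantities `E_clamped`, `E_free`, and a settled free visible state; restricting the MSE to the free units is an interpretive assumption of this sketch:

```python
import torch
import torch.nn.functional as F

def difference_of_energies(E_clamped, E_free):
    # Push the energy of the ground-truth (clamped) configuration below
    # the energy of the network's own free settled state.
    return E_clamped - E_free

def soft_hinge(E_clamped, E_free):
    # Negative log-probability of the clamped state under a two-state
    # Boltzmann distribution over {clamped, free}: softplus of the energy gap.
    return F.softplus(E_clamped - E_free)

def free_vs_target_mse(v_free, target, mask):
    # Mean-squared error restricted to the units that were left free
    # (mask == 1 marks observed/clamped units).
    free = 1.0 - mask
    return ((v_free - target) ** 2 * free).sum() / free.sum().clamp(min=1.0)
```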
Training uses recurrent unrolling for up to 50–100 iterations, with losses applied at each timestep and weighted according to TD($\lambda$) learning with $\lambda = 1$, promoting strong learning signals early in inference. Gradients are clipped (e.g., by value or by norm renormalization) before a standard SGD or Adam update. No explicit sparsity or weight-decay penalty is required; convergence is ensured by the symmetry constraint and the barrier function.
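A schematic training step under these assumptions (placeholder `settle_one_sweep` and `loss_fn`, uniform per-step weights for $\lambda = 1$, norm-based gradient clipping) might look like:

```python
import torch

def train_step(params, optimizer, settle_one_sweep, loss_fn, state,
               n_steps=50, lam=1.0, clip_norm=1.0):
    """One unrolled training step; `settle_one_sweep` and `loss_fn` are
    placeholders for the block-parallel sweep and any of the three losses."""
    optimizer.zero_grad()
    total_loss = 0.0
    weight = 1.0
    for t in range(n_steps):
        state = settle_one_sweep(state, params)        # one block-parallel sweep
        total_loss = total_loss + weight * loss_fn(state)
        weight *= lam                                  # lambda = 1: every step, including
                                                       # the earliest, gets full weight
    total_loss.backward()
    torch.nn.utils.clip_grad_norm_(params, clip_norm)  # renormalize gradients
    optimizer.step()
    return float(total_loss)
```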
3. Activation Functions and Symmetry Constraints
The CBAN employs a leaky, piecewise-linear, bounded activation

$$f(z) = \begin{cases} \alpha z, & z < 0,\\ z, & 0 \le z \le 1,\\ 1 + \alpha(z - 1), & z > 1,\end{cases}$$

with a small leak slope $\alpha \in (0, 1)$. This activation avoids vanishing gradients while keeping unit activations essentially within $[0, 1]$. The associated barrier function $B$ is piecewise quadratic, with derivative $B'(a) = f^{-1}(a)$.
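The relationship $B'(a) = f^{-1}(a)$, which the energy-descent argument relies on, can be checked numerically; the sketch below assumes the piecewise-linear form above with an arbitrary leak slope of 0.1:

```python
import torch

alpha = 0.1  # illustrative leak slope, not a value from the paper

def f(z):
    # Bounded leaky activation: identity on [0, 1], slope alpha outside.
    return torch.clamp(z, 0.0, 1.0) + alpha * (z - torch.clamp(z, 0.0, 1.0))

def f_inv(a):
    # Inverse of f: identity on [0, 1], slope 1/alpha outside.
    return torch.where(a < 0, a / alpha,
                       torch.where(a > 1, 1.0 + (a - 1.0) / alpha, a))

a = torch.linspace(-0.5, 1.5, 41, requires_grad=True)
B = torch.where(a < 0, 0.5 * a ** 2 / alpha,
                torch.where(a > 1,
                            0.5 + (a - 1.0) + 0.5 * (a - 1.0) ** 2 / alpha,
                            0.5 * a ** 2)).sum()
B.backward()  # a.grad now holds dB/da, computed by autograd

print(torch.allclose(a.grad, f_inv(a.detach()), atol=1e-6))            # expected: True
print(torch.allclose(f(f_inv(a.detach())), a.detach(), atol=1e-6))      # expected: True
```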
The convolutional weight symmetry constraint requires that, for each layer, the backward kernel be the spatial- and channel-transposed version of the feedforward kernel:

$$\widetilde{W}^{(\ell)}_{ji}(\delta) = W^{(\ell)}_{ij}(-\delta).$$
This symmetry is the core requirement for guaranteeing monotonic energy descent through layer-parallel updates.
4. Attractor Dynamics and Convergence
Inference in CBANs proceeds iteratively:
- Clamp known visible units (evidence) to their observed values.
- Initialize free visible units to zero (or another prior).
- Alternately update hidden and free visible layers with bottom-up and top-down communication, repeating until convergence (i.e., until the change in activations falls below a small tolerance).
- The completion is read out from the free visible units.
Block-parallel updates are guaranteed to decrease the global energy under the conditions of symmetric weights, no intralayer connections, and a strictly increasing, bounded activation function $f$. This ensures convergence to a fixed point (or, in the worst case, a 2-cycle, which is avoided by decomposing the update layerwise). Empirically, convergence is typically achieved in 30–60 iterations.
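A sketch of this settling procedure, with assumed helper functions for the hidden and visible sweeps and an illustrative convergence tolerance, is:

```python
import torch

@torch.no_grad()
def settle(update_hidden_layers, update_visible, observed, mask,
           hidden_init, max_iters=100, tol=1e-4):
    """Iterate block-parallel sweeps until the visible state stops changing.

    update_hidden_layers / update_visible are placeholders for the
    bottom-up / top-down layer updates described in Section 1.
    """
    visible = torch.where(mask.bool(), observed, torch.zeros_like(observed))
    hidden = [h.clone() for h in hidden_init]
    for _ in range(max_iters):
        prev = visible.clone()
        hidden = update_hidden_layers(visible, hidden)     # sweep the hidden layers
        visible = update_visible(hidden)                   # update free visible units
        visible = torch.where(mask.bool(), observed, visible)  # re-clamp the evidence
        if (visible - prev).abs().max() < tol:             # fixed point reached
            break
    return visible, hidden
```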
5. Empirical Results and Benchmark Applications
CBANs have been evaluated on tasks spanning toy problems, digit completion and classification, unsupervised image inpainting, and image super-resolution.
- Bar-Imputation Toy Task: On binary images containing pairs of bars (20 patterns), a fully connected bipartite attractor net achieved 99.995% accuracy, with convergence in one to two iterations.
- Supervised MNIST Completion and Classification: With visible digit images plus one-hot label units (20 bits), masking one-third of the pixels and all label units, a network with two hidden layers (200 and 50 units) achieved 87.5% classification accuracy on masked digits and 89.9% on unmasked digits; with no masking, accuracy reached 98.5%. The network reliably reconstructed missing strokes.
- Unsupervised Inpainting: On Omniglot and CIFAR-10, with randomly placed square patches masking roughly 30% of the pixels, the Omniglot architecture consisted of a visible layer and three hidden layers of convolutional filters with average pooling. CBAN achieved the highest PSNR/SSIM on Omniglot and the best SSIM on CIFAR-10. Ablations lacking weight symmetry or TD(1) training produced blurrier completions or cycling/blob artifacts, whereas CBAN's completions preserve stroke structure and textures.
- Super-Resolution: For a visible state with six channels (three clamped low-resolution input channels and three free high-resolution output channels), a network with three hidden convolutional layers (300 channels each) achieved the following PSNR/SSIM on standard benchmark sets:
| Dataset  | Bicubic (PSNR/SSIM) | DRCN (PSNR/SSIM) | LapSRN (PSNR/SSIM) | CBAN (PSNR/SSIM) |
|----------|---------------------|------------------|--------------------|------------------|
| Set5     | 32.21 / 0.921       | 37.63 / 0.959    | 37.52 / 0.959      | 34.18 / 0.947    |
| Set14    | 29.21 / 0.911       | 32.94 / 0.913    | 33.08 / 0.913      | 30.79 / 0.953*   |
| BSD100   | 28.67 / 0.810       | 31.85 / 0.894    | 31.80 / 0.895      | 30.12 / 0.872    |
| Urban100 | 25.63 / 0.827       | 30.76 / 0.913    | 30.41 / 0.910      | 27.49 / 0.915*   |
CBAN achieves the best SSIM scores on Set14 and Urban100. Qualitatively, CBAN outputs sharper edges and realistic textures, though with occasional high-frequency hallucinations.
6. Interpretive and Theoretical Considerations
CBANs extend classical Hopfield attractor network dynamics into the domain of high-dimensional, convolutional, energy-based neural architectures. By unifying weight symmetry, bounded leaky activations, and temporally weighted training signals (TD(1)), CBANs can scalably and reliably settle large networks for perceptual inference tasks. The architecture is positioned as a computationally efficient alternative to deeper feedforward models and to generative techniques that rely on costly sampling.
The bipartite, symmetric, block-parallel structure—combined with convolutional parameterization—affords convergence guarantees, interpretability of dynamic "settling," and capacity to handle dense, ambiguous input with robust imputation. The absence of explicit sparsity or weight decay regularization underscores the sufficiency of intrinsic network constraints for stable learning and inference (Iuzzolino et al., 2019).