AdaBLDM: Adaptive Deep Belief Networks

Updated 4 May 2026
  • AdaBLDM is an adaptive deep learning algorithm that automatically adjusts DBN width and depth to match dataset complexity.
  • It integrates neuron generation, pruning, and structural forgetting to create sparse and interpretable architectures with minimal manual tuning.
  • The method achieves state-of-the-art image classification accuracy by dynamically evolving its structure during training.

AdaBLDM (Adaptive Learning Method of Deep Belief Network by Layer Generation) is an adaptive deep architecture learning algorithm that automatically determines both the width (number of hidden units) and depth (number of layers) of Deep Belief Networks (DBNs) during training. Introduced by Kamada & Ichimura, AdaBLDM augments the standard DBN framework by equipping layerwise-trained Restricted Boltzmann Machines (RBMs) with structural learning mechanisms, including neuron generation, neuron annihilation, structural sparsity via forgetting, and global layer-generation criteria. These features enable AdaBLDM to produce a compact, sparse, and interpretable DBN optimized for the complexity of a given dataset, attaining state-of-the-art accuracy on image classification benchmarks (Kamada et al., 2018).

1. Problem Formulation and Motivation

Standard DBNs require manual selection of architecture, fixing both the size of each RBM layer and the total number of layers before learning. This static design leads to well-known trade-offs:

  • Underfitting: Too small a model cannot capture data regularities.
  • Overfitting and inefficiency: Oversized networks are costly to train, prone to overfitting, and difficult to interpret.

AdaBLDM addresses this by introducing an adaptive mechanism that discovers:

  • The optimal number of hidden units (neurons) in each RBM,
  • A sparse connectivity structure,
  • The optimal number of RBM layers needed for hierarchical feature extraction.

This reduces the need for manual architecture tuning and controls both over- and underfitting dynamically during learning (Kamada et al., 2018).

2. Algorithmic Workflow and Pseudocode

AdaBLDM proceeds in two nested, mutually adaptive loops:

  • Layerwise training of each RBM (Contrastive Divergence, CD-1) with adaptive width and sparsification,
  • Evaluation of criteria for growing a new layer atop the current DBN.

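In outline: each RBM is trained with CD-1 while neuron generation, neuron annihilation, and forgetting adapt its width and sparsity. At the end of each epoch the layer-generation criteria are evaluated, and when they are met a new RBM is stacked on top and trained in the same way.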

(Kamada et al., 2018)
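
The two nested loops can be sketched as a small driver. This is only a structural illustration, not the authors' implementation: `grow_dbn`, `train_rbm`, and `should_add_layer` are hypothetical names, and the inner width-adaptive training is replaced by a toy stand-in.

```python
from typing import Callable, List, Tuple


def grow_dbn(
    train_rbm: Callable[[int], Tuple[int, float]],
    should_add_layer: Callable[[int, float], bool],
    max_layers: int = 10,
) -> List[int]:
    """Outer loop: stack RBMs until the growth criterion fails.

    train_rbm(layer_index) stands in for the inner loop (CD-1 plus neuron
    generation, annihilation, and forgetting). It returns the learned width
    and a scalar training statistic. Both callables are hypothetical.
    """
    widths: List[int] = []
    for layer in range(max_layers):
        width, stat = train_rbm(layer)         # inner, width-adaptive loop
        widths.append(width)
        if not should_add_layer(layer, stat):  # depth criterion
            break
    return widths


# Toy stand-ins: widths are fixed and the statistic halves with each layer.
toy_widths = [400, 300, 200, 100]
learned = grow_dbn(
    train_rbm=lambda k: (toy_widths[k], 1.0 / 2 ** k),
    should_add_layer=lambda k, stat: stat > 0.3,
    max_layers=len(toy_widths),
)
print(learned)
```

Passing the two stages as callables keeps the depth decision separate from the per-layer training, which mirrors the split between the two loops described above.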

3. Mathematical Formulation

3.1. RBM Layer Energy Model

The RBM layer models a joint visible-hidden distribution through the energy function

$$E(v,h) = -\sum_{i} b_i v_i - \sum_{j} c_j h_j - \sum_{i,j} v_i W_{ij} h_j$$

$$p(v,h) = \frac{1}{Z} \exp(-E(v,h)), \qquad Z=\sum_{v,h} \exp(-E(v,h))$$
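
For a very small RBM the energy and the partition function can be evaluated by brute-force enumeration, which makes the definitions concrete. The sketch below assumes binary units. The function names and the toy parameters are illustrative and not taken from the paper.

```python
import itertools
import math


def energy(v, h, b, c, W):
    """E(v,h) = -sum_i b_i v_i - sum_j c_j h_j - sum_ij v_i W_ij h_j."""
    visible = sum(bi * vi for bi, vi in zip(b, v))
    hidden = sum(cj * hj for cj, hj in zip(c, h))
    pair = sum(v[i] * W[i][j] * h[j]
               for i in range(len(v)) for j in range(len(h)))
    return -(visible + hidden + pair)


def joint_distribution(b, c, W):
    """Return {(v, h): p(v, h)} by enumerating every binary state."""
    states = [
        (v, h)
        for v in itertools.product((0, 1), repeat=len(b))
        for h in itertools.product((0, 1), repeat=len(c))
    ]
    unnormalized = {s: math.exp(-energy(s[0], s[1], b, c, W)) for s in states}
    Z = sum(unnormalized.values())  # partition function
    return {s: u / Z for s, u in unnormalized.items()}


# Toy model: 2 visible units, 1 hidden unit.
b = [0.1, -0.2]
c = [0.3]
W = [[0.5], [-0.4]]
p = joint_distribution(b, c, W)
print(len(p), sum(p.values()))
```

Enumeration costs exponential time in the number of units, which is why training relies on the Contrastive Divergence approximation rather than on exact probabilities.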

3.2. Contrastive Divergence (CD-1) Update

For weights and biases:

$$\Delta W_{ij} = \eta \left( \langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{model} \right)$$

$$\Delta b_{i} = \eta \left( \langle v_i \rangle_{data} - \langle v_i \rangle_{model} \right)$$

$$\Delta c_{j} = \eta \left( \langle h_j \rangle_{data} - \langle h_j \rangle_{model} \right)$$
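
A minimal NumPy sketch of one CD-1 step on a mini-batch follows. It assumes binary units with sigmoid conditionals and uses probabilities rather than sampled states for the reconstruction and for the statistics, a common practical choice that the source does not specify. The name `cd1_step` and the toy shapes are illustrative.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd1_step(v0, W, b, c, eta, rng):
    """One CD-1 update on a mini-batch v0 of shape (N, I)."""
    ph0 = sigmoid(v0 @ W + c)                         # p(h=1 | v0), shape (N, J)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sampled hidden states
    pv1 = sigmoid(h0 @ W.T + b)                       # one-step reconstruction
    ph1 = sigmoid(pv1 @ W + c)                        # hidden probs given reconstruction
    n = v0.shape[0]
    dW = eta * (v0.T @ ph0 - pv1.T @ ph1) / n         # <v h>_data - <v h>_model
    db = eta * (v0 - pv1).mean(axis=0)                # <v>_data - <v>_model
    dc = eta * (ph0 - ph1).mean(axis=0)               # <h>_data - <h>_model
    return W + dW, b + db, c + dc


rng = np.random.default_rng(0)
v0 = (rng.random((8, 6)) < 0.5).astype(float)  # toy binary batch: N=8, I=6
W = 0.01 * rng.standard_normal((6, 4))         # J=4 hidden units
b = np.zeros(6)
c = np.zeros(4)
W_new, b_new, c_new = cd1_step(v0, W, b, c, eta=0.1, rng=rng)
print(W_new.shape, b_new.shape, c_new.shape)
```

The "model" expectations are approximated by a single Gibbs step started at the data, which is what distinguishes CD-1 from exact maximum-likelihood gradients.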

3.3. Adaptive Neuron Generation/Annihilation

  • Generation: If, for neuron $j$,

$$(\alpha_{c}\,\|dc_j\|)\cdot(\alpha_{W}\,\|dW_{:,j}\|) > \theta_{G}$$

then neuron $j$ is split/generated.

  • Annihilation: If

$$\frac{1}{N} \sum_{n=1}^N p(h_j=1 \mid v^{(n)}) < \theta_{A}$$

neuron $j$ is pruned.
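
The two tests can be evaluated for all hidden units at once. In the sketch below, `dc` and `dW` are assumed to hold the monitored fluctuations of the hidden biases and weights. How these are accumulated during training is not specified in the text above, so they are simply passed in. Function names and toy values are illustrative.

```python
import numpy as np


def generation_mask(dc, dW, alpha_c, alpha_W, theta_G):
    """True for unit j where (alpha_c*|dc_j|) * (alpha_W*||dW[:, j]||) > theta_G."""
    return (alpha_c * np.abs(dc)) * (alpha_W * np.linalg.norm(dW, axis=0)) > theta_G


def annihilation_mask(ph, theta_A):
    """True for unit j where the mean of p(h_j=1 | v_n) over the N rows is < theta_A."""
    return ph.mean(axis=0) < theta_A


# Toy values for J=3 hidden units and I=2 visible units.
dc = np.array([0.1, 2.0, 0.05])
dW = np.array([[0.1, 3.0, 0.0],
               [0.1, 4.0, 0.0]])
grow = generation_mask(dc, dW, alpha_c=1.0, alpha_W=1.0, theta_G=1.0)

# Hidden activation probabilities for N=4 inputs (rows) and J=3 units.
ph = np.tile(np.array([0.5, 0.9, 0.01]), (4, 1))
prune = annihilation_mask(ph, theta_A=0.05)
print(grow, prune)
```

In this toy case only the second unit fluctuates enough to be split, and only the third is quiet enough to be pruned.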

3.4. Structural Learning with Forgetting (SLF)

Three penalty terms encourage sparsity and binary activations:

  • L1 forgetting, which penalizes weight magnitudes
  • Hidden-unit clarification, which pushes hidden activations toward binary values
  • Selective forgetting (final pruning stage)
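
As an illustration of penalties of this kind, the sketch below assumes the forms commonly associated with Ishikawa's structural learning with forgetting: an L1 term on all weights, a `min(h, 1 - h)` term on hidden activations, and an L1 term restricted to small weights. The exact expressions used in AdaBLDM may differ. The coefficient names `eps1`, `eps2`, `eps3` and the threshold `theta` are assumptions.

```python
import numpy as np


def l1_forgetting(W, eps1):
    """Assumed form: eps1 * sum_ij |W_ij|."""
    return eps1 * np.abs(W).sum()


def hidden_clarification(h, eps2):
    """Assumed form: eps2 * sum_j min(h_j, 1 - h_j); zero when every h_j is 0 or 1."""
    return eps2 * np.minimum(h, 1.0 - h).sum()


def selective_forgetting(W, eps3, theta):
    """Assumed form: L1 penalty applied only to weights with |W_ij| < theta."""
    small = np.abs(W) < theta
    return eps3 * np.abs(W[small]).sum()


W = np.array([[0.5, -0.02],
              [0.0, 1.0]])
h = np.array([0.0, 1.0, 0.5])
print(l1_forgetting(W, 0.1), hidden_clarification(h, 1.0), selective_forgetting(W, 1.0, 0.1))
```

Each function returns a scalar that would be added to the training objective. Restricting the last penalty to small weights leaves already-strong connections untouched while driving weak ones toward zero.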

3.5. Layer Generation Criteria

For the current stack of layers, two global statistics are computed: a weighted sum of the layerwise parameter variance and a weighted sum of the layerwise energy.

If both statistics exceed their respective thresholds, the DBN grows by initializing a new RBM on top (parameters inherited).

(Kamada et al., 2018)

4. Layer Generation and Hybrid Structural Learning

  • Layer Growth: At the end of each epoch, global statistics (weighted sum of layerwise parameter variance and energy) are computed. If both exceed thresholds, a new layer is added and initialized by parameter inheritance from its parent.
  • RBM Width Adaptation: Through neuron generation and pruning, each RBM layer dynamically fits the data complexity during pretraining.
  • Structural Learning with Forgetting: SLF injects sparsity, restricts over-parameterization, and encourages hidden-unit interpretability, yielding compact layers that extract explicit knowledge from data.
  • Integrated Process: The entire architecture is thus shaped adaptively: width (neurons), depth (layers), and weight sparsity are co-optimized per dataset.

5. Computational Complexity and Stability

Each RBM's training step has a cost per data vector that scales with the product of the visible dimension and the current number of hidden units. Adaptive structure adds a further per-batch cost for the generation/annihilation checks. Layer generation overhead is negligible (per-epoch summations). The total cost up to the learned architecture accumulates these per-layer costs over all layers.

Stability is dynamically monitored by the variance and energy statistics in each layer; persistently high values of these quantities trigger structural growth, thus preventing underfitting and guiding self-organization (Kamada et al., 2018).

6. Experimental Protocol and Results

Datasets: CIFAR-10 and CIFAR-100, with 50,000 training and 10,000 test 32×32 color images. ZCA whitening is applied to all inputs.
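
ZCA whitening decorrelates the input dimensions while staying as close as possible to the original pixel space. A minimal NumPy sketch, assuming flattened images as rows and a small regularizer `eps` (a hypothetical value, not taken from the paper), run on synthetic data rather than CIFAR:

```python
import numpy as np


def zca_whiten(X, eps=1e-5):
    """ZCA-whiten the rows of X (shape N x D)."""
    Xc = X - X.mean(axis=0)                       # center each feature
    cov = Xc.T @ Xc / Xc.shape[0]                 # D x D covariance
    eigvals, eigvecs = np.linalg.eigh(cov)        # symmetric eigendecomposition
    # Rotate, rescale each direction to unit variance, rotate back.
    W_zca = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return Xc @ W_zca


# Synthetic correlated data: N=500 samples, D=4 features.
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.5],
                   [0.0, 0.0, 0.0, 3.0]])
X = rng.standard_normal((500, 4)) @ mixing
Xw = zca_whiten(X)
cov_w = Xw.T @ Xw / Xw.shape[0]
print(np.round(cov_w, 3))
```

After whitening, the sample covariance is close to the identity matrix. For 32×32 color images the covariance would be 3072×3072, so the eigendecomposition is computed once on the training set and reused for the test set.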

Hyperparameters:

  • Initial hidden units per layer: 300
  • Mini-batch size: 100
  • Learning rate: [value missing in source]
  • Generation and annihilation thresholds: chosen to prune underactive neurons
  • Layer-generation thresholds: set to yield 4–6 layers
  • Forgetting coefficients: [values missing in source]

Performance:

  • CIFAR-10: Up to 97.1% test accuracy (5 layers), exceeding a traditional DBN [accuracy missing in source] and a CNN baseline (96.5%).
  • CIFAR-100: 81.3%, surpassing comparable CNN results (75.7%).
  • The learned DBN for CIFAR-10 self-organized into layer sizes near [433, 1595, 369, 1462, 192]; model energy and error decrease monotonically as layers are added.

(Kamada et al., 2018)

7. Significance and Applications

AdaBLDM provides a methodology for fully data-driven, architecture-agnostic training of deep generative models. By automating structure discovery at both the unit and layer level, it avoids the limitations of fixed-architecture models and reduces dependency on human hyperparameter selection. The hybridization of adaptive width (neuron-level structural learning), depth (layer generation), and global sparsification (structural forgetting) produces DBNs that are both compact and high performing. This framework is directly applicable to any domain that previously relied on hand-engineered DBN architectures, and experimental results demonstrate utility in image classification contexts.

(Kamada et al., 2018)
