AdaBLDM: Adaptive Deep Belief Networks

Updated 4 May 2026

AdaBLDM is an adaptive deep learning algorithm that automatically adjusts DBN width and depth to match dataset complexity.
It integrates neuron generation, pruning, and structural forgetting to create sparse and interpretable architectures with minimal manual tuning.
The method reports up to 97.1% test accuracy on CIFAR-10 and 81.3% on CIFAR-100 by evolving its structure during training.

AdaBLDM (Adaptive Learning Method of Deep Belief Network by Layer Generation) is an adaptive deep architecture learning algorithm that automatically determines both the width (number of hidden units) and depth (number of layers) of Deep Belief Networks (DBNs) during training. Introduced by Kamada & Ichimura, AdaBLDM augments the standard DBN framework by equipping layerwise-trained Restricted Boltzmann Machines (RBMs) with structural learning mechanisms, including neuron generation, neuron annihilation, structural sparsity via forgetting, and global layer-generation criteria. These features enable AdaBLDM to produce a compact, sparse, and interpretable DBN optimized for the complexity of a given dataset, attaining state-of-the-art accuracy on image classification benchmarks (Kamada et al., 2018).

1. Problem Formulation and Motivation

Standard DBNs require manual selection of architecture, fixing both the size of each RBM layer and the total number of layers before learning. This static design leads to well-known trade-offs:

Underfitting: Too small a model cannot capture data regularities.
Overfitting and inefficiency: Oversized networks are costly to train, prone to overfitting, and difficult to interpret.

AdaBLDM addresses this by introducing an adaptive mechanism that discovers:

The optimal number of hidden units (neurons) in each RBM,
A sparse connectivity structure,
The optimal number of RBM layers needed for hierarchical feature extraction.

This reduces the need for manual architecture tuning and controls both over- and underfitting dynamically during learning (Kamada et al., 2018).

2. Algorithmic Workflow and Pseudocode

AdaBLDM proceeds in two nested, mutually adaptive loops:

Layerwise training of each RBM (Contrastive Divergence, CD-1) with adaptive width and sparsification,
Evaluation of criteria for growing a new layer atop the current DBN.

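A high-level summary of the steps, using the criteria defined in Section 3:

Initialize the first RBM on the input data.
Train the current top RBM with CD-1 under the SLF penalties.
After each mini-batch, generate a new hidden neuron wherever the generation condition holds, and prune neurons whose mean activation falls below the annihilation threshold.
At the end of each epoch, compute the global variance and energy statistics over all layers.
If both statistics exceed their thresholds, add a new RBM on top, with parameters inherited from the layer below, and continue training there.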

(Kamada et al., 2018)

3. Mathematical Formulation

3.1. RBM Layer Energy Model

The RBM layer models a joint visible-hidden distribution: $E(v,h) = -\sum_{i} b_i v_i - \sum_{j} c_j h_j - \sum_{i,j} v_i W_{ij} h_j$

$p(v,h) = \frac{1}{Z} \exp(-E(v,h)), \qquad Z=\sum_{v,h} \exp(-E(v,h))$
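As a minimal sketch of these two definitions, the NumPy snippet below evaluates $E(v,h)$ and obtains $Z$ by brute-force enumeration for a tiny binary RBM. The function names and the 3-by-2 toy size are illustrative assumptions, and enumeration is only feasible for a handful of units.

```python
import itertools
import numpy as np

def energy(v, h, W, b, c):
    """E(v,h) = -sum_i b_i v_i - sum_j c_j h_j - sum_ij v_i W_ij h_j."""
    return -(b @ v) - (c @ h) - (v @ W @ h)

def joint_distribution(W, b, c):
    """Enumerate every binary (v, h) state and return states, p(v,h), and Z."""
    n_v, n_h = W.shape
    states = [
        (np.array(v), np.array(h))
        for v in itertools.product([0, 1], repeat=n_v)
        for h in itertools.product([0, 1], repeat=n_h)
    ]
    unnorm = np.array([np.exp(-energy(v, h, W, b, c)) for v, h in states])
    Z = unnorm.sum()  # partition function: sum over all joint states
    return states, unnorm / Z, Z

# Toy RBM with 3 visible and 2 hidden units (hypothetical sizes).
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(3, 2))
b = np.zeros(3)
c = np.zeros(2)
states, probs, Z = joint_distribution(W, b, c)
```

With all parameters set to zero every state has energy 0, so $Z$ equals the number of joint states (here $2^5 = 32$), which is a quick sanity check on the enumeration.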

3.2. Contrastive Divergence (CD-1) Update

For weights and biases: $\Delta W_{ij} = \eta \left( \langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{model} \right)$

$\Delta b_{i} = \eta \left( \langle v_i \rangle_{data} - \langle v_i \rangle_{model} \right)$

$\Delta c_{j} = \eta \left( \langle h_j \rangle_{data} - \langle h_j \rangle_{model} \right)$
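The three updates could be implemented roughly as follows. This is a sketch under stated assumptions: binary units with sigmoid conditionals, mini-batch averages for the expectations, and reconstruction probabilities (rather than samples) in the negative phase, which is a common practical choice that the source does not specify. The function name `cd1_step` is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(V, W, b, c, eta, rng):
    """One CD-1 update on a mini-batch V of shape (N, I)."""
    # Positive phase: hidden probabilities given the data.
    ph_data = sigmoid(V @ W + c)
    h_sample = (rng.random(ph_data.shape) < ph_data).astype(float)
    # Negative phase: a single Gibbs step approximates the model statistics.
    pv_model = sigmoid(h_sample @ W.T + b)
    ph_model = sigmoid(pv_model @ W + c)
    N = V.shape[0]
    dW = eta * (V.T @ ph_data - pv_model.T @ ph_model) / N
    db = eta * (V - pv_model).mean(axis=0)
    dc = eta * (ph_data - ph_model).mean(axis=0)
    return W + dW, b + db, c + dc, (dW, db, dc)

# Toy usage: 100 random binary vectors, 6 visible and 4 hidden units.
rng = np.random.default_rng(0)
V = (rng.random((100, 6)) < 0.5).astype(float)
W = rng.normal(scale=0.01, size=(6, 4))
b, c = np.zeros(6), np.zeros(4)
W, b, c, (dW, db, dc) = cd1_step(V, W, b, c, eta=0.1, rng=rng)
```

Returning the increments alongside the updated parameters is a design choice made here so that they can be reused for structural checks such as neuron generation.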

3.3. Adaptive Neuron Generation/Annihilation

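Generation: For hidden neuron $j$, let $dc_j$ and $dW_{:,j}$ denote the variation of its bias and of its incoming weight vector during training, scaled by coefficients $\alpha_c$ and $\alpha_W$. If

$(\alpha_{c}\,\|dc_j\|)\cdot(\alpha_{W}\,\|dW_{:,j}\|) > \theta_{G}$

then a new hidden neuron is generated by splitting neuron $j$.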

Annihilation: If

$\frac{1}{N} \sum_{n=1}^N p(h_j=1|v^{(n)}) < \theta_{A}$

neuron $j$ is pruned.
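The two conditions can be sketched as boolean masks over the hidden units. In the snippet below, `dc` and `dW` stand for the bias and weight variations and `ph_data` for the hidden probabilities $p(h_j=1|v^{(n)})$ over $N$ samples. How a generated neuron is initialized is an assumption here (it copies its parent's weights and bias), and the helper names are hypothetical.

```python
import numpy as np

def generation_mask(dc, dW, alpha_c, alpha_W, theta_G):
    """True for hidden neurons whose scaled bias/weight variation exceeds theta_G."""
    score = (alpha_c * np.abs(dc)) * (alpha_W * np.linalg.norm(dW, axis=0))
    return score > theta_G

def annihilation_mask(ph_data, theta_A):
    """True for hidden neurons whose mean activation over the data is below theta_A."""
    return ph_data.mean(axis=0) < theta_A

def apply_structure(W, c, gen, ann):
    """Drop pruned neurons, then append a copy of each surviving parent (assumption)."""
    keep = ~ann
    parents = gen & keep
    W_new = np.hstack([W[:, keep], W[:, parents]])
    c_new = np.concatenate([c[keep], c[parents]])
    return W_new, c_new

# Toy example: 2 visible units, 3 hidden units.
dc = np.array([0.5, 0.01, 0.2])
dW = np.array([[0.3, 0.0, 0.1],
               [0.4, 0.01, 0.0]])
ph = np.array([[0.9, 0.01, 0.5],
               [0.8, 0.03, 0.4]])
gen = generation_mask(dc, dW, alpha_c=1.0, alpha_W=1.0, theta_G=0.05)
ann = annihilation_mask(ph, theta_A=0.05)
W_new, c_new = apply_structure(np.ones((2, 3)), np.zeros(3), gen, ann)
```

In this toy case only the first neuron passes the generation test and only the second falls below the annihilation threshold, so the layer still has three hidden units afterwards.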

3.4. Structural Learning with Forgetting (SLF)

Three penalty terms encourage sparsity and binary activations:

L1 forgetting (an L1 penalty on all weights, added to the training objective $J$):

$J_f = J + \epsilon_1 \sum_{i,j} |W_{ij}|$

Hidden-unit clarification (pushes each hidden activation $h_j$ toward 0 or 1):

$J_h = J + \epsilon_2 \sum_{j} \min(h_j,\, 1 - h_j)$

Selective forgetting (final pruning stage; only weights with magnitude below a threshold $\theta$ continue to decay):

$J_s = J + \epsilon_3 \sum_{|W_{ij}| < \theta} |W_{ij}|$
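For illustration, the three penalties named above can be computed as below. This sketch assumes the usual structural-learning-with-forgetting forms: an L1 sum over all weights, a $\min(h, 1-h)$ term on hidden activations, and an L1 sum restricted to weights below a magnitude threshold. The coefficient names `eps1`, `eps2`, `eps3` and the threshold `theta` are placeholders, not values from the source.

```python
import numpy as np

def l1_forgetting(W, eps1):
    """L1 penalty on every weight: decays all connections toward zero."""
    return eps1 * np.abs(W).sum()

def hidden_clarification(H, eps2):
    """Zero when every activation in H is exactly 0 or 1, largest at 0.5."""
    return eps2 * np.minimum(H, 1.0 - H).sum()

def selective_forgetting(W, eps3, theta):
    """L1 penalty applied only to weights with magnitude below theta."""
    small = np.abs(W) < theta
    return eps3 * np.abs(W[small]).sum()

W = np.array([[0.5, -0.02],
              [0.0, 0.3]])
H = np.array([[0.0, 1.0],
              [0.5, 0.9]])
p_f = l1_forgetting(W, eps1=0.1)
p_h = hidden_clarification(H, eps2=0.1)
p_s = selective_forgetting(W, eps3=0.1, theta=0.1)
```

Each value would be added to the training objective; the selective variant leaves the large weights (0.5 and 0.3 here) untouched, so only already-weak connections keep shrinking.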

3.5. Layer Generation Criteria

For a DBN with $k$ layers, compute the weighted sums of the layerwise parameter-variance statistic $WD^{l}$ and the layerwise energy $E^{l}$:

$\sum_{l=1}^{k} \alpha_{WD}\, WD^{l}$

$\sum_{l=1}^{k} \alpha_{E}\, E^{l}$

If $\sum_{l=1}^{k} \alpha_{WD}\, WD^{l} > \theta_{L1}$ and $\sum_{l=1}^{k} \alpha_{E}\, E^{l} > \theta_{L2}$, the DBN grows by initializing a new RBM on top (parameters inherited).

(Kamada et al., 2018)
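The growth decision reduces to two threshold tests on weighted sums over the existing layers. A minimal sketch follows, assuming one variance statistic and one energy value per layer have already been measured; all argument names are hypothetical.

```python
def should_add_layer(variances, energies, alpha_wd, alpha_e, theta_l1, theta_l2):
    """Return True when both global statistics exceed their thresholds.

    variances, energies: one value per existing layer (lists of floats).
    """
    wd_total = sum(alpha_wd * wd for wd in variances)
    e_total = sum(alpha_e * e for e in energies)
    return wd_total > theta_l1 and e_total > theta_l2

# Two-layer toy example with made-up statistics.
grow = should_add_layer([0.4, 0.3], [1.2, 0.9], 1.0, 1.0, theta_l1=0.5, theta_l2=2.0)
hold = should_add_layer([0.4, 0.3], [1.2, 0.9], 1.0, 1.0, theta_l1=0.5, theta_l2=3.0)
```

Because both conditions must hold, raising either threshold is enough to stop growth, which is how the thresholds bound the final depth.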

4. Layer Generation and Hybrid Structural Learning

Layer Growth: At the end of each epoch, global statistics (weighted sum of layerwise parameter variance and energy) are computed. If both exceed thresholds, a new layer is added and initialized by parameter inheritance from its parent.
RBM Width Adaptation: Through neuron generation and pruning, each RBM layer dynamically fits the data complexity during pretraining.
Structural Learning with Forgetting: SLF injects sparsity, restricts over-parameterization, and encourages hidden-unit interpretability—yielding layers that are both compact and extract explicit knowledge from data.
Integrated Process: The entire architecture is thus shaped adaptively: width (neurons), depth (layers), and weight sparsity are co-optimized per dataset.

5. Computational Complexity and Stability

Each RBM's training step is $O(IJ)$ per data vector, where $I$ is the visible dimension and $J$ the current number of hidden units. Adaptive structure adds a further per-batch cost for generation/annihilation checks. Layer generation overhead is negligible (per-epoch summations). The per-vector cost up to a learned $L$-layer architecture is the sum over layers,

$O\left(\sum_{l=1}^{L} I_l J_l\right)$

where $I_l$ and $J_l$ are the visible and hidden dimensions of layer $l$.

Stability is dynamically monitored by the parameter-variance and energy statistics in each layer; persistent high values in these quantities trigger structural growth, thus preventing underfitting and guiding self-organization (Kamada et al., 2018).

6. Experimental Protocol and Results

Datasets: CIFAR-10 and CIFAR-100, each with 50,000 training and 10,000 test 32×32 color images. ZCA whitening is applied to all inputs.
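ZCA whitening decorrelates the input features while keeping the result as close as possible to the original pixel space. The sketch below shows the standard eigendecomposition recipe on synthetic data; the regularizer `eps` and the function name are assumptions, and the source does not state which settings were used.

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    """Whiten X of shape (N, D); returns whitened data, mean, and the ZCA matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)  # cov is symmetric
    # Rotate, rescale each component to unit variance, rotate back.
    W_zca = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return Xc @ W_zca, mean, W_zca

# Synthetic correlated data standing in for flattened images.
rng = np.random.default_rng(0)
mix = np.array([[1.0, 0.5, 0.0, 0.0],
                [0.0, 1.0, 0.5, 0.0],
                [0.0, 0.0, 1.0, 0.5],
                [0.0, 0.0, 0.0, 1.0]])
X = rng.normal(size=(2000, 4)) @ mix
Xw, mean, W_zca = zca_whiten(X)
cov_w = Xw.T @ Xw / Xw.shape[0]  # close to the identity matrix
```

At test time the training-set `mean` and `W_zca` would be reused rather than recomputed, so that both splits pass through the same transform.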

Hyperparameters:

Initial hidden units per layer: 300
Mini-batch size: 100
Learning rate: $\eta$ [value missing]
Thresholds: $\theta_G$, $\theta_A$ chosen to prune underactive neurons
Layer generation: the two layer-generation thresholds, set to yield 4–6 layers
Forgetting coefficients: [values missing]

Performance:

CIFAR-10: Up to 97.1% test accuracy (5 layers), exceeding the traditional DBN [value missing] and the CNN baseline (96.5%).
CIFAR-100: 81.3%, surpassing comparable CNN results (75.7%).
The learned DBN for CIFAR-10 self-organized into layer sizes near [433, 1595, 369, 1462, 192]; model energy and error decrease monotonically as layers are added.

(Kamada et al., 2018)
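To give a sense of scale, the dense weight count implied by the reported CIFAR-10 layer sizes can be worked out directly. The calculation assumes the first RBM sees the flattened 32×32×3 image (3,072 inputs), which the source does not state explicitly, and it ignores biases and any sparsity introduced by forgetting.

```python
input_dim = 32 * 32 * 3                      # assumed flattened input size
hidden_sizes = [433, 1595, 369, 1462, 192]   # reported learned layer sizes
layer_dims = [input_dim] + hidden_sizes

# Each RBM contributes (visible x hidden) weights.
pairs = list(zip(layer_dims[:-1], layer_dims[1:]))
weights_per_layer = [i * j for i, j in pairs]
total_weights = sum(weights_per_layer)

for (i, j), n in zip(pairs, weights_per_layer):
    print(f"{i:>5} x {j:<5} -> {n:,} weights")
print(f"total: {total_weights:,}")
```

Under these assumptions the stack holds about 3.4 million dense weights, with the first layer accounting for roughly 1.3 million of them; the count after sparsification would be lower.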

7. Significance and Applications

AdaBLDM provides a methodology for fully data-driven, architecture-agnostic training of deep generative models. By automating structure discovery at both the unit and layer level, it avoids the limitations of fixed-architecture models and reduces dependency on human hyperparameter selection. The hybridization of adaptive width (neuron-level structural learning), depth (layer generation), and global sparsification (structural forgetting) produces DBNs that are both compact and high performing. This framework is directly applicable to any domain that previously relied on hand-engineered DBN architectures, and experimental results demonstrate utility in image classification contexts.

(Kamada et al., 2018)


References

Kamada & Ichimura (2018). An Adaptive Learning Method of Deep Belief Network by Layer Generation Algorithm.
