DecomposeMe: Efficient Separable ConvNet Design
- DecomposeMe is a CNN architecture that factors 2D convolutions into sequential 1D filters with an intervening ReLU, significantly reducing parameter counts.
- The approach employs filter sharing across spatial positions, leading to reduced redundancy and lower computational overhead while maintaining model expressivity.
- Empirical results on benchmarks like ImageNet and Places2 confirm that DecomposeMe enhances generalization and efficiency in diverse network configurations.
DecomposeMe is a convolutional neural network (ConvNet) architecture modification that imposes a hard separability constraint at the level of convolutional filters, directly learning representations as compositions of 1D convolutions. This method offers substantial reductions in parameter count while maintaining or improving classification accuracy. DecomposeMe employs filter sharing across spatial positions and introduces nonlinearity (ReLU) between sequential 1D convolutions, increasing network depth and expressivity with minimal computational overhead. Comprehensive experiments on large-scale recognition benchmarks such as ImageNet and Places2 demonstrate the method’s capacity for high efficiency and strong generalization, all without post-training fine-tuning or approximations (Alvarez et al., 2016).
1. Foundational Concepts and Core Methodology
DecomposeMe enforces a hard separability constraint, parametrizing every 2D convolutional kernel as the composition of two 1D filters: one vertical ($d \times 1$) and one horizontal ($1 \times d$). In contrast to low-rank approximation approaches that train full 2D filters and subsequently decompose them, DecomposeMe trains the decomposed 1D filters directly, end-to-end.
Filter sharing is enforced within each layer by reusing the same bank of 1D filters across all spatial positions, removing redundant parameters and thereby reducing model complexity. An interposed nonlinearity—specifically, a ReLU activation—between the vertical and horizontal convolutions increases the effective nonlinear depth of the model, offering additional expressivity without enlarging the parameter budget.
2. Mathematical Formulation
A standard ConvNet layer with weights $W \in \mathbb{R}^{C \times d \times d \times F}$ (where $C$ and $F$ are the input and output channel dimensions and $d$ is the spatial kernel size) learns each 2D kernel $W_f \in \mathbb{R}^{C \times d \times d}$. Low-rank approximations write $W_f \approx \sum_{k=1}^{K} \sigma_k \, u_k v_k^{\top}$, but this decomposition is post hoc and only approximate.
Instead, DecomposeMe constrains every 2D filter to be a composition of two 1D filter banks $\{\bar{v}_k\}_{k=1}^{K}$ (vertical, $d \times 1$) and $\{\bar{h}_f\}_{f=1}^{F}$ (horizontal, $1 \times d$), learned end-to-end. For input feature maps $x^c$, the $f$-th output map is given by

$$y_f = \sum_{k=1}^{K} \bar{h}_f^{\,k} * \varphi\!\left( \sum_{c=1}^{C} \bar{v}_k^{\,c} * x^c + b_k \right),$$

where $*$ denotes 1D convolution, $\varphi$ is the ReLU nonlinearity, and $K$ is the number of intermediate 1D filters.
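The formulation above can be checked numerically. The minimal single-channel sketch below (pure Python, illustrative function names, "valid" padding) applies a vertical 1D filter, a ReLU, then a horizontal 1D filter. Without the ReLU, the two 1D passes reproduce exactly the 2D convolution with the rank-1 kernel $\bar{v}\bar{h}^{\top}$, which is precisely what the hard separability constraint encodes:

```python
def conv2d_valid(x, k):
    """'Valid' 2D correlation of map x (list of rows) with 2D kernel k."""
    H, W, kh, kw = len(x), len(x[0]), len(k), len(k[0])
    return [[sum(x[i + a][j + b] * k[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(W - kw + 1)]
            for i in range(H - kh + 1)]

def conv_vertical(x, v):
    """Correlate each column with the d x 1 kernel v (shrinks height)."""
    H, W, d = len(x), len(x[0]), len(v)
    return [[sum(x[i + a][j] * v[a] for a in range(d))
             for j in range(W)]
            for i in range(H - d + 1)]

def conv_horizontal(x, h):
    """Correlate each row with the 1 x d kernel h (shrinks width)."""
    H, W, d = len(x), len(x[0]), len(h)
    return [[sum(x[i][j + b] * h[b] for b in range(d))
             for j in range(W - d + 1)]
            for i in range(H)]

def relu(x):
    """Pointwise ReLU over a 2D map."""
    return [[max(0.0, u) for u in row] for row in x]

def decomposed_response(x, v, h):
    """One DecomposeMe path (single channel, K = 1):
    vertical 1D conv -> ReLU -> horizontal 1D conv."""
    return conv_horizontal(relu(conv_vertical(x, v)), h)
```

Inserting the ReLU between the two 1D passes breaks the exact rank-1 equivalence, which is the point: it adds nonlinear depth that a plain rank-1 2D filter cannot express.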
3. Network Architecture Modifications
DecomposeMe conversion of any $C \times d \times d \times F$ convolutional layer consists of:
- A vertical 1D convolution ($d \times 1$ kernels, $C$ input channels, $K$ output channels).
- An intervening ReLU nonlinearity.
- A horizontal 1D convolution ($1 \times d$ kernels, $K$ input channels, $F$ output channels).
The number of output channels remains unchanged. Filter sharing is mandatory: the same 1D filter bank is used at every spatial location in the layer. Architectural features such as pooling, batch normalization, and dropout are retained as in the source network. For compact variants, the two large fully connected layers are removed, with the last convolution output flattened directly for final classification.
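For concreteness, the shape bookkeeping of a converted module can be sketched as follows (a hypothetical helper assuming "valid" padding; the real layers also carry biases and any retained batch normalization):

```python
def converted_layer_shapes(C, d, F, K, H, W):
    """Trace feature-map shapes through vertical conv -> ReLU -> horizontal conv.

    A C-channel H x W input passes through K vertical d x 1 filters,
    a ReLU (shape-preserving), and F horizontal 1 x d filters.
    """
    after_vertical = (K, H - d + 1, W)            # d x 1 kernels shrink height only
    after_relu = after_vertical                   # pointwise nonlinearity
    after_horizontal = (F, H - d + 1, W - d + 1)  # 1 x d kernels shrink width only
    return [("input", (C, H, W)),
            ("vertical 1D conv", after_vertical),
            ("ReLU", after_relu),
            ("horizontal 1D conv", after_horizontal)]
```

Note that the output channel count $F$ matches the original layer, so the converted module is a drop-in replacement within the surrounding architecture.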
4. Parameter Efficiency and Expressivity
The parameter count for a standard 2D convolutional layer is $C \cdot d^2 \cdot F$. For a DecomposeMe layer it is

$$d \cdot K \cdot C + d \cdot K \cdot F = dK(C + F).$$

The reduction in parameters is substantial when $K \ll dCF/(C+F)$. For example, for a VGG-style configuration ($C = F = 512$, $d = 3$, $K = 512$), DecomposeMe reduces parameters by approximately 33% compared to the original layer. The explicit percentage reduction is

$$1 - \frac{dK(C+F)}{d^2 C F} = 1 - \frac{K(C+F)}{dCF}.$$

In typical settings, one selects $K$ on the order of $F$ or smaller to balance expressivity with model compression.
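The arithmetic can be verified directly; the helper names below are illustrative:

```python
def params_conv2d(C, d, F):
    """Weights of a standard C x d x d x F convolutional layer (biases omitted)."""
    return C * d * d * F

def params_decomposeme(C, d, F, K):
    """Weights of the replacement: K vertical C x d x 1 filters
    plus F horizontal K x 1 x d filters (biases omitted)."""
    return d * K * C + d * K * F  # = d * K * (C + F)

def reduction(C, d, F, K):
    """Fractional parameter reduction of the decomposed layer."""
    return 1.0 - params_decomposeme(C, d, F, K) / params_conv2d(C, d, F)
```

For the VGG-style example ($C = F = 512$, $d = 3$, $K = 512$), the standard layer has 2,359,296 weights, the decomposed layer 1,572,864, a reduction of exactly 1/3, matching the ~33% figure above.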
5. Training Regimen and Hyperparameter Configuration
DecomposeMe networks are trained in Torch-7 from scratch (no pretraining), using stochastic gradient descent with momentum 0.9 and weight decay, starting from a learning rate of 0.01 that is decreased on plateau. Data augmentation consists of random cropping and horizontal flipping with probability 0.5. Batch sizes vary by architecture: AlexNet-style models use 96 per GPU, VGG-B variants use 24 per GPU, and compact DecomposeMe variants use batch sizes up to 256, leveraging the reduced memory footprint. Dropout is omitted in the compact variants' final classifier due to their already low parameter counts.
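The recipe above can be summarized as a configuration sketch (an illustrative Python dict, not the authors' Torch-7 code; the weight-decay constant is not specified in this text and is left unset):

```python
TRAIN_CONFIG = {
    "optimizer": "SGD",
    "momentum": 0.9,
    "weight_decay": None,          # value not specified here
    "initial_lr": 0.01,
    "lr_schedule": "decrease on plateau",
    "augmentation": {"random_crop": True, "horizontal_flip_p": 0.5},
    "batch_size_per_gpu": {        # varies by architecture
        "alexnet_style": 96,
        "vgg_b_variant": 24,
        "compact_decomposeme": 256,  # upper bound reported
    },
    "pretraining": None,           # all models trained from scratch
}
```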
6. Empirical Performance on Benchmarks
DecomposeMe achieves performance competitive with, or superior to, standard architectures while dramatically reducing parameter counts. The following table summarizes selected empirical results:
| Architecture | Top-1 Accuracy | Conv+FC Params (M) | Relative Reduction |
|---|---|---|---|
| VGG-B (ImageNet full) | 62.5% | 9.4 + 123.5 | Baseline |
| DecomposeMe (full) | 57.8% | 2.4 + 123.5 | –75% conv |
| VGG-B (compact) | 61.1% | 9.4 + 25.0 | |
| DecomposeMe | 65.4% | 7.0 + 8.2 | –26% conv, –67% FC |
| DecomposeMe | 66.2% | 7.0 + 0.5 | |
| VGG-B (Places2 full) | 44.0% | 9.4 + 121 | Baseline |
| DecomposeMe | 47.4% | 7.0 + 3.2 | –92% total |
On ImageNet 2012, DecomposeMe variants outperformed or matched their baselines (best results: 61.8% Top-1 with 15% fewer convolutional parameters, and 66.2% Top-1). On Places2, DecomposeMe yielded a relative Top-1 accuracy increase of approximately +7.7% with 92% fewer parameters than VGG-B. In stereo matching on the KITTI 2012 benchmark, a DecomposeMe MC-CNN variant achieved comparable matching error rates with up to 90% parameter reduction.
In all settings experimentally explored, DecomposeMe variants met or exceeded baseline accuracy, significantly reduced model size, and frequently exhibited smaller train-validation performance gaps.
7. Application to Diverse Networks and Tasks
DecomposeMe’s procedure is broadly applicable:
- Full conversion of VGG-B (all conv layers replaced with DecomposeMe modules) allowed larger batch sizes during training and, in compact form, outperformed the original in classification accuracy.
- When applied to MC-CNN feature extractors for stereo matching, parameter count was reduced by an order of magnitude with only negligible increases in error rate.
- The architecture promotes rapid experimentation and efficient deployment, especially for memory- or computation-constrained applications.
8. Limitations and Prospects for Further Development
Principal limitations include:
- The method yields only modest speedup in the first conv layer when the number of input channels is small (e.g., RGB).
- The choice of $K$, the intermediate filter count, is a crucial but currently manual hyperparameter, trading off expressivity against compression. Automated or adaptive selection of $K$ per layer is an open challenge.
- Omitting the intermediate ReLU drastically degrades performance, confirming that increased nonlinear depth is essential.
- Application beyond classification and stereo tasks (e.g., detection, segmentation, generative models) remains unexplored and constitutes a direction for future research.
DecomposeMe establishes a paradigm for hard-separable, nonlinear convolutional architectures that balance compactness with accuracy, eliminating the need for post hoc low-rank approximations and providing a foundation for efficient ConvNet design (Alvarez et al., 2016).