DecomposeMe: Efficient Separable ConvNet Design

Updated 31 January 2026
  • DecomposeMe is a CNN architecture that factors 2D convolutions into sequential 1D filters with an intervening ReLU, significantly reducing parameter counts.
  • The approach employs filter sharing across spatial positions, leading to reduced redundancy and lower computational overhead while maintaining model expressivity.
  • Empirical results on benchmarks like ImageNet and Places2 confirm that DecomposeMe enhances generalization and efficiency in diverse network configurations.

DecomposeMe is a convolutional neural network (ConvNet) architecture modification that imposes a hard separability constraint at the level of convolutional filters, directly learning representations as compositions of 1D convolutions. This method offers substantial reductions in parameter count while maintaining or improving classification accuracy. DecomposeMe employs filter sharing across spatial positions and introduces nonlinearity (ReLU) between sequential 1D convolutions, increasing network depth and expressivity with minimal computational overhead. Comprehensive experiments on large-scale recognition benchmarks such as ImageNet and Places2 demonstrate the method’s capacity for high efficiency and strong generalization, all without post-training fine-tuning or approximations (Alvarez et al., 2016).

1. Foundational Concepts and Core Methodology

DecomposeMe enforces a hard separability constraint, parametrizing every $d \times d$ 2D convolutional kernel as the composition of two 1D filters: one vertical ($d \times 1$) and one horizontal ($1 \times d$). In contrast to low-rank approximation approaches that train full 2D filters and subsequently decompose them, DecomposeMe trains the decomposed 1D filters directly, end-to-end.

Filter sharing is instituted within each layer by reusing the same bank of 1D filters across all spatial positions, removing redundant parameters and thereby reducing model complexity. An interposed nonlinearity, specifically a ReLU activation, between the vertical and horizontal convolutions increases the effective nonlinear depth of the model, offering additional expressivity without enlarging the parameter budget.
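The rank-1 structure behind this factorization can be verified numerically: in the linear case (intermediate ReLU omitted), convolving with a vertical and then a horizontal 1D filter reproduces convolution with their outer-product 2D kernel. A minimal NumPy sketch; all names are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.standard_normal(3)          # vertical 1D filter (d x 1), d = 3
h = rng.standard_normal(3)          # horizontal 1D filter (1 x d)
K = np.outer(v, h)                  # equivalent rank-1 3x3 kernel
x = rng.standard_normal((8, 8))     # toy input feature map

def corr2d(img, kernel):
    """Valid-mode 2D cross-correlation (no padding)."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

full = corr2d(x, K)                 # one 2D pass with the rank-1 kernel
vert = corr2d(x, v[:, None])        # d x 1 pass ...
sep = corr2d(vert, h[None, :])      # ... followed by a 1 x d pass
assert np.allclose(full, sep)       # identical results in the linear case
```

DecomposeMe's contribution is precisely to break this linear equivalence by inserting a ReLU between the two 1D passes, so the composition is no longer a single rank-1 2D filter.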

2. Mathematical Formulation

A standard ConvNet layer with weights $\mathbf{W} \in \mathbb{R}^{C \times d \times d \times F}$ (where $C$ and $F$ are the input and output channel dimensions and $d$ is the spatial kernel size) learns each 2D kernel $\mathbf{f}^i \in \mathbb{R}^{d \times d}$ directly. Low-rank approximations write $\mathbf{f}^i = \sum_{k=1}^K \sigma_k^i v_k^i (h_k^i)^T$, but this decomposition is post hoc and only approximate.

Instead, DecomposeMe constrains every 2D filter to be a composition of two 1D filter banks $\{\bar{v}_l \in \mathbb{R}^d\}_{l=1}^L$ (vertical) and $\{\bar{h}_l \in \mathbb{R}^d\}_{l=1}^L$ (horizontal), learned end-to-end. For input feature maps $a_c^0$, the output $a_i^1$ is given by

$$a_i^1 = \varphi\left( b_i^h + \sum_{l=1}^L \bar{h}_{il}^T \star \varphi\left( b_l^v + \sum_{c=1}^C \bar{v}_{lc} \star a_c^0 \right) \right)$$

where $\star$ denotes 1D convolution, $\varphi(x) = \max(0,x)$ is the ReLU, and $L$ is the number of intermediate 1D filters.
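The layer equation above can be sketched directly in NumPy. This is a minimal illustration with valid-mode boundaries and cross-correlation, not the paper's Torch7 implementation; variable names mirror the symbols in the formula:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def conv1d_vert(a, v):
    """Valid-mode vertical pass: correlate the columns of a (H, W) map with a length-d filter."""
    d, (H, W) = len(v), a.shape
    return np.stack([sum(v[k] * a[i + k] for k in range(d))
                     for i in range(H - d + 1)])

def conv1d_horiz(a, h):
    """Valid-mode horizontal pass, via the vertical pass on the transpose."""
    return conv1d_vert(a.T, h).T

def decomposeme_layer(a0, V, Hf, bv, bh):
    """a0: (C, H, W) input; V: (L, C, d) vertical bank; Hf: (F, L, d) horizontal bank."""
    C, L, F = a0.shape[0], V.shape[0], Hf.shape[0]
    # inner term: ReLU(b_l^v + sum_c v_lc * a_c^0), one map per intermediate filter l
    mid = [relu(bv[l] + sum(conv1d_vert(a0[c], V[l, c]) for c in range(C)))
           for l in range(L)]
    # outer term: ReLU(b_i^h + sum_l h_il * mid_l), one map per output channel i
    return np.stack([relu(bh[i] + sum(conv1d_horiz(mid[l], Hf[i, l])
                                      for l in range(L)))
                     for i in range(F)])

rng = np.random.default_rng(1)
a0 = rng.standard_normal((3, 10, 10))   # C = 3 input maps
V = rng.standard_normal((4, 3, 3))      # L = 4 vertical filters, d = 3
Hf = rng.standard_normal((2, 4, 3))     # F = 2 output channels
out = decomposeme_layer(a0, V, Hf, np.zeros(4), np.zeros(2))
assert out.shape == (2, 8, 8)           # (F, H - d + 1, W - d + 1)
```

Note how the intermediate ReLU sits between the vertical and horizontal banks, giving the layer two nonlinear stages where a standard convolution has one.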

3. Network Architecture Modifications

DecomposeMe conversion of any $C \times F$, $d \times d$ convolutional layer consists of:

  • A vertical 1D convolution ($C \times L$ channels, $d \times 1$ kernels).
  • An intervening ReLU nonlinearity.
  • A horizontal 1D convolution ($L \times F$ channels, $1 \times d$ kernels).

The number of output channels $F$ remains unchanged. Filter sharing is mandatory: the same 1D filter bank is used at every spatial location in the layer. Architectural features such as pooling, batch normalization, and dropout are retained as in the source network. For compact variants, the two large fully connected layers are removed, with the last convolution output flattened directly for final classification.

4. Parameter Efficiency and Expressivity

The parameter count for a standard 2D convolutional layer is $CFd^2$. For a DecomposeMe layer:

$$\#\text{params} = LCd + FLd = L(C+F)d$$

The reduction in parameters is substantial when $L(C+F)d \ll CFd^2$, i.e., when $L \ll \frac{CFd}{C+F}$. For example, for a VGG-style configuration ($d=3$, $C=F=256$, $L=256$), DecomposeMe reduces parameters by approximately 33% compared to the original layer. The explicit percentage reduction is:

$$\frac{CFd^2 - L(C+F)d}{CFd^2} \times 100\%$$

In typical settings, one selects $L \approx F$ or $L < \min(C,F)$ to balance expressivity with model compression.
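These counts are easy to verify for the worked VGG-style example above (a small sketch; the helper names are illustrative):

```python
def conv2d_params(C, F, d):
    """Weights in a standard C-to-F 2D conv layer with d x d kernels (biases omitted)."""
    return C * F * d * d

def decomposeme_params(C, F, L, d):
    """Vertical bank (L*C*d) plus horizontal bank (F*L*d) = L*(C+F)*d."""
    return L * C * d + F * L * d

C = F = L = 256
d = 3
full_params = conv2d_params(C, F, d)           # 256 * 256 * 9  = 589,824
sep_params = decomposeme_params(C, F, L, d)    # 256 * 512 * 3  = 393,216
reduction = 100 * (full_params - sep_params) / full_params
assert round(reduction) == 33                  # the ~33% figure quoted above
```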

5. Training Regimen and Hyperparameter Configuration

DecomposeMe networks are trained with Torch7 from scratch (no pretraining). Stochastic gradient descent with momentum 0.9 and weight decay $10^{-4}$ is used, with an initial learning rate of 0.01, decreased on plateau. Data augmentation consists of random cropping and horizontal flipping with probability 0.5. Batch sizes vary by architecture: AlexNet-style models use 96 per GPU, VGG-B variants use 24 per GPU, and compact variants such as DecomposeMe$_8^{C\text{-}256}$ use batch sizes up to 256, leveraging the reduced memory footprint. Dropout is omitted in compact variants' final classifier due to already low parameter counts.
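The decrease-on-plateau schedule mentioned above can be sketched as a simple rule; the patience value and decay factor here are illustrative assumptions, not values from the paper:

```python
def step_on_plateau(lr, val_history, patience=3, factor=0.1):
    """Decay lr when validation accuracy has not improved for `patience` epochs.

    val_history: list of per-epoch validation accuracies, newest last.
    patience and factor are hypothetical defaults for illustration.
    """
    if len(val_history) > patience:
        recent_best = max(val_history[-patience:])
        earlier_best = max(val_history[:-patience])
        if recent_best <= earlier_best:       # no improvement in the window
            return lr * factor
    return lr

lr = 0.01                                     # initial rate from the regimen above
history = [50.0, 55.0, 57.0, 57.0, 56.8, 56.9]  # stalled validation accuracy
lr = step_on_plateau(lr, history)             # decays to 0.001
assert abs(lr - 0.001) < 1e-12
```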

6. Empirical Performance on Benchmarks

DecomposeMe achieves performance competitive with, or superior to, standard architectures while dramatically reducing parameter counts. The following table summarizes selected empirical results:

| Architecture | Top-1 Accuracy | Conv + FC Params (M) | Relative Reduction |
|---|---|---|---|
| VGG-B (ImageNet, full) | 62.5% | 9.4 + 123.5 | baseline |
| DecomposeMe$_5$ (full) | 57.8% | 2.4 + 123.5 | –75% conv |
| VGG-B$^C$ (compact) | 61.1% | 9.4 + 25.0 | |
| DecomposeMe$_8^C$ | 65.4% | 7.0 + 8.2 | –26% conv, –67% FC |
| DecomposeMe$_8^{C\text{-avg}}$ | 66.2% | 7.0 + 0.5 | |
| VGG-B (Places2, full) | 44.0% | 9.4 + 121 | baseline |
| DecomposeMe$_8^{C\text{-}256}$ | 47.4% | 7.0 + 3.2 | –92% total |

On ImageNet 2012, DecomposeMe$_3$ (best: 61.8% Top-1, –15% conv params) and DecomposeMe$_8^{C\text{-avg}}$ (best: 66.2% Top-1) outperformed or matched baselines. On Places2, DecomposeMe$_8^{C\text{-}256}$ yielded a relative Top-1 accuracy increase of approximately +7.7% with 92% fewer parameters than VGG-B. In stereo matching on the KITTI 2012 benchmark, a DecomposeMe MC-CNN variant achieved comparable matching error rates with up to 90% parameter reduction.

In all settings experimentally explored, DecomposeMe variants met or exceeded baseline accuracy, significantly reduced model size, and frequently exhibited smaller train-validation performance gaps.

7. Application to Diverse Networks and Tasks

DecomposeMe’s procedure is broadly applicable:

  • Full conversion of VGG-B (all conv layers replaced with DecomposeMe modules) allowed larger batch sizes during training and, in compact form, outperformed the original in classification accuracy.
  • When applied to MC-CNN feature extractors for stereo matching, parameter count was reduced by an order of magnitude with only negligible increases in error rate.
  • The architecture promotes rapid experimentation and efficient deployment, especially for memory- or computation-constrained applications.

8. Limitations and Prospects for Further Development

Principal limitations include:

  • The method yields only a modest speedup in the first convolutional layer, where the number of input channels $C$ is small (e.g., 3 for RGB input).
  • The choice of $L$, the intermediate filter count, is a crucial but currently manual hyperparameter, trading off expressivity against compression. Automated or adaptive selection of $L$ per layer is an open challenge.
  • Omitting the intermediate ReLU drastically degrades performance, confirming that increased nonlinear depth is essential.
  • Application beyond classification and stereo tasks (e.g., detection, segmentation, generative models) remains unexplored and constitutes a direction for future research.

DecomposeMe establishes a paradigm for hard-separable, nonlinear convolutional architectures that balance compactness with accuracy, eliminating the need for post hoc low-rank approximations and providing a foundation for efficient ConvNet design (Alvarez et al., 2016).
