NDDR-CNN: Layerwise Feature Fusing in Multi-Task CNNs by Neural Discriminative Dimensionality Reduction (1801.08297v4)

Published 25 Jan 2018 in cs.CV and cs.LG

Abstract: In this paper, we propose a novel Convolutional Neural Network (CNN) structure for general-purpose multi-task learning (MTL), which enables automatic feature fusing at every layer from different tasks. This is in contrast with the most widely used MTL CNN structures which empirically or heuristically share features on some specific layers (e.g., share all the features except the last convolutional layer). The proposed layerwise feature fusing scheme is formulated by combining existing CNN components in a novel way, with clear mathematical interpretability as discriminative dimensionality reduction, which is referred to as Neural Discriminative Dimensionality Reduction (NDDR). Specifically, we first concatenate features with the same spatial resolution from different tasks according to their channel dimension. Then, we show that the discriminative dimensionality reduction can be fulfilled by 1x1 Convolution, Batch Normalization, and Weight Decay in one CNN. The use of existing CNN components ensures the end-to-end training and the extensibility of the proposed NDDR layer to various state-of-the-art CNN architectures in a "plug-and-play" manner. The detailed ablation analysis shows that the proposed NDDR layer is easy to train and also robust to different hyperparameters. Experiments on different task sets with various base network architectures demonstrate the promising performance and desirable generalizability of our proposed method. The code of our paper is available at https://github.com/ethanygao/NDDR-CNN.

Citations (239)

Summary

  • The paper introduces NDDR-CNN, which fuses layerwise task-specific features using neural discriminative dimensionality reduction to optimize multi-task learning.
  • It leverages 1×1 convolutions with batch normalization and weight decay to systematically share features across layers, avoiding ad-hoc sharing decisions.
  • Extensive evaluations on models like VGG-16 and ResNet-101 demonstrate significant performance gains on tasks such as semantic segmentation and age/gender classification.

Essay on NDDR-CNN: Layerwise Feature Fusing in Multi-Task CNNs by Neural Discriminative Dimensionality Reduction

The paper explores the domain of multi-task learning (MTL) using convolutional neural networks (CNNs) by introducing a novel architecture—NDDR-CNN. This architecture tackles the prevalent challenge of optimally sharing representations across tasks to increase performance without enforcing arbitrary structural decisions common in traditional multi-task CNNs. The authors propose a systematic approach to feature fusion through a method termed Neural Discriminative Dimensionality Reduction (NDDR), promising a mathematically interpretable framework for automatic feature sharing in MTL scenarios.

Methodological Insights

The NDDR-CNN architecture incorporates layerwise feature fusing across tasks by concatenating task-specific features along the channel dimension at each layer where they share the same spatial resolution. The NDDR framework then applies a 1×1 convolution within the CNN to perform dimensionality reduction on the concatenated features, yielding a discriminative feature embedding for each task. Batch normalization and weight decay provide the regularization that lets the model be trained efficiently end-to-end while adhering to modern CNN practice.

This paper substantially deviates from conventional practices where heuristic methods guide which layers to share across tasks. By systematically allowing feature concatenation and employing discriminative dimensionality reduction, the architecture avoids ad-hoc structural assumptions prevalent in conventional MTL setups. This marks a significant step towards more generalized multi-task networks as opposed to task-specific manual configurations.
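The fusion at a single layer can be sketched concretely. A 1×1 convolution acts as a per-pixel linear map over channels, so for two tasks the NDDR operation is: concatenate the two C-channel feature maps into a 2C-channel map, then project back to C channels per branch. The following is a minimal NumPy sketch, not the authors' implementation; the identity-style initialization with hypothetical weights α and β reflects the paper's idea of starting each branch close to its own features (batch normalization and weight decay are omitted here, since they are applied by the training framework):

```python
import numpy as np

def nddr_fuse(feat_a, feat_b, w_a, w_b):
    # feat_*: (C, H, W) same-resolution feature maps from two task branches.
    # w_*:    (C, 2C) weight matrices of the 1x1 convolutions that map the
    #         2C concatenated channels back down to C channels per branch.
    concat = np.concatenate([feat_a, feat_b], axis=0)   # (2C, H, W)
    # A 1x1 convolution is a per-pixel linear map over the channel axis:
    out_a = np.einsum('oc,chw->ohw', w_a, concat)
    out_b = np.einsum('oc,chw->ohw', w_b, concat)
    return out_a, out_b

C, H, W = 4, 8, 8
rng = np.random.default_rng(0)
fa = rng.standard_normal((C, H, W))
fb = rng.standard_normal((C, H, W))

# Identity-style initialization (alpha/beta values are illustrative):
# each branch starts as a weighted mix of its own and the other task's
# features, so the network begins near the unfused baseline.
alpha, beta = 0.9, 0.1
w_a = np.concatenate([alpha * np.eye(C), beta * np.eye(C)], axis=1)  # (C, 2C)
w_b = np.concatenate([beta * np.eye(C), alpha * np.eye(C)], axis=1)

out_a, out_b = nddr_fuse(fa, fb, w_a, w_b)
# With this initialization, out_a == alpha * fa + beta * fb exactly.
```

During training, the projection matrices are learned jointly with the rest of the network, so the degree of sharing at each layer is determined by the data rather than fixed by hand.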

Experimental Results

Extensive experimentation showcases the robustness and applicability of NDDR-CNN across various tasks and network architectures, such as VGG-16 and ResNet-101. Notably, the paper illustrates its versatility on a spectrum of computer vision tasks, including semantic segmentation and surface normal prediction, as well as age and gender classification. The extensive ablation studies further validate the robustness of NDDR layers across hyperparameter configurations, corroborating their stability and ease of training.

Experimental results highlight substantial improvement over baseline multi-task settings and state-of-the-art methods such as cross-stitch and sluice networks, particularly on challenging datasets such as NYU v2 and IMDB-WIKI for scene understanding and age/gender classification, respectively. For instance, semantic segmentation performance, as measured by mean Intersection over Union (mIoU), exhibited notable gains when the NDDR framework was integrated into the multi-task architecture.

Theoretical and Practical Implications

This work advances the computational efficiency and scalability of MTL methods, leveraging existing CNN components innovatively for universal applicability. The mathematical elucidation of discriminative dimensionality reduction parallels existing dimensionality reduction techniques while framing them in a neural context, which could spur further theoretical inquiry.

Practically, the NDDR layer's “plug-and-play” nature allows seamless extension into existing state-of-the-art architectures across diverse domains. This reduces the engineering overhead needed for designing bespoke MTL structures and potentially enhances the transfer learning capabilities of CNNs in multi-faceted environments.

Future Prospects

Future research should explore explicit imposition of manifold assumptions on the feature representations within the NDDR layer to potentially harness more refined feature embeddings. Additionally, integrating NDDR-CNN advances across multimodal datasets or temporally coherent tasks could broaden its applicability further.

In conclusion, the NDDR-CNN framework innovatively addresses a core challenge in multi-task CNNs by crafting a discriminative and adaptable approach for feature sharing and dimensional reduction across tasks. Its compatibility with existing architectures and demonstrated performance gains suggest its potential as a pivotal advancement in the field of multi-task learning.