
Attentive Single-Tasking of Multiple Tasks (1904.08918v1)

Published 18 Apr 2019 in cs.CV

Abstract: In this work we address task interference in universal networks by considering that a network is trained on multiple tasks, but performs one task at a time, an approach we refer to as "single-tasking multiple tasks". The network thus modifies its behaviour through task-dependent feature adaptation, or task attention. This gives the network the ability to accentuate the features that are adapted to a task, while shunning irrelevant ones. We further reduce task interference by forcing the task gradients to be statistically indistinguishable through adversarial training, ensuring that the common backbone architecture serving all tasks is not dominated by any of the task-specific gradients. Results in three multi-task dense labelling problems consistently show: (i) a large reduction in the number of parameters while preserving, or even improving performance and (ii) a smooth trade-off between computation and multi-task accuracy. We provide our system's code and pre-trained models at http://vision.ee.ethz.ch/~kmaninis/astmt/.

Citations (225)

Summary

  • The paper shows that SE modulation in the encoder-decoder improves performance across diverse dense prediction tasks while optimizing resource usage.
  • It demonstrates that adversarial training with a convolutional discriminator reduces gradient conflicts, yielding performance gains without extra computational cost.
  • The study confirms the method’s scalability across various architectural depths and datasets, achieving near single-task performance with fewer parameters.

An Evaluation of Multi-Task Learning with Modulation and Adversarial Training

This paper presents a detailed experimental evaluation of a novel approach to multi-task learning (MTL) using modulation and adversarial training. The research explores dense prediction tasks using convolutional architectures and implements the proposed methods on several datasets, notably PASCAL, NYUD, and FSV.

Datasets and Tasks

The experiments are primarily conducted on the PASCAL dataset, commonly used for dense prediction tasks, supplemented by evaluations on the indoor-scenes dataset NYUD and the synthetic-imagery dataset FSV. The datasets span distinct visual domains and together cover a diverse set of tasks: edge detection, semantic segmentation, human part segmentation, surface normal estimation, saliency estimation, and depth estimation.

Methodology

The proposed method builds on the Deeplab-v3+ architecture with a ResNet backbone and inserts task-specific Squeeze-and-Excitation (SE) modules that modulate the shared features on a per-task basis. A primary focus is where to place SE modulation (encoder, decoder, or both), with an evaluation of the resulting performance trade-offs. Additionally, adversarial training is introduced via a convolutional discriminator that encourages the per-task gradients reaching the shared backbone to be statistically indistinguishable, reducing gradient conflicts during MTL.
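To make the task-dependent modulation concrete, below is a minimal sketch of a task-conditioned SE block in PyTorch. It assumes one SE branch per task and that only the active task's branch re-weights the shared feature map; the names (`TaskSE`, `num_tasks`, `reduction`) are illustrative and not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class TaskSE(nn.Module):
    """One SE block per task; only the active task's block modulates the shared features."""
    def __init__(self, channels: int, num_tasks: int, reduction: int = 16):
        super().__init__()
        self.se_blocks = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(1),                                   # squeeze: global per-channel context
                nn.Conv2d(channels, channels // reduction, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, kernel_size=1),
                nn.Sigmoid(),                                              # excitation: per-channel gates in [0, 1]
            )
            for _ in range(num_tasks)
        ])

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        gates = self.se_blocks[task_id](x)   # (N, C, 1, 1) task-specific channel gates
        return x * gates                     # re-weight shared features for the active task


# Usage: the same shared feature map is re-scaled differently depending on the task.
feats = torch.randn(2, 256, 32, 32)
se = TaskSE(channels=256, num_tasks=3)
out_seg = se(feats, task_id=0)    # features adapted for, e.g., segmentation
out_edge = se(feats, task_id=1)   # same input, different modulation for edge detection
```

Because the SE branches are lightweight (two 1x1 convolutions each), this per-task adaptation adds far fewer parameters than duplicating the whole network per task.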

Key Findings

  1. Modulation and SE Blocks: Integrating SE blocks improves the results across multiple tasks. The SE modulation, when confined to the decoder, strikes a balance between computational cost and performance, achieving competitive results compared to modulation throughout the network.
  2. Adversarial Training: Adversarial training benefits MTL by reducing gradient conflicts, modestly improving task performance; the discriminator is used only during training, so it adds no parameters or computational cost at inference (a minimal sketch of this idea follows the list).
  3. Depth Invariance: The approach is effective across varying backbone depths (R-26, R-50, R-101), indicating that the gains do not depend on model capacity.
  4. Resource Efficiency: The combination of modulation and adversarial training achieves performance parity or surpasses single-task models while requiring fewer parameters, as demonstrated by a 12.3% reduction in computational resources.
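The sketch below illustrates the adversarial-training idea under simplifying assumptions: a small convolutional discriminator tries to identify which task produced the gradient arriving at the shared features, while a gradient-reversal layer (GRL) pushes those gradients toward being indistinguishable. It assumes PyTorch; the names (`GradReverse`, `TaskDiscriminator`, `alpha`, `adversarial_step`) are hypothetical and the exact discriminator architecture and loss weighting differ in the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) gradients in the backward pass."""
    @staticmethod
    def forward(ctx, x, alpha):
        ctx.alpha = alpha
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.alpha * grad_output, None

class TaskDiscriminator(nn.Module):
    """Convolutional classifier that predicts the task identity from a gradient tensor."""
    def __init__(self, channels: int, num_tasks: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, num_tasks),
        )

    def forward(self, x):
        return self.net(x)

def adversarial_step(shared_feats, task_loss, task_id, discriminator, alpha=0.1):
    """One task per forward pass ("single-tasking"):
    1) compute the task loss, 2) take its gradient w.r.t. the shared features,
    3) classify the task from the gradient, with the GRL reversing the discriminator's
       gradient into the backbone so task gradients become harder to tell apart."""
    grads = torch.autograd.grad(task_loss, shared_feats, create_graph=True)[0]
    logits = discriminator(GradReverse.apply(grads, alpha))
    target = torch.full((grads.size(0),), task_id, dtype=torch.long, device=grads.device)
    disc_loss = F.cross_entropy(logits, target)
    # Minimizing disc_loss trains the discriminator; the reversed gradient discourages
    # the backbone from producing task-identifiable gradients.
    return task_loss + disc_loss
```

Since the discriminator is dropped after training, this scheme changes only the optimization dynamics, which is consistent with the reported gains coming at no inference-time overhead.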

Results and Implications

The evaluation shows that SE modulation combined with adversarial training substantially reduces the performance drop typically incurred by standard MTL, approaching single-task performance while improving resource efficiency. The gains hold consistently across both homogeneous and heterogeneous datasets, supporting the approach's transferability across domains.

Speculations and Future Directions

The paper opens avenues for incorporating advanced modulation techniques and optimizing adversarial training strategies further. Future research might consider exploring the impact of different adversarial architectures or modulation strategies, as well as extending the task domain to various other dense prediction applications. Furthermore, these findings could inspire subsequent studies on disentangled representations in MTL frameworks and their implications in real-world applications.

Conclusion

This investigation into modulation and adversarial training for multi-task learning provides a rigorous framework that enhances performance across a range of tasks and architectures, offering promising insights for developing resource-efficient, multi-task capable neural networks.