- The paper presents MTAN, an architecture using task-specific soft-attention to balance shared and specialized feature learning in multi-task scenarios.
- It introduces Dynamic Weight Average (DWA), a weighting strategy that adjusts task weights over time based on the rate of change of each task's loss, balancing training across tasks.
- Experiments on CityScapes, NYUv2, and the Visual Decathlon Challenge demonstrate state-of-the-art performance with improved efficiency and fewer parameters.
End-to-End Multi-Task Learning with Attention
The paper, "End-to-End Multi-Task Learning with Attention" by Shikun Liu, Edward Johns, and Andrew J. Davison, presents a novel architecture for multi-task learning (MTL) called the Multi-Task Attention Network (MTAN). This architecture addresses the complexities involved in learning multiple tasks concurrently, particularly focusing on how to share and balance features across tasks effectively.
Contributions and Methodology
MTAN is an end-to-end trainable network that uses task-specific, feature-level attention mechanisms to learn which features should be shared and which should remain task-specific. The architecture comprises a single shared backbone acting as a global feature pool, plus one soft-attention module per task. Each attention module dynamically determines the importance of the shared features for its task, allowing the network to learn shared and task-specific representations jointly rather than through a fixed, hand-designed split.
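As a concrete illustration, here is a minimal PyTorch-style sketch of one task-specific soft-attention block. The module name, the two-convolution mask design, and the channel sizes are assumptions for exposition, not the authors' reference implementation (which, for instance, also feeds each attention block the attended features from the previous stage):

```python
import torch
import torch.nn as nn

class SoftAttentionBlock(nn.Module):
    """Task-specific soft attention over a shared feature map (sketch).

    A small subnetwork produces a per-channel, per-pixel mask in (0, 1)
    that gates the shared features for one task.
    """
    def __init__(self, channels: int):
        super().__init__()
        self.mask = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.Sigmoid(),  # soft (not binary) feature selection
        )

    def forward(self, shared_features: torch.Tensor) -> torch.Tensor:
        # Element-wise gating: the task keeps only the shared features
        # its learned mask deems relevant.
        return self.mask(shared_features) * shared_features
```

One such block is attached per task at each backbone stage, so the attention masks are trained end-to-end together with the shared weights.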
Key aspects of MTAN are:
- Network Architecture: A single shared network forms a global feature pool, with task-specific attention modules applying attention masks at each convolution block. This setup allows the model to learn the task-specific importance of shared features.
- Loss Function Balancing: The authors propose a new dynamic weighting strategy called Dynamic Weight Average (DWA). DWA adjusts task weights over time based on the relative rate of change of each task's loss, keeping learning balanced across tasks (a sketch of the update rule follows this list).
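DWA reduces to one update rule: with w_k(t-1) = L_k(t-1) / L_k(t-2) measuring how quickly task k's average per-epoch loss is descending, the weight for task k at epoch t is lambda_k(t) = K * exp(w_k(t-1) / T) / sum_i exp(w_i(t-1) / T), where K is the number of tasks and T is a temperature controlling how soft the weighting is. Below is a minimal Python sketch of that rule; the epoch-level bookkeeping and variable names are assumptions, while the default T = 2 and the unit weights for the first two epochs follow the paper:

```python
import math

def dwa_weights(loss_history: dict, temperature: float = 2.0) -> dict:
    """Dynamic Weight Average: compute one loss weight per task.

    loss_history maps each task name to its list of per-epoch average
    losses. Weights are normalized to sum to the number of tasks K.
    For the first two epochs there is no loss ratio yet, so all
    weights default to 1, as in the paper.
    """
    tasks = list(loss_history)
    k = len(tasks)
    if any(len(losses) < 2 for losses in loss_history.values()):
        return {t: 1.0 for t in tasks}
    # Relative descent rate: w_k(t-1) = L_k(t-1) / L_k(t-2).
    rates = {t: loss_history[t][-1] / loss_history[t][-2] for t in tasks}
    scores = {t: math.exp(r / temperature) for t, r in rates.items()}
    total = sum(scores.values())
    return {t: k * scores[t] / total for t in tasks}
```

The training objective is then the weighted sum of the task losses, sum_k lambda_k(t) * L_k. Tasks whose losses are shrinking slowly (ratio near or above 1) receive larger weights than tasks whose losses are still dropping quickly, which is what keeps learning balanced.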
Experimental Setup and Results
Image-to-Image Predictions
The authors validate MTAN on two datasets, CityScapes and NYUv2, across tasks including semantic segmentation, depth estimation, and surface normal prediction.
- Datasets:
- CityScapes: Contains high-resolution street-view images, used for semantic segmentation and inverse depth estimation.
- NYUv2: Consists of indoor scenes with RGB-D images, used for tasks including semantic segmentation, true depth estimation, and surface normal prediction.
- Baselines:
- Compared against single-task networks, standard multi-task networks (a shared backbone splitting into task-specific heads only at the final layer), and Cross-Stitch Networks.
- Performance Metrics:
- Metrics such as mean IoU and pixel accuracy for segmentation, and mean absolute error for depth, were used for evaluation (a toy implementation of two of these follows this list).
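For readers unfamiliar with these metrics, the following toy NumPy implementation shows how mean IoU and mean absolute depth error are typically computed; the class handling and the valid-pixel masking convention here are generic assumptions, not the paper's exact evaluation code:

```python
import numpy as np

def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Mean intersection-over-union for integer class maps of equal shape."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

def abs_depth_error(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute error over valid depth pixels (target > 0)."""
    valid = target > 0
    return float(np.abs(pred[valid] - target[valid]).mean())
```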
The results indicate that MTAN outperforms baseline models across different weighting schemes, demonstrating robustness and superior performance:
- On CityScapes, MTAN maintained competitive performance with significantly fewer parameters.
- On NYUv2, MTAN achieved the highest scores across all tested tasks and weighting schemes.
Visual Decathlon Challenge
The Multi-Task Attention Network was also tested on the Visual Decathlon Challenge, which spans 10 different image classification tasks. Built on a Wide Residual Network backbone, MTAN achieved state-of-the-art performance, demonstrating robustness and parameter efficiency across diverse visual domains.
Implications and Future Work
The architectural design of MTAN has significant practical implications, offering a scalable and parameter-efficient approach to multi-task learning. By decoupling the learning of task-specific and shared features, the architecture can efficiently manage the interplay between tasks, even under varying complexities and domains.
The paper suggests that future developments might explore further enhancements to the attention mechanisms and adaptive task weighting strategies. Such advancements could lead to more generalized models capable of tackling increasingly complex multi-task scenarios, potentially incorporating additional modalities beyond visual data.
In summary, the paper contributes a conceptually simple yet effective MTL architecture that is efficient and adaptable, setting a benchmark for multi-task learning with feature-level attention mechanisms.