MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning (2003.14058v1)

Published 31 Mar 2020 in cs.LG, cs.CV, and stat.ML

Abstract: We propose to incorporate neural architecture search (NAS) into general-purpose multi-task learning (GP-MTL). Existing NAS methods typically define different search spaces according to different tasks. To adapt to different task combinations (i.e., task sets), we disentangle GP-MTL networks into single-task backbones (which optionally encode task priors) and a hierarchical, layerwise feature sharing/fusing scheme across them. This enables us to design a novel and general task-agnostic search space, which inserts cross-task edges (i.e., feature fusion connections) into fixed single-task network backbones. Moreover, we also propose a novel single-shot gradient-based search algorithm that closes the performance gap between the searched architectures and the final evaluation architecture. This is realized with a minimum entropy regularization on the architecture weights during the search phase, which makes the architecture weights converge to near-discrete values and therefore yields a single model. As a result, our searched model can be directly used for evaluation without (re-)training from scratch. We perform extensive experiments with different single-task backbones on various task sets, demonstrating the promising performance obtained by exploiting the hierarchical and layerwise features, as well as the desirable generalizability to different i) task sets and ii) single-task backbones. The code of our paper is available at https://github.com/bhpfelix/MTLNAS.

Citations (90)

Summary

  • The paper introduces a novel task-agnostic search space that decouples fixed task backbones from flexible feature fusion, enhancing general-purpose multi-task learning.
  • It employs a hierarchical, layerwise feature-sharing approach to optimize inter-task connectivity and effectively mitigate negative transfer.
  • A single-shot gradient-based algorithm with minimum entropy regularization drives performance gains over state-of-the-art methods in diverse multitask settings.

Overview of MTL-NAS: Task-Agnostic Neural Architecture Search for General-Purpose Multi-Task Learning

The paper presents "MTL-NAS," an approach that incorporates Neural Architecture Search (NAS) into General-Purpose Multi-Task Learning (GP-MTL). Unlike conventional NAS methods that customize search spaces for specific tasks, this work introduces a task-agnostic search space. The authors propose a novel framework to decouple task-specific aspects from network connectivity, allowing the optimization of inter-task feature fusion architectures applicable across diverse task sets.

Methodological Contributions

  1. Task-Agnostic Search Space Design: The methodology dissects GP-MTL architectures into fixed single-task backbones and a flexible feature-sharing mechanism. This decouples task-specific knowledge encoded in backbones from inter-task architecture optimization, rendering the search space adaptable to any task combination.
  2. Hierarchical and Layerwise Feature Sharing: The paper advocates a hierarchical scheme that inserts feature fusion connections between intermediate layers of different task backbones. The resulting search space is vast, covering all potential pairings of layers across the different task branches (a minimal sketch of such a fusion edge follows this list).
  3. Single-Shot Gradient-Based Search Algorithm: To close the gap between the searched and the evaluated architecture, the authors introduce a search algorithm with a minimum entropy regularization term on the architecture weights. The regularizer drives these weights toward near-discrete values during search, so the searched model can be evaluated directly without retraining from scratch (see the regularizer sketch after the fusion example below).
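
To make the search space concrete, here is a minimal PyTorch sketch of a cross-task fusion edge: the single-task backbones stay structurally fixed, and each candidate edge carries a learnable gate (an architecture weight) that decides whether a source-task feature map is fused into a target-task layer. All names here (`FusionEdge`, `FusedLayer`, `alpha`) are illustrative assumptions, not the paper's actual implementation; the real code is in the linked MTLNAS repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionEdge(nn.Module):
    """One candidate cross-task edge: a 1x1 conv that projects a
    source-task feature map into the target layer's channel space."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)

class FusedLayer(nn.Module):
    """Wraps one fixed target-backbone layer and additively mixes in
    features from candidate source-backbone layers, each weighted by a
    learnable architecture parameter (the quantity being searched)."""
    def __init__(self, target_layer: nn.Module, source_channels: list, out_ch: int):
        super().__init__()
        self.target_layer = target_layer
        self.edges = nn.ModuleList(FusionEdge(c, out_ch) for c in source_channels)
        # One architecture weight per candidate incoming edge;
        # sigmoid(alpha) acts as a soft on/off gate for that edge.
        self.alpha = nn.Parameter(torch.zeros(len(source_channels)))

    def forward(self, x: torch.Tensor, source_feats: list) -> torch.Tensor:
        out = self.target_layer(x)
        gates = torch.sigmoid(self.alpha)
        for g, edge, feat in zip(gates, self.edges, source_feats):
            # Match spatial size, then fuse the gated source feature.
            f = F.interpolate(edge(feat), size=out.shape[-2:],
                              mode="bilinear", align_corners=False)
            out = out + g * f
        return out
```

During search, the task losses backpropagate through both the network weights and the `alpha` gates, so a single gradient-based pass jointly learns which cross-task edges are worth keeping.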

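The minimum entropy regularization of contribution 3 can likewise be sketched in a few lines. Here each edge's connection probability is assumed to be an independent sigmoid gate, matching the sketch above; minimizing the binary entropy of every gate pushes it toward 0 or 1, so the soft search-time architecture collapses to a single discrete model. The strength `lam` is a hypothetical hyperparameter, not a value from the paper.

```python
import torch

def edge_entropy(alpha: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Mean binary entropy of the edge gates p = sigmoid(alpha).
    Minimizing this term drives every gate toward a near-discrete
    value in {0, 1}."""
    p = torch.sigmoid(alpha)
    h = -(p * (p + eps).log() + (1 - p) * (1 - p + eps).log())
    return h.mean()

# During the search phase the regularizer is simply added to the task
# losses, e.g. (with an assumed strength `lam`):
#   loss = task_loss_a + task_loss_b + lam * edge_entropy(all_alphas)
```
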
Experimental Performance

The authors report consistent performance gains across various configurations. Specifically, MTL-NAS outperforms state-of-the-art methods such as NDDR-CNN and cross-stitch networks in multi-task settings involving semantic segmentation and surface normal estimation. The results validate the effectiveness of hierarchical feature fusion in capturing useful inter-task representations.

Implications and Future Directions

The presented work makes significant strides in automating multi-task learning by searching over architecture configurations that are traditionally designed by hand for each task set. The task-agnostic design of MTL-NAS has practical implications for reducing both the time and the computational resources required to build robust architectures for multi-task settings.

Furthermore, this research underscores the potential of task-agnostic architectures in facilitating the simultaneous learning of diverse tasks without incurring negative transfer, a common pitfall in shared feature spaces. By successfully decoupling task-specific layers from inter-task connections, the paper introduces a generalized paradigm that could influence future advancements in scalable, multitask neural architectures.

Conclusion

MTL-NAS signifies a substantial step towards versatile and efficient multitask learning frameworks. The methodological innovations presented aim to democratize NAS for multifaceted applications, potentially paving the way for more adaptive and universally applicable neural network designs in AI. Future research could explore integrating richer sets of backbone architectures and expanding the flexibility of feature fusion operations, further enhancing the applicability of such frameworks in real-world scenarios.