Aux-NAS: Exploiting Auxiliary Labels with Negligibly Extra Inference Cost

Published 9 May 2024 in cs.LG, cs.AI, cs.CV, and stat.ML | (2405.05695v1)

Abstract: We aim to exploit additional auxiliary labels from an independent (auxiliary) task to boost the performance of the primary task we focus on, while preserving the single-task inference cost of the primary task. While most existing auxiliary learning methods are optimization-based, relying on loss-weight or gradient manipulation, our method is architecture-based, with a flexible asymmetric structure for the primary and auxiliary tasks that produces different networks for training and inference. Specifically, starting from two single-task networks/branches (each representing a task), we propose a novel method with evolving networks in which only primary-to-auxiliary links remain as cross-task connections after convergence. These connections can be removed during primary-task inference, yielding a single-task inference cost. We achieve this by formulating a Neural Architecture Search (NAS) problem, in which we initialize bi-directional connections in the search space and guide the NAS optimization to converge to an architecture with only single-sided primary-to-auxiliary connections. Moreover, our method can be combined with optimization-based auxiliary learning approaches. Extensive experiments on six tasks from the NYU v2, CityScapes, and Taskonomy datasets, using VGG, ResNet, and ViT backbones, validate the promising performance. The code is available at https://github.com/ethanygao/Aux-NAS.
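
To make the cross-task connections described in the abstract concrete, the sketch below shows one gated fusion point between a primary and an auxiliary branch. This is a minimal PyTorch sketch, not the authors' implementation: the module and parameter names (CrossTaskLink, alpha_a2p, alpha_p2a) are illustrative assumptions. The auxiliary-to-primary gate is the quantity the NAS optimization would drive toward zero, so that link, and the auxiliary branch feeding it, can be dropped at primary-task inference.

```python
# Minimal sketch (not the Aux-NAS code) of gated bi-directional cross-task links.
# Both directions exist during the architecture search; if the auxiliary-to-primary
# gate converges to (near) zero, that link can be removed at inference and the
# primary task pays only its single-task cost.
import torch
import torch.nn as nn

class CrossTaskLink(nn.Module):
    """One fusion point between the primary and auxiliary branches (illustrative)."""

    def __init__(self, channels: int):
        super().__init__()
        # Architecture parameters, one per direction, squashed to [0, 1] by a sigmoid.
        self.alpha_a2p = nn.Parameter(torch.zeros(1))  # auxiliary -> primary gate
        self.alpha_p2a = nn.Parameter(torch.zeros(1))  # primary  -> auxiliary gate
        self.proj_a2p = nn.Conv2d(channels, channels, kernel_size=1)
        self.proj_p2a = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, feat_primary, feat_aux, primary_only: bool = False):
        # Primary-only inference: the aux -> primary link is removed entirely,
        # so the primary path costs the same as a single-task network.
        if primary_only:
            return feat_primary, None
        g_a2p = torch.sigmoid(self.alpha_a2p)
        g_p2a = torch.sigmoid(self.alpha_p2a)
        fused_primary = feat_primary + g_a2p * self.proj_a2p(feat_aux)
        fused_aux = feat_aux + g_p2a * self.proj_p2a(feat_primary)
        return fused_primary, fused_aux
```

During search, the training objective would additionally guide alpha_a2p toward zero so that the converged architecture keeps only primary-to-auxiliary links; the specific guidance mechanism is defined in the paper, not in this sketch.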

