Factorized Tensor Networks for Multi-Task and Multi-Domain Learning (2310.06124v1)
Abstract: Multi-task and multi-domain learning methods seek to learn multiple tasks/domains, jointly or one after another, using a single unified network. The key challenge and opportunity is to exploit shared information across tasks and domains to improve the efficiency of the unified network, where efficiency can be measured in terms of accuracy, storage cost, computation, or sample complexity. In this paper, we propose a factorized tensor network (FTN) that achieves accuracy comparable to independent single-task/domain networks with a small number of additional parameters. FTN uses a frozen backbone network from a source model and incrementally adds task/domain-specific low-rank tensor factors to the shared frozen network. This approach can adapt to a large number of target domains and tasks without catastrophic forgetting, and it requires significantly fewer task-specific parameters than existing methods. We performed experiments on widely used multi-domain and multi-task datasets, with both convolution-based architectures (using different backbones) and transformer-based architectures. We observed that FTN achieves accuracy similar to single-task/domain methods while using only a fraction of additional parameters per task.
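The additive mechanism described above lends itself to a short sketch. Below is a minimal PyTorch illustration of the idea: a frozen backbone convolution augmented with per-task low-rank residual factors, realized here for concreteness as a rank-r bottleneck of two small convolutions (in the spirit of LoRA-style adapters). The class name `FTNConv2d`, the `rank` parameter, and this specific factorization are illustrative assumptions, not the paper's exact tensor construction.

```python
import torch
import torch.nn as nn


class FTNConv2d(nn.Module):
    """Frozen source conv layer plus per-task low-rank residual factors.

    A minimal sketch of the factorized-adapter idea: the backbone weight
    W is frozen, and each task t contributes a low-rank additive update,
    realized here as a k x k conv into a rank-r bottleneck followed by a
    1x1 conv back up. The FTN paper's exact factorization may differ.
    Assumes groups=1 and default dilation for simplicity.
    """

    def __init__(self, base_conv: nn.Conv2d, num_tasks: int, rank: int = 4):
        super().__init__()
        self.base = base_conv
        for p in self.base.parameters():
            p.requires_grad = False  # shared backbone stays frozen

        k = base_conv.kernel_size
        # Per-task down-projection: matches the base conv's spatial behavior.
        self.down = nn.ModuleList(
            nn.Conv2d(base_conv.in_channels, rank, k,
                      stride=base_conv.stride, padding=base_conv.padding,
                      bias=False)
            for _ in range(num_tasks)
        )
        # Per-task up-projection back to the backbone's output channels.
        self.up = nn.ModuleList(
            nn.Conv2d(rank, base_conv.out_channels, 1, bias=False)
            for _ in range(num_tasks)
        )
        for up in self.up:
            nn.init.zeros_(up.weight)  # residual starts at zero, so the
            # adapted layer initially reproduces the frozen backbone exactly

    def forward(self, x: torch.Tensor, task: int) -> torch.Tensor:
        # Frozen shared path plus the selected task's low-rank correction.
        return self.base(x) + self.up[task](self.down[task](x))


# Usage: wrap an existing backbone conv and pick a task at inference time.
base = nn.Conv2d(64, 128, kernel_size=3, padding=1)
layer = FTNConv2d(base, num_tasks=3, rank=4)
y = layer(torch.randn(2, 64, 32, 32), task=1)  # shape (2, 128, 32, 32)
```

Training a new task updates only that task's `down`/`up` factors, so the frozen backbone and earlier tasks' factors are untouched; this is how an additive design of this kind avoids catastrophic forgetting while adding only a small number of parameters per task.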