- The paper introduces a tensor factorisation method that enables deep neural networks to share representations across multiple tasks.
- It uses Tucker and Tensor Train decompositions to learn where, and how much, to share parameters across tasks, removing the need for a manually designed sharing structure.
- Experiments on MNIST digit recognition, facial attribute classification, and multi-alphabet character recognition show improved accuracy over single-task and user-defined multi-task baselines, with less architectural design effort.
Deep Multi-Task Representation Learning: A Tensor Factorisation Approach
In this paper, Yang and Hospedales propose a method for multi-task learning (MTL) in deep neural networks (DNNs) based on tensor factorisation. Classical MTL methods share knowledge between linear models, typically by factorising a stacked weight matrix into shared and task-specific factors, whereas existing deep MTL approaches rely on a manually designed split between shared and task-specific layers. By generalising matrix factorisation to tensor factorisation, the authors let the network learn the representation-sharing structure across tasks at every layer rather than having it hand-specified.
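The following is a minimal NumPy sketch (not the authors' code; the dimensions, rank, and variable names are illustrative) of the shallow, matrix-based MTL idea that the paper generalises: per-task linear weights are stacked into a matrix and factorised into a shared basis times task-specific coefficients.

```python
# Minimal sketch of shallow matrix-factorised MTL (illustrative shapes only).
import numpy as np

d, T, k = 8, 4, 2                       # feature dim, number of tasks, shared rank
rng = np.random.default_rng(0)
W = rng.standard_normal((d, T))         # column t holds task t's weight vector

# Rank-k factorisation W ~= L @ S: L (d x k) is a basis shared by all tasks,
# S (k x T) holds each task's combination coefficients.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
L = U[:, :k] * s[:k]                    # shared latent basis
S = Vt[:k, :]                           # task-specific codes

print("rank-k reconstruction error:", np.linalg.norm(W - L @ S))
```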
Key Contributions
- Tensor Factorisation for Deep MTL: The authors introduce a method for sharing structure at every layer of a DNN via tensor factorisation. It covers both homogeneous MTL (tasks with the same kind of output, such as one-vs-all digit classifiers) and heterogeneous MTL (tasks with different output spaces, such as gender versus age classification). Tensor factorisation generalises shallow matrix-based MTL methods, extending knowledge sharing to multiple network layers.
- Automatic Knowledge Sharing: Using tensor decompositions, specifically Tucker and Tensor Train, the approach learns where and how much to share parameters across tasks, relieving the user of the hand-specified sharing structures common in existing deep MTL solutions (see the sketches after this list and in the Methodology section).
- Practical Efficacy and Design Simplification: The proposed method improves accuracy while simplifying DNN architecture design for MTL: it removes the need to decide by hand which layers are shared and which are task-specific, a notable gain in design efficiency.
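Below is a hedged sketch, in plain NumPy rather than the authors' TensorFlow implementation, of how a Tucker-factorised layer ties tasks together: the per-task weight matrices form one 3-way tensor generated from a shared core and three factor matrices, with only the last factor indexed by task. Shapes, ranks, and variable names are assumptions for illustration.

```python
# Illustrative Tucker parameterisation of one fully connected layer shared by T tasks.
import numpy as np

d_in, d_out, T = 16, 8, 3                  # layer input dim, output dim, number of tasks
r1, r2, r3 = 4, 4, 2                       # Tucker ranks (hyper-parameters in practice)
rng = np.random.default_rng(0)

core   = rng.standard_normal((r1, r2, r3)) # shared core tensor
U_in   = rng.standard_normal((d_in, r1))   # shared input-side factor
U_out  = rng.standard_normal((d_out, r2))  # shared output-side factor
U_task = rng.standard_normal((T, r3))      # task factor: one row per task

# Reconstruct the stacked weight tensor W[i, j, t] from the Tucker factors.
W = np.einsum('abc,ia,jb,tc->ijt', core, U_in, U_out, U_task)

x = rng.standard_normal((5, d_in))         # a mini-batch of 5 inputs
task = 1
h = x @ W[:, :, task]                      # this task's layer output
print(h.shape)                             # (5, 8)
```

Because only U_task is indexed by task, the degree of sharing is governed by the learned factors (and the chosen ranks) rather than by a hand-picked split of layers.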
Methodology and Experiments
The method factorises each layer's stacked per-task parameters into shared and task-specific components. The authors implement the framework in TensorFlow and evaluate it on MTL benchmarks including digit recognition on MNIST and heterogeneous tasks such as facial attribute classification and multi-alphabet character recognition.
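As a companion to the Tucker sketch above, here is a hypothetical Tensor Train (TT) parameterisation of the same stacked weight tensor, again in plain NumPy rather than the authors' TensorFlow code, with made-up shapes and ranks: the tensor is written as a chain of small cores, and each task's weight matrix is recovered by fixing the task index in the last core.

```python
# Illustrative Tensor Train (TT) parameterisation of a layer shared by T tasks.
import numpy as np

d_in, d_out, T = 16, 8, 3                  # layer input dim, output dim, number of tasks
r1, r2 = 4, 2                              # TT ranks (hyper-parameters in practice)
rng = np.random.default_rng(0)

G1 = rng.standard_normal((d_in, r1))       # first TT core
G2 = rng.standard_normal((r1, d_out, r2))  # middle TT core
G3 = rng.standard_normal((r2, T))          # last TT core carries the task mode

# W[i, j, t] = sum_{a, b} G1[i, a] * G2[a, j, b] * G3[b, t]
W = np.einsum('ia,ajb,bt->ijt', G1, G2, G3)

# The cores hold far fewer parameters than T independent weight matrices.
tt_params = G1.size + G2.size + G3.size
print(tt_params, "TT parameters vs", d_in * d_out * T, "unshared parameters")
```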
- MNIST Task: Framing digit recognition as ten one-vs-all binary classification tasks (see the sketch after this list), the proposed DMTRL methods outperform both single-task learning (STL) and user-defined MTL baselines. DMTRL with the Tensor Train decomposition (DMTRL-TT) achieves the lowest error rates, underscoring the value of end-to-end deep multi-task representation learning.
- Facial Attribute and Multi-Alphabet Recognition: In the more challenging heterogeneous MTL setting, the proposed DMTRL methods again surpass the baselines, with DMTRL-Tucker consistently outperforming both STL and user-defined MTL for gender and age classification on the AdienceFaces dataset. In the Omniglot experiment, DMTRL likewise delivers significant improvements across the different alphabets.
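For concreteness, a hypothetical fragment (not from the paper) showing how the one-vs-all framing turns MNIST into ten binary tasks that share the same inputs and, under DMTRL, the same factorised hidden layers:

```python
# Hypothetical one-vs-all task construction for MNIST-style labels.
import numpy as np

rng = np.random.default_rng(0)
digits = rng.integers(0, 10, size=32)                    # stand-in for a batch of MNIST labels
binary_targets = {d: (digits == d).astype(np.float32)    # task d: "is this digit d?"
                  for d in range(10)}

print(binary_targets[3])   # task 3's targets: 1.0 where the digit is 3, else 0.0
```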
Implications and Future Directions
The findings show that tensor factorisation can learn effective sharing structures within deep networks, improving both multi-task performance and architectural efficiency. Future work could automate rank selection within the tensor decompositions and extend the approach to larger, real-world multi-task scenarios. Modelling structural variation in data beyond parallel tasks could also open new avenues for representation learning in AI systems.
By reducing the manual effort of designing MTL architectures, the paper makes a significant contribution to adaptive representation learning. It provides a solid foundation for improving the generalisability and scalability of MTL systems in both homogeneous and heterogeneous task domains.