Investigating Pre-trained Language Models on Cross-Domain Datasets, a Step Closer to General AI (2306.12205v1)
Abstract: Pre-trained LLMs have recently emerged as a powerful tool that can be fine-tuned for a variety of language tasks. Ideally, when models are pre-trained on large amounts of data, they are expected to acquire implicit knowledge. In this paper, we investigate the ability of pre-trained LLMs to generalize to non-language tasks. In particular, we test them on tasks from different domains: computer vision, reasoning on hierarchical data, and protein fold prediction. The four pre-trained models we use, T5, BART, BERT, and GPT-2, achieve outstanding results. They all perform similarly and outperform transformers trained from scratch by a large margin. For instance, on the ListOps dataset, pre-trained LLMs reach an average accuracy of 58.7%, compared to 29.0% for transformers trained from scratch. The significant improvement across all three types of datasets suggests that pre-training on language helps the models acquire general knowledge, bringing us a step closer to general AI. We also show that reducing the number of parameters in pre-trained LLMs has only a minor impact: performance drops slightly when T5-Small is used instead of T5-Base. In fact, even when using only 2% of the parameters, we achieve a large improvement over training from scratch. Finally, in contrast to prior work, we find that using pre-trained embeddings for the input layer is necessary to achieve the desired results.
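The setup described in the abstract can be illustrated with a minimal sketch. The snippet below is an assumption-laden illustration using the Hugging Face `transformers` library, not the authors' released code: it fine-tunes a pre-trained BERT encoder on a ListOps-style expression rendered as plain text, reusing the pre-trained input embeddings as the abstract recommends. The model name, hyperparameters, and the single hand-written example are placeholders.

```python
# Minimal sketch (not the authors' exact setup): fine-tune a pre-trained
# language model on a non-language task, here ListOps-style sequence
# classification, while reusing its pre-trained input embeddings.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=10  # ListOps answers are the digits 0-9
)

# A ListOps-style expression written as text so the pre-trained tokenizer
# and embedding layer can be reused unchanged.
example = "[MAX 2 9 [MIN 4 7 ] 0 ]"
inputs = tokenizer(example, return_tensors="pt", truncation=True, max_length=512)
label = torch.tensor([9])  # the value the expression evaluates to

# One standard fine-tuning step: all weights, including the pre-trained
# embedding layer, are updated (freezing parts of the model is an ablation,
# not shown here).
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
optimizer.zero_grad()
outputs = model(**inputs, labels=label)
outputs.loss.backward()
optimizer.step()
```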
- Dong, Linhao, Shuang Xu, and Bo Xu. "Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition." 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2018.
- Radford, Alec, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. "Improving Language Understanding by Generative Pre-Training." 2018.
- Lu, K., A. Grover, P. Abbeel, and I. Mordatch. "Pretrained Transformers as Universal Computation Engines." arXiv preprint arXiv:2103.05247, 2021.
- Deng, J., W. Dong, R. Socher, L. J. Li, K. Li, and L. Fei-Fei. "ImageNet: A Large-Scale Hierarchical Image Database." 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248-255, IEEE, 2009.
- Tida, Vijay Srinivas, and Sonya Hsu. "Universal Spam Detection Using Transfer Learning of BERT Model." arXiv preprint arXiv:2202.03480, 2022.
- Azzouza, Noureddine, Karima Akli-Astouati, and Roliana Ibrahim. "TwitterBERT: Framework for Twitter Sentiment Analysis Based on Pre-trained Language Model Representations." International Conference of Reliable Information and Communication Technology, Springer, Cham, 2019.
- Nogueira, R., Z. Jiang, and J. Lin. "Investigating the Limitations of Transformers with Simple Arithmetic Tasks." arXiv preprint arXiv:2102.13019, 2021.
- Hu, Ronghang, and Amanpreet Singh. "UniT: Multimodal Multitask Learning with a Unified Transformer." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021.
- Kaiser, L., A. N. Gomez, N. Shazeer, A. Vaswani, N. Parmar, L. Jones, and J. Uszkoreit. "One Model to Learn Them All." arXiv preprint arXiv:1706.05137, 2017.
- Nangia, Nikita, and Samuel R. Bowman. "ListOps: A Diagnostic Dataset for Latent Tree Learning." arXiv preprint arXiv:1804.06028, 2018.
- Krizhevsky, Alex. "Learning Multiple Layers of Features from Tiny Images." Technical report, 2009.
- Beltagy, Iz, Matthew E. Peters, and Arman Cohan. "Longformer: The Long-Document Transformer." arXiv preprint arXiv:2004.05150, 2020.
- Wang, S., B. Z. Li, M. Khabsa, H. Fang, and H. Ma. "Linformer: Self-Attention with Linear Complexity." arXiv preprint arXiv:2006.04768, 2020.
- Kitaev, Nikita, Łukasz Kaiser, and Anselm Levskaya. "Reformer: The Efficient Transformer." arXiv preprint arXiv:2001.04451, 2020.
- Child, R., S. Gray, A. Radford, and I. Sutskever. "Generating Long Sequences with Sparse Transformers." arXiv preprint arXiv:1904.10509, 2019.
- Rebuffi, Sylvestre-Alvise, Hakan Bilen, and Andrea Vedaldi. "Learning Multiple Visual Domains with Residual Adapters." Advances in Neural Information Processing Systems 30, 2017.
Authors: Mohamad Ballout, Ulf Krumnack, Gunther Heidemann, Kai-Uwe Kühnberger