
Investigating Pre-trained Language Models on Cross-Domain Datasets, a Step Closer to General AI (2306.12205v1)

Published 21 Jun 2023 in cs.CL and cs.AI

Abstract: Pre-trained LLMs have recently emerged as a powerful tool for fine-tuning on a variety of language tasks. Ideally, when models are pre-trained on large amounts of data, they are expected to gain implicit knowledge. In this paper, we investigate the ability of pre-trained LLMs to generalize to different non-language tasks. In particular, we test them on tasks from different domains such as computer vision, reasoning on hierarchical data, and protein fold prediction. The four pre-trained models that we use, T5, BART, BERT, and GPT-2, achieve outstanding results. They all perform similarly and outperform transformers trained from scratch by a large margin. For instance, pre-trained LLMs perform better on the ListOps dataset, with an average accuracy of 58.7%, compared to transformers trained from scratch, which reach an average accuracy of 29.0%. The significant improvement demonstrated across three types of datasets suggests that pre-training on language helps the models acquire general knowledge, bringing us a step closer to general AI. We also show that reducing the number of parameters in pre-trained LLMs has only a small impact: performance drops slightly when using T5-Small instead of T5-Base, and even with only 2% of the parameters we achieve a large improvement over training from scratch. Finally, in contrast to prior work, we find that using pre-trained embeddings for the input layer is necessary to achieve the desired results.

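The setup described in the abstract is to fine-tune a language-pre-trained transformer on non-language token sequences (for example, ListOps expressions), reusing the model's pre-trained input embeddings. The sketch below is not the authors' code; it is a minimal illustration of that idea using the Hugging Face transformers library, with T5-Small as the backbone and a 10-way ListOps classification head. The model choice, mean-pooling, and head design are assumptions for illustration only.

# Minimal sketch (not the authors' code): fine-tune a language-pre-trained
# T5 encoder on ListOps-style sequences, keeping its pre-trained input
# embeddings, with a small classification head on top.
import torch
from torch import nn
from transformers import AutoTokenizer, T5EncoderModel

tokenizer = AutoTokenizer.from_pretrained("t5-small")
encoder = T5EncoderModel.from_pretrained("t5-small")  # pre-trained weights, including input embeddings

class ListOpsClassifier(nn.Module):
    def __init__(self, encoder, num_classes=10):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(encoder.config.d_model, num_classes)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # Mean-pool over non-padding tokens, then classify (pooling choice is an assumption).
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
        return self.head(pooled)

model = ListOpsClassifier(encoder)

# A ListOps expression is treated as ordinary text, so the pre-trained
# tokenizer and embedding layer handle the non-language input directly.
batch = tokenizer(["[MAX 2 9 [MIN 4 7 ] 0 ]"], return_tensors="pt", padding=True)
logits = model(batch["input_ids"], batch["attention_mask"])  # shape: (1, 10)

Training such a classifier with a standard cross-entropy loss, and comparing against the same architecture initialized randomly, mirrors the pre-trained-versus-from-scratch comparison the abstract reports; swapping "t5-small" for a larger checkpoint corresponds to the T5-Small versus T5-Base comparison.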
References (16)
  1. Dong, L., Xu, S., & Xu, B. (2018). "Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition." 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE.
  2. Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). "Improving Language Understanding by Generative Pre-Training."
  3. Lu, K., Grover, A., Abbeel, P., & Mordatch, I. (2021). "Pretrained Transformers as Universal Computation Engines." arXiv preprint arXiv:2103.05247.
  4. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). "ImageNet: A Large-Scale Hierarchical Image Database." 2009 IEEE Conference on Computer Vision and Pattern Recognition (pp. 248-255). IEEE.
  5. Tida, V. S., & Hsu, S. (2022). "Universal Spam Detection Using Transfer Learning of BERT Model." arXiv preprint arXiv:2202.03480.
  6. Azzouza, N., Akli-Astouati, K., & Ibrahim, R. (2019). "TwitterBERT: Framework for Twitter Sentiment Analysis Based on Pre-trained Language Model Representations." International Conference of Reliable Information and Communication Technology. Springer, Cham.
  7. Nogueira, R., Jiang, Z., & Lin, J. (2021). "Investigating the Limitations of Transformers with Simple Arithmetic Tasks." arXiv preprint arXiv:2102.13019.
  8. Hu, R., & Singh, A. (2021). "UniT: Multimodal Multitask Learning with a Unified Transformer." Proceedings of the IEEE/CVF International Conference on Computer Vision.
  9. Kaiser, L., Gomez, A. N., Shazeer, N., Vaswani, A., Parmar, N., Jones, L., & Uszkoreit, J. (2017). "One Model to Learn Them All." arXiv preprint arXiv:1706.05137.
  10. Nangia, N., & Bowman, S. R. (2018). "ListOps: A Diagnostic Dataset for Latent Tree Learning." arXiv preprint arXiv:1804.06028.
  11. Krizhevsky, A. (2009). "Learning Multiple Layers of Features from Tiny Images." Technical report.
  12. Beltagy, I., Peters, M. E., & Cohan, A. (2020). "Longformer: The Long-Document Transformer." arXiv preprint arXiv:2004.05150.
  13. Wang, S., Li, B. Z., Khabsa, M., Fang, H., & Ma, H. (2020). "Linformer: Self-Attention with Linear Complexity." arXiv preprint arXiv:2006.04768.
  14. Kitaev, N., Kaiser, Ł., & Levskaya, A. (2020). "Reformer: The Efficient Transformer." arXiv preprint arXiv:2001.04451.
  15. Child, R., Gray, S., Radford, A., & Sutskever, I. (2019). "Generating Long Sequences with Sparse Transformers." arXiv preprint arXiv:1904.10509.
  16. Rebuffi, S.-A., Bilen, H., & Vedaldi, A. (2017). "Learning Multiple Visual Domains with Residual Adapters." Advances in Neural Information Processing Systems 30.
Authors (4)
  1. Mohamad Ballout (7 papers)
  2. Ulf Krumnack (11 papers)
  3. Gunther Heidemann (8 papers)
  4. Kai-Uwe Kühnberger (13 papers)
Citations (2)