
On the Universality of Deep Contextual Language Models (2109.07140v2)

Published 15 Sep 2021 in cs.CL

Abstract: Deep Contextual Language Models (LMs) like ELMO, BERT, and their successors dominate the landscape of Natural Language Processing due to their ability to scale across multiple tasks rapidly by pre-training a single model, followed by task-specific fine-tuning. Furthermore, multilingual versions of such models like XLM-R and mBERT have given promising results in zero-shot cross-lingual transfer, potentially enabling NLP applications in many under-served and under-resourced languages. Due to this initial success, pre-trained models are being used as 'Universal Language Models' as the starting point across diverse tasks, domains, and languages. This work explores the notion of 'Universality' by identifying seven dimensions across which a universal model should be able to scale, that is, perform equally well or reasonably well, to be useful across diverse settings. We outline the current theoretical and empirical results that support model performance across these dimensions, along with extensions that may help address some of their current limitations. Through this survey, we lay the foundation for understanding the capabilities and limitations of massive contextual language models and help discern research gaps and directions for future work to make these LMs inclusive and fair to diverse applications, users, and linguistic phenomena.

Authors (5)
  1. Shaily Bhatt
  2. Poonam Goyal
  3. Sandipan Dandapat
  4. Monojit Choudhury
  5. Sunayana Sitaram
Citations (5)