
BloombergGPT: A Large Language Model for Finance (2303.17564v3)

Published 30 Mar 2023 in cs.LG, cs.AI, cs.CL, and q-fin.GN

Abstract: The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. LLMs have been shown to be effective on a variety of tasks; however, no LLM specialized for the financial domain has been reported in literature. In this work, we present BloombergGPT, a 50 billion parameter LLM that is trained on a wide range of financial data. We construct a 363 billion token dataset based on Bloomberg's extensive data sources, perhaps the largest domain-specific dataset yet, augmented with 345 billion tokens from general purpose datasets. We validate BloombergGPT on standard LLM benchmarks, open financial benchmarks, and a suite of internal benchmarks that most accurately reflect our intended usage. Our mixed dataset training leads to a model that outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks. Additionally, we explain our modeling choices, training process, and evaluation methodology. We release Training Chronicles (Appendix C) detailing our experience in training BloombergGPT.

Overview of BloombergGPT: An LLM for Finance

The paper presents BloombergGPT, a domain-specific LLM tailored to the financial sector. The 50-billion-parameter model is trained on 363 billion tokens drawn from Bloomberg's extensive financial data sources, perhaps the most substantial domain-specific dataset assembled to date, augmented with 345 billion tokens from general-purpose datasets. This dual-dataset strategy lets BloombergGPT achieve superior performance on financial tasks while remaining competitive on general NLP benchmarks.
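
To make the mixed-data strategy concrete, here is a minimal sketch of proportional corpus sampling. The token counts come from the abstract; the sampling scheme itself is an illustrative assumption, not the paper's actual training pipeline.

```python
import random

# Approximate token counts from the abstract: 363B financial (FinPile)
# and 345B general-purpose. The sampling scheme is an assumption.
CORPORA = {"FinPile": 363e9, "general": 345e9}

def sample_corpus(rng: random.Random) -> str:
    """Pick the source corpus for the next training document,
    with probability proportional to its token share (~51% financial)."""
    return rng.choices(list(CORPORA), weights=list(CORPORA.values()), k=1)[0]

rng = random.Random(0)
print(sample_corpus(rng))  # 'FinPile' roughly half the time
```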

Dataset and Methodology

The authors introduce "FinPile," a 363-billion-token dataset derived from Bloomberg's archival financial documents, covering data types such as news, filings, press releases, and social media. Data quality was a priority throughout: FinPile is augmented with public general-purpose datasets such as C4 and The Pile, which are de-duplicated before training, yielding a roughly balanced corpus of over 700 billion tokens.
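
The paper does not publish its de-duplication pipeline, so the following is only a hedged sketch of the simplest form of that step, exact document-level de-duplication by content hash; production pipelines typically add fuzzy matching on top.

```python
import hashlib

def exact_dedup(documents: list[str]) -> list[str]:
    """Drop exact duplicates, comparing whitespace-normalized text
    by SHA-256 hash. Keeps the first occurrence of each document."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in documents:
        key = hashlib.sha256(" ".join(doc.split()).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

print(exact_dedup(["Stocks rose.", "Stocks  rose.", "Bonds fell."]))
# -> ['Stocks rose.', 'Bonds fell.']
```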

The model uses a BLOOM-style decoder-only architecture chosen to balance domain-specific and general capability. By following the Chinchilla scaling laws, the authors size the model and training data to make effective use of their fixed compute budget.
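
For rough intuition on that budgeting, the snippet below applies two standard heuristics from the scaling-laws literature: the C ≈ 6ND approximation for training compute and the roughly 20-tokens-per-parameter Chinchilla rule of thumb. The constants are community heuristics, not figures taken from the paper.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Common scaling-laws approximation: compute C ≈ 6 * N * D."""
    return 6.0 * n_params * n_tokens

def chinchilla_tokens(n_params: float) -> float:
    """Chinchilla rule of thumb: ~20 training tokens per parameter."""
    return 20.0 * n_params

n = 50e9   # BloombergGPT's reported parameter count
d = 708e9  # corpus size implied by the abstract (363B + 345B tokens)
print(f"Chinchilla-optimal tokens for 50B params: {chinchilla_tokens(n):.1e}")
print(f"Approx. compute for one pass over the corpus: {training_flops(n, d):.1e} FLOPs")
```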

Key Results and Evaluation

BloombergGPT's evaluation involved two categories of tasks: financial-specific and general-purpose. On the financial tasks, the model substantially outperformed existing models, particularly on sentiment analysis and financial question answering; notably, it surpassed open models such as GPT-NeoX and OPT on typical financial tasks.

On general-purpose tasks, BloombergGPT was assessed with benchmarks including BIG-bench Hard and MMLU. Despite its specialization, the model retained capabilities comparable to those of larger general-purpose models, showcasing the effectiveness of its mixed training strategy.
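
Benchmarks such as MMLU are commonly scored by asking the model to rank candidate answers by likelihood. The helper below is a generic sketch of that protocol, not the paper's evaluation harness; `log_likelihood` stands in for whatever scoring API a given model exposes.

```python
from typing import Callable, Sequence

def pick_answer(
    log_likelihood: Callable[[str, str], float],
    prompt: str,
    choices: Sequence[str],
) -> int:
    """Return the index of the candidate the model scores highest."""
    scores = [log_likelihood(prompt, choice) for choice in choices]
    return max(range(len(choices)), key=scores.__getitem__)

# Toy stand-in scorer (prefers shorter answers) just to make this runnable.
toy_model = lambda prompt, completion: -float(len(completion))
print(pick_answer(toy_model, "Q: What is 2 + 2? A:", ["4", "five", "twenty-two"]))
```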

Strong Claims and Implications

The paper's most prominent claim is BloombergGPT's leading performance on financial NLP tasks, supported by its unique access to a massive, well-curated financial corpus. This suggests a promising direction for domain-specific models: appropriately combining domain-specific and general data can yield strong in-domain results without compromising general applicability.

The paper highlights the implications of deploying BloombergGPT in the financial sector, suggesting substantial benefits for time-sensitive financial analytics. More broadly, the approach underscores the potential for similar models in other sectors, emphasizing the value of pairing domain-appropriate datasets with general-purpose ones.

Future Perspectives

The research opens avenues for exploring task-specific fine-tuning and for evaluating domain specialization across different sectors. The paper also points to alternative tokenization strategies that could position models to better handle numerically dense domains.
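
As a hypothetical illustration of one such strategy, the helper below applies digit splitting before subword tokenization, a trick from the broader literature for improving numeric handling. It is an assumption for illustration, not the paper's tokenizer.

```python
import re

def split_digits(text: str) -> str:
    """Pre-tokenization pass that isolates each digit so the subword
    tokenizer sees them individually, e.g. '$4.2B' -> '$ 4 . 2 B'."""
    spaced = re.sub(r"(\d)", r" \1 ", text)
    return re.sub(r"\s+", " ", spaced).strip()

print(split_digits("Revenue rose 12.5% to $4.2B in Q3 2023"))
# -> 'Revenue rose 1 2 . 5 % to $ 4 . 2 B in Q 3 2 0 2 3'
```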

BloombergGPT serves as a reference point for future LLM development in finance, setting a standard both for procedural documentation (the 'Training Chronicles' appendix) and for marrying domain-specific knowledge with modern NLP techniques.

The insights gleaned here motivate further work on model alignment, notably in sensitive financial environments. The research also makes a strong case for scrutinizing tokenization choices and their impact on training outcomes, laying groundwork for future advances in LLM training methodology.

Authors (9)
  1. Shijie Wu
  2. Ozan Irsoy
  3. Steven Lu
  4. Vadim Dabravolski
  5. Mark Dredze
  6. Sebastian Gehrmann
  7. Prabhanjan Kambadur
  8. David Rosenberg
  9. Gideon Mann