A Comprehensive Comparison of Pre-training Language Models (2106.11483v9)

Published 22 Jun 2021 in cs.CL

Abstract: Recently, the development of pre-trained LLMs has brought NLP tasks to a new state of the art. In this paper, we explore the efficiency of various pre-trained LLMs. We pre-train a list of transformer-based models on the same amount of text and for the same number of training steps. The experimental results show that the largest improvement over the original BERT comes from adding an RNN layer to capture more contextual information for short-text understanding. The overall conclusion, however, is that similar BERT-style architectures yield no remarkable improvement for short-text understanding, while a data-centric method [12] can achieve better performance.
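The "RNN layer on BERT" variant the abstract highlights amounts to running a recurrent network over the encoder's token-level outputs before classification. The sketch below illustrates that idea only; it is not the authors' exact architecture, and the checkpoint name, hidden sizes, bidirectional LSTM choice, and pooling/classification head are illustrative assumptions.

```python
# Minimal sketch: a pre-trained BERT encoder with an added recurrent layer
# for short-text classification. All hyperparameters are assumptions.
import torch
import torch.nn as nn
from transformers import BertModel


class BertWithRNN(nn.Module):
    def __init__(self, num_labels: int = 2, rnn_hidden: int = 256):
        super().__init__()
        # Assumed checkpoint; the paper pre-trains its own models.
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.rnn = nn.LSTM(
            input_size=self.bert.config.hidden_size,
            hidden_size=rnn_hidden,
            batch_first=True,
            bidirectional=True,
        )
        self.classifier = nn.Linear(2 * rnn_hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        # Token-level contextual embeddings from BERT: (batch, seq_len, hidden)
        hidden_states = self.bert(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        # Recurrent pass over the BERT outputs adds sequential context
        rnn_out, _ = self.rnn(hidden_states)
        # Mean-pool over non-padding tokens, then classify
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (rnn_out * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        return self.classifier(pooled)
```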

Citations (2)
