Probing Across Time: What Does RoBERTa Know and When? (2104.07885v2)

Published 16 Apr 2021 in cs.CL

Abstract: Models of language trained on very large corpora have been demonstrated useful for NLP. As fixed artifacts, they have become the object of intense study, with many researchers "probing" the extent to which they acquire and readily demonstrate linguistic abstractions, factual and commonsense knowledge, and reasoning abilities. Building on this line of work, we consider a new question: for the types of knowledge a language model learns, when during (pre)training are they acquired? We plot probing performance across iterations, using RoBERTa as a case study. Among our findings: linguistic knowledge is acquired fast, stably, and robustly across domains. Facts and commonsense are slower and more domain-sensitive. Reasoning abilities are, in general, not stably acquired. As new datasets, pretraining protocols, and probes emerge, we believe that probing-across-time analyses can help researchers understand the complex, intermingled learning that these models undergo and guide us toward more efficient approaches that accomplish necessary learning faster.
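The abstract describes evaluating probes at successive pretraining checkpoints and tracking their performance over training iterations. Below is a minimal sketch of that probing-across-time loop, not the paper's actual probe suite: the checkpoint directory names are hypothetical placeholders for intermediate RoBERTa checkpoints, the probing dataset is a toy stand-in, and the probe is a simple logistic-regression classifier over frozen, mean-pooled encoder representations.

```python
# Sketch: run the same probe against several pretraining checkpoints and
# record accuracy per checkpoint (the values one would plot "across time").
# Checkpoint paths and the probing dataset are hypothetical placeholders.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

CHECKPOINTS = [                      # hypothetical intermediate checkpoints
    "checkpoints/step_010k",
    "checkpoints/step_100k",
    "checkpoints/step_1000k",
]

# Toy stand-in for a probing dataset (sentence, label) pairs.
train_texts, train_labels = ["the cat sleeps .", "the cats sleeps ."], [1, 0]
test_texts, test_labels = ["the dog barks .", "the dogs barks ."], [1, 0]

def encode(model, tokenizer, texts):
    """Mean-pool the last hidden layer to get one frozen vector per sentence."""
    with torch.no_grad():
        batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        hidden = model(**batch).last_hidden_state          # (batch, seq, dim)
        mask = batch["attention_mask"].unsqueeze(-1)       # (batch, seq, 1)
        return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

for ckpt in CHECKPOINTS:
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModel.from_pretrained(ckpt).eval()         # encoder stays frozen
    X_train = encode(model, tokenizer, train_texts)
    X_test = encode(model, tokenizer, test_texts)
    probe = LogisticRegression(max_iter=1000).fit(X_train, train_labels)
    acc = accuracy_score(test_labels, probe.predict(X_test))
    print(f"{ckpt}: probe accuracy = {acc:.3f}")           # plot accuracy vs. step
```

Plotting the printed accuracies against pretraining step gives the kind of probing-performance-across-iterations curve the paper analyzes for linguistic, factual, commonsense, and reasoning probes.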

Authors (5)
  1. Leo Z. Liu (4 papers)
  2. Yizhong Wang (42 papers)
  3. Jungo Kasai (38 papers)
  4. Hannaneh Hajishirzi (176 papers)
  5. Noah A. Smith (224 papers)
Citations (78)
