Metadata Might Make Language Models Better (2211.10086v1)

Published 18 Nov 2022 in cs.CL and cs.DL

Abstract: This paper discusses the benefits of including metadata when training language models on historical collections. Using 19th-century newspapers as a case study, we extend the time-masking approach proposed by Rosin et al., 2022 and compare different strategies for inserting temporal, political and geographical information into a Masked Language Model. After fine-tuning several DistilBERT models on enhanced input data, we provide a systematic evaluation of these models on a set of evaluation tasks: pseudo-perplexity, metadata mask-filling and supervised classification. We find that showing relevant metadata to a language model has a beneficial impact and may even produce more robust and fairer models.
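The core preprocessing idea described above can be sketched as follows. This is a minimal illustration of prepending metadata tokens (temporal, political, geographical) to each document before masked-LM fine-tuning; the bracketed token format and the field names here are assumptions for illustration, not the paper's exact scheme:

```python
def add_metadata_prefix(text, year=None, politics=None, place=None):
    """Prepend special metadata tokens (e.g. "[1855] [liberal] [London]")
    so a masked language model can condition on them during fine-tuning.
    Any subset of the metadata fields may be missing for a given article.
    """
    tokens = []
    if year is not None:
        tokens.append(f"[{year}]")
    if politics is not None:
        tokens.append(f"[{politics}]")
    if place is not None:
        tokens.append(f"[{place}]")
    return " ".join(tokens + [text])


# Example: a 19th-century newspaper snippet with its metadata attached.
example = add_metadata_prefix(
    "The Queen opened Parliament yesterday.",
    year=1855, politics="liberal", place="London",
)
print(example)  # [1855] [liberal] [London] The Queen opened Parliament yesterday.
```

In the metadata mask-filling evaluation the abstract mentions, one of these prefix tokens would be replaced by the tokenizer's mask token and the fine-tuned model asked to recover it, which tests whether the model has actually learned to associate text with its metadata.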
