
Learning language variations in news corpora through differential embeddings (2011.06949v1)

Published 13 Nov 2020 in cs.CL and cs.LG

Abstract: There is increasing interest in the NLP community in capturing variations in language usage, whether through time (semantic drift), across regions (dialects or variants), or in different social contexts (professional or media technolects). Several successful dynamic embedding models have been proposed that can track semantic change through time. Here we show that a model with a central word representation and a slice-dependent contribution can learn word embeddings from different corpora simultaneously. This model is based on a star-like representation of the slices. We apply it to The New York Times and The Guardian newspapers, and we show that it can capture both the temporal dynamics in the yearly slices of each corpus and the language variation between US and UK English in a curated multi-source corpus. We provide an extensive evaluation of this methodology.
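
The core idea lends itself to a compact sketch: every corpus slice (a year of The New York Times, a year of The Guardian, and so on) shares one central vector per word and adds a small slice-specific correction, so the slices sit around the shared center like the spokes of a star. The sketch below is a minimal, hypothetical PyTorch rendering of that decomposition, trained with a skip-gram negative-sampling objective; the class name, hyperparameters, regularizer, and the choice of objective are assumptions for illustration, not the authors' exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DifferentialEmbedding(nn.Module):
    """Central word vectors plus slice-dependent corrections (star-like layout)."""

    def __init__(self, vocab_size: int, num_slices: int, dim: int = 100):
        super().__init__()
        self.vocab_size = vocab_size
        # Central representation shared by every slice (the hub of the star).
        self.central = nn.Embedding(vocab_size, dim)
        # One additive correction per (slice, word) pair (the spokes).
        self.delta = nn.Embedding(num_slices * vocab_size, dim)
        nn.init.zeros_(self.delta.weight)  # every slice starts at the shared vector
        # Output ("context") vectors for the skip-gram objective.
        self.context = nn.Embedding(vocab_size, dim)

    def word_vec(self, word_ids: torch.Tensor, slice_ids: torch.Tensor) -> torch.Tensor:
        # Embedding of word w in slice s:  u_w + delta_{w,s}
        return self.central(word_ids) + self.delta(slice_ids * self.vocab_size + word_ids)

    def forward(self, center_ids, context_ids, negative_ids, slice_ids):
        # Skip-gram with negative sampling, conditioned on the corpus slice.
        w = self.word_vec(center_ids, slice_ids)                       # (B, D)
        pos_score = (w * self.context(context_ids)).sum(-1)            # (B,)
        neg_score = torch.bmm(self.context(negative_ids),              # (B, K, D)
                              w.unsqueeze(-1)).squeeze(-1)             # (B, K)
        loss = -F.logsigmoid(pos_score).mean() - F.logsigmoid(-neg_score).mean()
        # An L2 penalty on the deltas keeps slice vectors close to the center
        # (a plausible regularizer, assumed here rather than taken from the paper).
        return loss + 1e-4 * self.delta.weight.pow(2).mean()


if __name__ == "__main__":
    model = DifferentialEmbedding(vocab_size=5000, num_slices=4, dim=50)
    B, K = 8, 5
    loss = model(
        center_ids=torch.randint(0, 5000, (B,)),
        context_ids=torch.randint(0, 5000, (B,)),
        negative_ids=torch.randint(0, 5000, (B, K)),
        slice_ids=torch.randint(0, 4, (B,)),
    )
    loss.backward()
    print(float(loss))
```

Under this sketch, the drift and variation described in the abstract would be surfaced by comparing the learned slice vectors, for example by inspecting the nearest neighbours of u_w + delta_{w,s} across yearly slices of one newspaper, or between the US and UK corpora for the same word.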
