Where exactly does contextualization in a PLM happen? (2312.06514v1)

Published 11 Dec 2023 in cs.CL and cs.AI

Abstract: Pre-trained Language Models (PLMs) have been shown to be consistently successful in a plethora of NLP tasks due to their ability to learn contextualized representations of words (Ethayarajh, 2019). BERT (Devlin et al., 2018), ELMo (Peters et al., 2018) and other PLMs encode word meaning via textual context, as opposed to static word embeddings, which encode all meanings of a word in a single vector representation. In this work, we present a study that aims to localize where exactly in a PLM word contextualization happens. In order to find the location of this word meaning transformation, we investigate representations of polysemous words in the basic BERT uncased 12 layer architecture (Devlin et al., 2018), a masked language model trained on an additional sentence adjacency objective, using qualitative and quantitative measures.
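The kind of layer-wise analysis the abstract describes, tracking how the representation of a polysemous word changes across the 12 layers of BERT uncased, can be illustrated with a small probe. The sketch below is only a plausible reconstruction of that style of measurement, not the authors' code: the example word ("bank"), the two sentences, the use of the HuggingFace transformers API, and cosine similarity as the comparison metric are all assumptions made here for illustration.

```python
# Hedged sketch: compare layer-wise BERT representations of a polysemous word
# ("bank") in two sentences that use different senses. Illustrative only; the
# paper's actual qualitative and quantitative measures may differ.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

sentences = [
    "She deposited the cheque at the bank on Monday.",  # financial sense
    "They had a picnic on the bank of the river.",      # river sense
]

def layerwise_vectors(sentence, target="bank"):
    """Return the target word's hidden state at every layer (embeddings + 12 transformer layers)."""
    enc = tokenizer(sentence, return_tensors="pt")
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    idx = tokens.index(target)  # "bank" is a single wordpiece in the uncased vocab
    with torch.no_grad():
        hidden_states = model(**enc).hidden_states  # tuple of 13 tensors, each (1, seq_len, 768)
    return [h[0, idx] for h in hidden_states]

vecs_a = layerwise_vectors(sentences[0])
vecs_b = layerwise_vectors(sentences[1])

# Lower similarity at a given layer suggests the two senses have been pulled
# apart, i.e. contextualization has taken effect by that depth.
for layer, (a, b) in enumerate(zip(vecs_a, vecs_b)):
    sim = torch.nn.functional.cosine_similarity(a, b, dim=0).item()
    print(f"layer {layer:2d}: cosine similarity = {sim:.3f}")
```

If contextualization happens at a particular depth, the similarity between the two occurrences should stay high through the early layers and drop once sense-specific context is mixed into the word's representation.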

References (12)
  1. Blackbox meets blackbox: Representational similarity & stability analysis of neural language models and brains. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 191–203, Florence, Italy. Association for Computational Linguistics.
  2. What does BERT look at? An analysis of BERT’s attention. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 276–286, Florence, Italy. Association for Computational Linguistics.
  3. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805.
  4. Kawin Ethayarajh. 2019. How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings. CoRR, abs/1909.00512.
  5. Janosch Haber and Massimo Poesio. 2020. Word sense distance in human similarity judgements and contextualised word embeddings. In Proceedings of the Probability and Meaning Conference (PaM 2020), pages 128–145, Gothenburg. Association for Computational Linguistics.
  6. Omar Khattab and Matei Zaharia. 2020. ColBERT: Efficient and effective passage search via contextualized late interaction over BERT. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’20, pages 39–48, New York, NY, USA. Association for Computing Machinery.
  7. David Mareček and Rudolf Rosa. 2019. From balustrades to Pierre Vinken: Looking for syntax in transformer self-attentions. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 263–275, Florence, Italy. Association for Computational Linguistics.
  8. Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227–2237, New Orleans, Louisiana. Association for Computational Linguistics.
  9. Word sense disambiguation: A unified evaluation framework and empirical comparison. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 99–110, Valencia, Spain. Association for Computational Linguistics.
  10. Unsupervised distillation of syntactic information from contextualized word representations. In Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, pages 91–106, Online. Association for Computational Linguistics.
  11. Does BERT make any sense? Interpretable word sense disambiguation with contextualized embeddings.
  12. Quantifying the contextualization of word representations with semantic class probing.
Authors (4)
  1. Soniya Vijayakumar (4 papers)
  2. Tanja Bäumel (2 papers)
  3. Simon Ostermann (26 papers)
  4. Josef van Genabith (43 papers)
Citations (1)