Better Character Language Modeling Through Morphology (1906.01037v2)
Published 3 Jun 2019 in cs.CL
Abstract: We incorporate morphological supervision into character language models (CLMs) via multitasking and show that this addition improves bits-per-character (BPC) performance across 24 languages, even when the morphology data and language modeling data are disjoint. Analyzing the CLMs shows that inflected words benefit more from explicitly modeling morphology than uninflected words, and that morphological supervision improves performance even as the amount of language modeling data grows. We then transfer morphological supervision across languages to improve language modeling performance in the low-resource setting.
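The bits-per-character (BPC) metric used to evaluate the CLMs is the average negative log2 probability the model assigns to each character in the test text. A minimal sketch of the computation (the function name and inputs are illustrative, not from the paper):

```python
import math

def bits_per_character(char_probs):
    """BPC: average negative log2 probability assigned to each character.

    char_probs: the model's predicted probability for each character
    in the evaluation text, in order. Lower BPC means a better model.
    """
    return -sum(math.log2(p) for p in char_probs) / len(char_probs)

# A model that assigns probability 0.5 to every character scores exactly 1 BPC.
print(bits_per_character([0.5, 0.5, 0.5, 0.5]))  # → 1.0
```

Improvements reported in the paper are reductions in this quantity across the 24 languages studied.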
- Terra Blevins (20 papers)
- Luke Zettlemoyer (225 papers)