Counting in Language with RNNs (1810.12411v2)
Published 29 Oct 2018 in cs.LG, cs.NE, and stat.ML
Abstract: In this paper we examine a possible reason why the LSTM outperforms the GRU on language modeling and, more specifically, machine translation. We hypothesize that this has to do with counting, a consistent theme across the literature on long-term dependencies, counting, and language modeling with RNNs. Using simplified forms of language -- context-free and context-sensitive languages -- we show exactly how the LSTM performs its counting, based on its cell states during inference, and why the GRU cannot perform as well.
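The counting mechanism the abstract alludes to can be illustrated with a minimal sketch: an LSTM cell whose gates are saturated so that the cell state acts as a counter over a context-free language such as a^n b^n. The weights below are hand-set for illustration (the paper's networks learn such behavior from data), and all function and variable names are this sketch's own.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Illustrative LSTM cell with hand-set, saturated weights.
# Input is one-hot over {a, b}. With the input and forget gates
# pinned near 1 and the candidate near +1 on 'a' and -1 on 'b',
# the cell state c accumulates (count of a's) - (count of b's).
W = 20.0  # large weight to saturate the sigmoid/tanh nonlinearities

def lstm_count(string):
    c = 0.0  # cell state, used here as the counter
    for ch in string:
        x_a = 1.0 if ch == 'a' else 0.0
        x_b = 1.0 if ch == 'b' else 0.0
        i = sigmoid(W)                    # input gate ~ 1 (always open)
        f = sigmoid(W)                    # forget gate ~ 1 (perfect memory)
        g = math.tanh(W * x_a - W * x_b)  # candidate: ~ +1 on 'a', ~ -1 on 'b'
        c = f * c + i * g                 # standard LSTM cell-state update
    return c

# Strings in a^n b^n drive the cell state back to ~0; unbalanced
# strings leave a residual equal to the surplus of a's.
print(round(lstm_count('aaabbb'), 2))  # ≈ 0.0
print(round(lstm_count('aaab'), 2))    # ≈ 2.0
```

Because the GRU's single update gate ties writing and forgetting together, it cannot maintain an unbounded counter in the same way, which is the asymmetry the paper probes.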