IGLOO: Slicing the Features Space to Represent Sequences

Published 9 Jul 2018 in cs.LG and stat.ML | (1807.03402v3)

Abstract: Historically, recurrent neural networks (RNNs) and their variants such as LSTM and GRU, and more recently Transformers, have been the standard go-to components for processing sequential data with neural networks. One notable issue is the relative difficulty of handling long sequences (i.e. more than 20,000 steps). We introduce IGLOO, a new neural network architecture that aims to be efficient for short sequences while also being able to handle long sequences. IGLOO's core idea is to use the relationships between non-local patches sliced out of the feature maps of successively applied convolutions to build a representation of the sequence. We show that the model can deal with dependencies of more than 20,000 steps in a reasonable time frame. We stress test IGLOO on the copy-memory and addition tasks, as well as on permuted MNIST (98.4%). For a larger task, we apply this new structure to the Wikitext-2 dataset (Merity et al., 2017b) and achieve a perplexity in line with baseline Transformers but lower than baseline AWD-LSTM. We also present how IGLOO is already used in production for bioinformatics tasks.
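
The core idea stated in the abstract (slicing non-local patches out of convolutional feature maps and combining them into a sequence representation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the single convolution layer, the random choice of patch positions, the patch size of 4, and the ReLU-weighted sum used to combine each patch are all assumptions made for illustration, and the weights are random where a real model would train them.

```python
import random


def conv1d_same(seq, kernel):
    """Slide an odd-length kernel over seq with zero padding ('same' length).

    Like most deep learning libraries, this is a cross-correlation
    (the kernel is not flipped).
    """
    half = len(kernel) // 2
    n = len(seq)
    out = []
    for t in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + j - half
            if 0 <= idx < n:
                acc += w * seq[idx]
        out.append(acc)
    return out


def sample_patches(seq_len, num_patches, patch_size, rng):
    """Assumption: each patch is a set of time steps drawn from anywhere
    in the sequence, so a patch can relate steps that are far apart."""
    return [sorted(rng.sample(range(seq_len), patch_size))
            for _ in range(num_patches)]


def patch_representation(feature_maps, patches, weights):
    """Gather each patch across all channels, then reduce it to one
    non-negative number with a weighted sum followed by a ReLU."""
    rep = []
    for patch, w in zip(patches, weights):
        gathered = [fm[t] for t in patch for fm in feature_maps]
        rep.append(max(0.0, sum(wi * gi for wi, gi in zip(w, gathered))))
    return rep


rng = random.Random(0)
T, CHANNELS, NUM_PATCHES, PATCH_SIZE = 50, 4, 8, 4  # hypothetical sizes

sequence = [rng.uniform(-1, 1) for _ in range(T)]
kernels = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(CHANNELS)]
feature_maps = [conv1d_same(sequence, k) for k in kernels]

patches = sample_patches(T, NUM_PATCHES, PATCH_SIZE, rng)
weights = [[rng.uniform(-1, 1) for _ in range(PATCH_SIZE * CHANNELS)]
           for _ in patches]

# One value per patch: a fixed-size vector regardless of sequence length.
representation = patch_representation(feature_maps, patches, weights)
```

In this sketch the size of the representation depends only on the number of patches, not on the sequence length, which gives one intuition for why a patch-based approach could scale to long sequences.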