Bidirectional Long-Range Parser for Sequential Data Understanding
Abstract: The transformer is a powerful data-modelling framework responsible for remarkable performance on a wide range of tasks. However, it scales poorly, since processing long-sequence data is both suboptimal and inefficient. To this end we introduce BLRP (Bidirectional Long-Range Parser), a novel and versatile attention mechanism designed to increase performance and efficiency on long-sequence tasks. It leverages short- and long-range heuristics in the form of a local sliding-window approach combined with a global bidirectional latent-space synthesis technique. We show the benefits and versatility of our approach on the vision and language domains by demonstrating competitive results against state-of-the-art methods on the Long-Range-Arena and CIFAR benchmarks, together with ablations demonstrating its computational efficiency.
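A minimal sketch of the two ingredients named in the abstract, assuming a standard PyTorch setup: attention restricted to a local sliding window, plus a small set of latent tokens that summarise the sequence in both directions and broadcast a global context back to every position. The names (`BLRPBlock`, `window_size`, `num_latents`) are hypothetical illustrations, not the authors' implementation.

```python
# Hypothetical sketch of local sliding-window attention combined with a
# bidirectional latent summary; not the authors' released code.
import torch
import torch.nn as nn


def sliding_window_mask(seq_len: int, window_size: int) -> torch.Tensor:
    """Boolean mask allowing each token to attend only to its local neighbourhood."""
    idx = torch.arange(seq_len)
    return (idx[None, :] - idx[:, None]).abs() <= window_size


class BLRPBlock(nn.Module):
    """Sliding-window attention + attention to a global bidirectional latent summary."""

    def __init__(self, dim: int, num_heads: int = 4, window_size: int = 16, num_latents: int = 8):
        super().__init__()
        self.window_size = window_size
        self.latents = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        self.local_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.latent_read = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.latent_write = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        b, n, _ = x.shape

        # 1) Short-range heuristic: attention restricted to a sliding window.
        allowed = sliding_window_mask(n, self.window_size).to(x.device)
        local, _ = self.local_attn(x, x, x, attn_mask=~allowed)

        # 2) Long-range heuristic: latent tokens summarise the whole sequence.
        #    Reading the sequence left-to-right and right-to-left gives the
        #    "bidirectional" view; both directions update the same latents.
        lat = self.latents.unsqueeze(0).expand(b, -1, -1)
        fwd, _ = self.latent_read(lat, x, x)                  # forward summary
        bwd, _ = self.latent_read(lat, x.flip(1), x.flip(1))  # backward summary
        lat = lat + fwd + bwd

        # 3) Broadcast the global latent summary back to every position.
        global_ctx, _ = self.latent_write(x, lat, lat)
        return self.norm(x + local + global_ctx)


# Usage: a toy batch of 2 sequences, length 128, model width 64.
block = BLRPBlock(dim=64)
out = block(torch.randn(2, 128, 64))
print(out.shape)  # torch.Size([2, 128, 64])
```

The design choice illustrated here is that the sliding window keeps per-token cost linear in sequence length, while the handful of latent tokens carries long-range information at a cost independent of the window size.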