
Bidirectional Long-Range Parser for Sequential Data Understanding

Published 8 Apr 2024 in cs.CV, cs.CL, and cs.LG | arXiv:2404.05210v1

Abstract: The transformer is a powerful data-modelling framework responsible for remarkable performance on a wide range of tasks. However, transformers are limited in scalability, as processing long-sequence data with them is suboptimal and inefficient. To this end we introduce BLRP (Bidirectional Long-Range Parser), a novel and versatile attention mechanism designed to increase performance and efficiency on long-sequence tasks. It combines short- and long-range heuristics: a local sliding-window attention approach together with a global bidirectional latent-space synthesis technique. We show the benefits and versatility of our approach on vision and language domains by demonstrating competitive results against state-of-the-art methods on the Long-Range-Arena and CIFAR benchmarks, together with ablations demonstrating its computational efficiency.
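The combination the abstract describes, local sliding-window attention plus a global latent-space summary, can be sketched in a minimal single-head form. This is an illustrative reconstruction of the general idea, not the authors' implementation: the window size `w`, the number of latent vectors, and the simple additive merge of the two branches are all assumptions, and a real model would add learned projections, multiple heads, residual connections, and normalisation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sliding_window_attention(x, w):
    """Local heuristic: each position attends only to tokens within +/- w."""
    n, d = x.shape
    out = np.zeros_like(x)
    for i in range(n):
        lo, hi = max(0, i - w), min(n, i + w + 1)
        scores = x[i] @ x[lo:hi].T / np.sqrt(d)
        out[i] = softmax(scores) @ x[lo:hi]
    return out

def latent_synthesis(x, latents):
    """Global heuristic: a few latent vectors cross-attend to the whole
    (bidirectionally visible) sequence, compressing it; every token then
    reads the compressed summary back."""
    d = x.shape[1]
    z = softmax(latents @ x.T / np.sqrt(d)) @ x   # latents gather global context
    return softmax(x @ z.T / np.sqrt(d)) @ z       # tokens read the summary back

def blrp_style_layer(x, latents, w=2):
    """Merge the local and global branches (here, a plain sum)."""
    return sliding_window_attention(x, w) + latent_synthesis(x, latents)
```

Because the latent branch touches only a fixed number of summary vectors, its cost grows linearly with sequence length, as does the windowed branch; this is the usual efficiency argument for such hybrid local/global attention schemes.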

