Algebraic Positional Encodings
Abstract: We introduce a novel positional encoding strategy for Transformer-style models, addressing the shortcomings of existing, often ad hoc, approaches. Our framework provides a flexible mapping from the algebraic specification of a domain to an interpretation as orthogonal operators. This design preserves the algebraic characteristics of the source domain, ensuring that the model upholds its desired structural properties. Our scheme can accommodate various structures, including sequences, grids and trees, as well as their compositions. We conduct a series of experiments to demonstrate the practical applicability of our approach. Results suggest performance on par with or surpassing the current state-of-the-art, without hyper-parameter optimizations or "task search" of any kind. Code is available at https://github.com/konstantinosKokos/ape.
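To make the abstract's central idea concrete, below is a minimal sketch of the sequence case: positions form a structure with a single generator, interpreted as a learned orthogonal matrix W, so that position n maps to the power W^n. This is not the authors' implementation; the class name `AlgebraicSequencePE` and its parameters are illustrative assumptions, and the skew-symmetric parametrization is one standard way to obtain an orthogonal operator.

```python
import torch
import torch.nn as nn


class AlgebraicSequencePE(nn.Module):
    """Interpret the sequence-position generator as a learned orthogonal matrix."""

    def __init__(self, dim: int):
        super().__init__()
        # Parametrize the generator as exp(A) with A skew-symmetric,
        # which is orthogonal by construction: exp(A)^T exp(A) = I when A^T = -A.
        self.theta = nn.Parameter(torch.randn(dim, dim) * 0.02)

    def generator(self) -> torch.Tensor:
        skew = self.theta - self.theta.transpose(-1, -2)
        return torch.matrix_exp(skew)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) queries or keys.
        # Position n receives W^n; orthogonality yields the relative-position
        # property (W^m q) . (W^n k) = q . W^(n-m) k.
        batch, seq_len, dim = x.shape
        w = self.generator()
        powers = [torch.eye(dim, device=x.device)]
        for _ in range(seq_len - 1):
            powers.append(powers[-1] @ w)
        pe = torch.stack(powers)                    # (seq_len, dim, dim)
        return torch.einsum('nij,bnj->bni', pe, x)  # apply W^n at position n


# Usage: encode queries and keys before the attention product.
pe = AlgebraicSequencePE(dim=64)
q, k = torch.randn(2, 16, 64), torch.randn(2, 16, 64)
q_pos, k_pos = pe(q), pe(k)
scores = q_pos @ k_pos.transpose(-1, -2)  # depends only on relative offsets
```

Per the abstract, grids and trees would follow the same recipe by introducing one orthogonal generator per axis or branching direction; the sketch above covers only the single-generator sequence case.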