PoPE: Legendre Orthogonal Polynomials Based Position Encoding for Large Language Models (2405.04585v1)

Published 29 Apr 2024 in cs.CL, cs.AI, and cs.LG

Abstract: There are several improvements proposed over the baseline Absolute Positional Encoding (APE) method used in the original transformer. In this study, we aim to investigate the implications of inadequately representing positional encoding in higher dimensions on crucial aspects of the attention mechanism, the model's capacity to learn relative positional information, and the convergence of models, all stemming from the choice of sinusoidal basis functions. Through a combination of theoretical insights and empirical analyses, we elucidate how these challenges extend beyond APEs and may adversely affect the performance of Relative Positional Encoding (RPE) methods, such as Rotary Positional Encoding (RoPE). Subsequently, we introduce an innovative solution termed Orthogonal Polynomial Based Positional Encoding (PoPE) to address some of the limitations associated with existing methods. The PoPE method encodes positional information by leveraging orthogonal Legendre polynomials. Legendre polynomials as basis functions offer several desirable properties for positional encoding, including improved correlation structure, non-periodicity, orthogonality, and distinct functional forms among polynomials of varying orders. Our experimental findings demonstrate that transformer models incorporating PoPE outperform baseline transformer models on the $Multi30k$ English-to-German translation task, thus establishing a new performance benchmark. Furthermore, PoPE-based transformers exhibit significantly accelerated convergence rates. Additionally, we present novel theoretical perspectives on position encoding based on the superior performance of PoPE.
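
To make the contrast drawn in the abstract concrete, the following is a minimal, hypothetical sketch of a Legendre-polynomial position encoding alongside the baseline sinusoidal APE. It is not the paper's implementation: the mapping of positions onto [-1, 1], the assignment of one polynomial order per embedding dimension, and names such as legendre_positional_encoding are illustrative assumptions.

```python
# Sketch only: compares a Legendre-basis position encoding (one polynomial
# order per embedding dimension, positions scaled to [-1, 1]) with the
# standard sinusoidal APE. The exact PoPE formulation in the paper may differ.
import numpy as np
from numpy.polynomial import legendre


def legendre_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix of Legendre-basis position encodings."""
    # Map token positions 0 .. seq_len-1 onto [-1, 1], the natural domain
    # of the Legendre polynomials.
    x = np.linspace(-1.0, 1.0, seq_len)
    pe = np.empty((seq_len, d_model))
    for d in range(d_model):
        # Coefficient vector selecting the d-th Legendre polynomial P_d.
        coeffs = np.zeros(d + 1)
        coeffs[d] = 1.0
        pe[:, d] = legendre.legval(x, coeffs)
    return pe


def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Baseline sinusoidal APE from 'Attention Is All You Need', for comparison."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))


if __name__ == "__main__":
    pope = legendre_positional_encoding(seq_len=64, d_model=16)
    ape = sinusoidal_positional_encoding(seq_len=64, d_model=16)
    # Cross-dimension Gram matrices: the Legendre basis is approximately
    # orthogonal (near-diagonal), the periodic sinusoidal basis less so.
    print(np.round(pope.T @ pope, 2))
    print(np.round(ape.T @ ape, 2))
```

Sampled at discrete positions, the Legendre columns are non-periodic and approximately orthogonal (a near-diagonal Gram matrix), which illustrates the correlation-structure and orthogonality properties the abstract contrasts with the periodic sinusoidal basis.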

References (19)
  1. A simple and effective positional encoding for transformers. In Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih, editors, Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 2974–2988, Online and Punta Cana, Dominican Republic, November 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.emnlp-main.236. URL https://aclanthology.org/2021.emnlp-main.236.
  2. George Dassios. Ellipsoidal Harmonics: Theory and Applications. Encyclopedia of Mathematics and its Applications. Cambridge University Press, 2012.
  3. Multi30k: Multilingual English-German image descriptions. CoRR, abs/1605.00459, 2016. URL http://arxiv.org/abs/1605.00459.
  4. J. Favard. Sur les polynomes de Tchebicheff. C. R. Acad. Sci., Paris, 200:2052–2053, 1935. ISSN 0001-4036.
  5. DeBERTa: Decoding-enhanced BERT with disentangled attention. CoRR, abs/2006.03654, 2020. URL https://arxiv.org/abs/2006.03654.
  6. The impact of positional encoding on length generalization in transformers. In A. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 24892–24928. Curran Associates, Inc., 2023. URL https://proceedings.neurips.cc/paper_files/paper/2023/file/4e85362c02172c0c6567ce593122d31c-Paper-Conference.pdf.
  7. Rethinking positional encoding in language pre-training. CoRR, abs/2006.15595, 2020. URL https://arxiv.org/abs/2006.15595.
  8. Dynamic context-guided capsule network for multimodal machine translation. In Proceedings of the 28th ACM International Conference on Multimedia, MM ’20, page 1320–1329, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450379885. doi: 10.1145/3394171.3413715. URL https://doi.org/10.1145/3394171.3413715.
  9. Distill the image to nowhere: Inversion knowledge distillation for multimodal machine translation. In Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang, editors, Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 2379–2390, Abu Dhabi, United Arab Emirates, December 2022. Association for Computational Linguistics. doi: 10.18653/v1/2022.emnlp-main.152. URL https://aclanthology.org/2022.emnlp-main.152.
  10. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67, 2020. URL http://jmlr.org/papers/v21/20-074.html.
  11. Ernie-unix2: A unified cross-lingual cross-modal framework for understanding and generation, 2022.
  12. Self-attention with relative position representations, 2018.
  13. RoFormer: Enhanced transformer with rotary position embedding. CoRR, abs/2104.09864, 2021. URL https://arxiv.org/abs/2104.09864.
  14. Multimodal machine translation through visuals and speech. Machine Translation, 34, 09 2020. doi: 10.1007/s10590-020-09250-0.
  15. Walter Van Assche. Orthogonal polynomials, Toda lattices and Painlevé equations. Physica D: Nonlinear Phenomena, 434:133214, 2022. ISSN 0167-2789. doi: 10.1016/j.physd.2022.133214. URL https://www.sciencedirect.com/science/article/pii/S0167278922000343.
  16. Attention is all you need. CoRR, abs/1706.03762, 2017. URL http://arxiv.org/abs/1706.03762.
  17. Multimodal transformer for multimodal machine translation. In Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault, editors, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4346–4350, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.400. URL https://aclanthology.org/2020.acl-main.400.
  18. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics, 2:67–78, 2014. doi: 10.1162/tacl_a_00166. URL https://aclanthology.org/Q14-1006.
  19. Neural machine translation with universal visual representation. In International Conference on Learning Representations, 2019.
Authors (1)
  1. Arpit Aggarwal (2 papers)