Improving Transformers using Faithful Positional Encoding (2405.09061v2)
Published 15 May 2024 in cs.LG
Abstract: We propose a new positional encoding method for the Transformer neural network architecture. Unlike the standard sinusoidal positional encoding, our approach rests on solid mathematical grounds and guarantees that no information about the positional order of the input sequence is lost. We show that the new encoding approach systematically improves prediction performance on a time-series classification task.
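The abstract does not spell out the construction. As a point of reference, below is a minimal sketch of the standard sinusoidal positional encoding of Vaswani et al. (2017), alongside a hypothetical variant whose sine/cosine frequencies lie on the discrete Fourier transform grid, which is one natural way to make the position-to-encoding map invertible. The function names and the exact frequency choice in `dft_grid_pe` are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def sinusoidal_pe(seq_len: int, d_model: int) -> np.ndarray:
    """Standard sinusoidal positional encoding (Vaswani et al., 2017).
    Assumes d_model is even."""
    positions = np.arange(seq_len)[:, None]                 # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)  # geometric frequency grid
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def dft_grid_pe(seq_len: int, d_model: int) -> np.ndarray:
    """Hypothetical 'faithful' variant: frequencies 2*pi*k/seq_len on the
    discrete Fourier grid, so the sin/cos columns form a real DFT basis of
    the position index. This frequency choice is an assumption made here
    for illustration, not the paper's exact recipe."""
    positions = np.arange(seq_len)[:, None]
    k = np.arange(1, d_model // 2 + 1)[None, :]              # harmonic indices
    angles = 2.0 * np.pi * k * positions / seq_len           # DFT frequency grid
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

if __name__ == "__main__":
    print(sinusoidal_pe(128, 64).shape)  # (128, 64)
    print(dft_grid_pe(128, 64).shape)    # (128, 64)
```

The intuition behind such a DFT-grid choice is that an (inverse) Fourier transform can recover the original position index exactly, which is what a "guarantee of not losing positional information" would require; the paper's actual construction should be consulted for the precise formulation.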