Extending benefits of randomized positional encodings to other tasks
Investigate whether stochastic (randomized) positional encoding schemes confer advantages such as improved length generalization and trainability for transformer models on tasks beyond the q-sparse token selection task studied in the paper, and characterize the scope and limitations of these benefits across different problem settings.
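As context for the question, a minimal sketch of one common randomized scheme: for each training sequence, sample a sorted subset of positions from a range much larger than any training length, then look up those positions in a fixed encoding table. The function names, the sinusoidal table, and the `max_pos` parameter below are illustrative choices, not the paper's construction.

```python
import numpy as np

def sinusoidal_table(max_len, d_model):
    # Standard sinusoidal positional-encoding table of shape (max_len, d_model).
    pos = np.arange(max_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    table = np.zeros((max_len, d_model))
    table[:, 0::2] = np.sin(angles[:, 0::2])
    table[:, 1::2] = np.cos(angles[:, 1::2])
    return table

def randomized_positional_encoding(seq_len, d_model, max_pos=2048, rng=None):
    # Sample seq_len distinct positions from the larger range [0, max_pos),
    # sort them to preserve order information, and look up their encodings.
    # At test time, longer sequences still map into the same position range
    # the model saw during training, which is the intuition behind improved
    # length generalization. max_pos=2048 is a hypothetical choice.
    if rng is None:
        rng = np.random.default_rng()
    assert seq_len <= max_pos
    positions = np.sort(rng.choice(max_pos, size=seq_len, replace=False))
    return sinusoidal_table(max_pos, d_model)[positions]

# A fresh random encoding is drawn per sequence (per training step):
enc = randomized_positional_encoding(16, 32, max_pos=2048)
```

Whether this kind of stochasticity helps on tasks other than sparse token selection, and under what conditions, is precisely the open question.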
References
There are still many open questions. Can we extend the benefits of randomized positional encodings to other tasks?
— Wang et al., "Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot" (arXiv:2406.06893, 11 Jun 2024), Conclusion