Rethinking Positional Encoding (2107.02561v3)

Published 6 Jul 2021 in cs.LG and cs.CV

Abstract: It is well noted that coordinate based MLPs benefit -- in terms of preserving high-frequency information -- through the encoding of coordinate positions as an array of Fourier features. Hitherto, the rationale for the effectiveness of these positional encodings has been solely studied through a Fourier lens. In this paper, we strive to broaden this understanding by showing that alternative non-Fourier embedding functions can indeed be used for positional encoding. Moreover, we show that their performance is entirely determined by a trade-off between the stable rank of the embedded matrix and the distance preservation between embedded coordinates. We further establish that the now ubiquitous Fourier feature mapping of position is a special case that fulfills these conditions. Consequently, we present a more general theory to analyze positional encoding in terms of shifted basis functions. To this end, we develop the necessary theoretical formulae and empirically verify that our theoretical claims hold in practice. Codes available at https://github.com/osiriszjq/Rethinking-positional-encoding.

Authors (3)
  1. Jianqiao Zheng (8 papers)
  2. Sameera Ramasinghe (36 papers)
  3. Simon Lucey (107 papers)
Citations (46)

Summary

Reevaluating Positional Encoding: A Comprehensive Framework and Analysis

Introduction

The paper "Rethinking Positional Encoding" by Jianqiao Zheng, Sameera Ramasinghe, and Simon Lucey provides an extensive treatment of positional encoding in the context of coordinate-based Multi-Layer Perceptrons (MLPs). Positional encoding, a prevalent mechanism for embedding coordinate positions into a suitable representation for neural networks, is traditionally reliant on Fourier feature mappings. This work broadens the scope by introducing and analyzing non-Fourier embedding functions, presenting a generalized framework to evaluate positional encodings through the lens of shifted basis functions.

Core Contributions

The authors make several substantive contributions:

  1. Generalized Positional Encoding: The paper shows that positional encoding can be constructed by sampling shifted continuous basis functions, rather than being confined to Fourier features (see the sketch after this list). This formulation extends the applicability and interpretability of positional encoding mechanisms.
  2. Stable Rank and Distance Preservation: A pivotal insight is that the efficacy of an embedding function rests on a balance between the stable rank of the embedded matrix and the preservation of distances among embedded coordinates. This relationship underpins both the capacity for memorization (rank) and generalization (distance preservation).
  3. Empirical and Theoretical Analysis: The authors develop the required theoretical foundations and substantiate their claims with empirical evaluations. They show that various continuous functions, including Gaussians, achieve performance comparable to Fourier mappings while being more stable and less dependent on specific frequency selection.
  4. Practical and Efficient Embeddings: Using Gaussian signals as the embedding function, the paper demonstrates similar or better performance than Fourier features, with reduced volatility and more compact embeddings.
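
As a rough illustration of the shifted-basis view referenced in item 1, the NumPy sketch below contrasts a standard Fourier feature map with a Gaussian shifted-basis encoding, in which each embedding dimension samples a Gaussian bump psi(x - c_i) centered at a shift c_i. The frequency schedule, shift spacing, and bandwidth sigma are illustrative choices, not the paper's exact settings.

```python
import numpy as np

def fourier_encoding(x, freqs):
    """Classic Fourier feature map: [sin(2*pi*f*x), cos(2*pi*f*x)] per frequency f."""
    angles = 2 * np.pi * np.outer(x, freqs)                           # (N, m)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)  # (N, 2m)

def gaussian_encoding(x, centers, sigma=0.05):
    """Shifted-basis encoding: each dimension samples a Gaussian bump
    psi(x - c_i) centered at shift c_i, per the paper's generalized view."""
    d = x[:, None] - centers[None, :]       # pairwise offsets (N, m)
    return np.exp(-0.5 * (d / sigma) ** 2)  # (N, m)

# Encode 1-D coordinates in [0, 1].
x = np.linspace(0.0, 1.0, 256)
freqs = 2.0 ** np.arange(6)           # illustrative log-spaced frequencies
centers = np.linspace(0.0, 1.0, 64)   # illustrative uniformly spaced shifts
F = fourier_encoding(x, freqs)        # (256, 12)
G = gaussian_encoding(x, centers)     # (256, 64)
```

Under this view, the Fourier map is simply the special case in which the sampled basis function is a sinusoid.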

Implications and Potential Applications

The insights provided by this paper have significant implications for both theory and practice:

  • Theoretical Insight: By revealing that stable rank and distance preservation jointly dictate performance, this work advances our understanding of how embedding functions operate within neural networks, shifting the discussion from specific constructions toward the intrinsic properties of the encodings. A numerical sketch of these two quantities follows this list.
  • Practical Applications: The proposed framework potentially broadens the arsenal of tools available for researchers and engineers, allowing for the customization of embeddings based on task-specific needs and constraints. For instance, Gaussian embeddings offer stable and efficient alternatives, particularly in applications involving low-dimensional data or environments sensitive to embedding volatility, such as real-time systems.
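
One way to probe the rank/distance trade-off numerically is sketched below. The stable rank ||E||_F^2 / ||E||_2^2 is a standard smooth surrogate for matrix rank; the distance score here (Pearson correlation between coordinate distances and embedded distances) is a crude proxy of our own, not the paper's exact measure.

```python
import numpy as np

def stable_rank(E):
    """Stable rank ||E||_F^2 / ||E||_2^2, a smooth surrogate for rank(E)."""
    sv = np.linalg.svd(E, compute_uv=False)  # singular values, descending
    return (sv ** 2).sum() / (sv[0] ** 2)

def distance_score(x, E):
    """Crude distance-preservation proxy: correlation between coordinate
    distances |x_i - x_j| and embedded distances ||E_i - E_j||."""
    dx = np.abs(x[:, None] - x[None, :]).ravel()
    dE = np.linalg.norm(E[:, None, :] - E[None, :, :], axis=-1).ravel()
    return np.corrcoef(dx, dE)[0, 1]

# e.g., with x, F, G from the encoding sketch above:
# print(stable_rank(F), distance_score(x, F))
# print(stable_rank(G), distance_score(x, G))
```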

Future Directions

Exploration into non-Fourier embeddings is expected to inspire follow-on research aimed at developing even more specialized encoding strategies. Potential future endeavors could include:

  • Algorithmic Optimization: Developing algorithms that dynamically adjust embeddings in response to incoming data or evolving patterns, optimizing both rank utilization and distance preservation in real-time.
  • Application-Specific Customization: Custom tailoring of embedding functions for specific application domains such as robotics, where spatial and temporal dependencies must be finely tuned.
  • Enhanced Theoretical Models: Further development and refinement of theoretical models to predict embedding behavior across different network architectures and data topologies.

Conclusion

In summary, this work challenges the dominance of Fourier features in positional encoding by providing a richer, more versatile framework for embedding construction. Through rigorous theoretical and empirical assessment, the paper lays the groundwork for further development and application of positional encodings across domains in artificial intelligence. Its exploration of alternative embedding functions opens new avenues for both foundational research and practical deployment, promising improved performance on a diverse range of machine learning tasks.
