
Sphere2Vec: Spherical Positional Encoding

Updated 18 February 2026
  • Sphere2Vec is a family of multi-scale spherical positional encoders that represent geospatial points as high-dimensional vectors while preserving great-circle distances.
  • It builds on the Double Fourier Sphere (DFS) framework, using sinusoidal basis functions at multiple scales to capture both global and local spatial context without projection distortion.
  • Empirical evaluations show that Sphere2Vec improves accuracy on synthetic and real-world datasets, particularly in challenging polar and data-sparse regions.

Sphere2Vec is a family of multi-scale positional encoders designed to represent point locations on a spherical surface as high-dimensional vectors while explicitly preserving great-circle (geodesic) distances. Developed to address the shortcomings of planar or Euclidean-based encodings when applied to global-scale geospatial prediction and classification, Sphere2Vec constructs location embeddings that avoid map projection distortion and maintain invariants important for geo-aware machine learning. This methodology is grounded in the Double Fourier Sphere (DFS) framework, enabling injective, analytically tractable, and scalable encodings for downstream neural models in tasks involving species distribution, remote sensing, and more (Mai et al., 2022, Mai et al., 2023).

1. Motivation: Limitations of Planar and Euclidean Encodings

Geospatial coordinates, naturally distributed over the Earth's surface, reside on a sphere rather than a plane or in $\mathbb{R}^3$. Traditional methods, including Space2Vec, grid-cell encoders, or NeRF-style positional encodings, interpret latitude $\phi$ and longitude $\lambda$ as planar coordinates $(x, y)$ or embed them into $\mathbb{R}^3$. These approaches inherently introduce two principal sources of distortion:

  • Map projection distortion: Any planar mapping of the sphere misrepresents true surface (great-circle) distances, particularly at high latitudes, leading to discrepancies between encoded metric and geodesic reality. As a result, location-aware models can yield biased predictions in polar and data-sparse regions.
  • Spherical-to-Euclidean error: Embedding points as $(\cos\phi\cos\lambda, \cos\phi\sin\lambda, \sin\phi)$ in $\mathbb{R}^3$ preserves spherical inner products only at a single unit scale (the unit sphere itself); multi-scale Fourier expansions applied independently to each axis fail to reproduce the spherical law of cosines, which is essential for distance preservation at arbitrary scales (Mai et al., 2023).

Sphere2Vec responds by supplying positional encodings $\mathrm{PE}: \mathbb{S}^2 \rightarrow \mathbb{R}^d$ with analytic guarantees of spherical distance preservation and injectivity, regardless of position or density on the globe (Mai et al., 2022).

2. Mathematical Foundations and Encoding Schemes

For two points $p = (\phi_1, \lambda_1)$ and $q = (\phi_2, \lambda_2)$ on a sphere of radius $R$, the geodesic (great-circle) distance is

$$d_s(p, q) = R \arccos\left[ \sin\phi_1\sin\phi_2 + \cos\phi_1\cos\phi_2\cos(\lambda_1 - \lambda_2)\right].$$

Sphere2Vec encodes locations via multi-scale sinusoidal basis functions parameterized by frequency bands $\omega_s$; together, the scales $s$ capture both global and local spatial context. The family of encoders includes:

  • sphereC (Canonical): At each scale $s$, compute

$$[\sin(\omega_s\phi),\ \cos(\omega_s\phi)\cos(\omega_s\lambda),\ \cos(\omega_s\phi)\sin(\omega_s\lambda)]$$

and concatenate across $S$ scales for a $3S$-dimensional vector.

  • sphereM: Adds explicit high-frequency mixing terms for additional expressivity.
  • sphereC+ and sphereM+: Concatenate the corresponding spherical encoding with classical 2D “grid” (planar) terms $[\sin(\omega_s\phi), \cos(\omega_s\phi), \sin(\omega_s\lambda), \cos(\omega_s\lambda)]$ for each scale.
  • sphereDFS: A high-dimensional embedding that includes all two-way products of frequency bands up to $S-1$ in both latitude and longitude, mirroring the complete DFS basis from pseudospectral analysis. The dimension is $4S^2 + 4S$ (Mai et al., 2022, Mai et al., 2023).
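To make the $4S^2 + 4S$ count concrete, here is a sketch of a DFS-style encoder for a single point. The term layout (four cross terms per latitude/longitude band pair, followed by single-axis sine and cosine terms) is an assumption chosen to match that count, the power-of-two frequencies are illustrative, and `sphere_dfs_encode` is a hypothetical name, not the authors' implementation.

```python
import math
from typing import List, Sequence


def sphere_dfs_encode(phi: float, lam: float, omegas: Sequence[float]) -> List[float]:
    """Hypothetical sphereDFS-style encoding of one point (phi, lam in radians).

    Assumed layout: 4 cross terms for every (latitude band n, longitude band m)
    pair, then sin/cos of each latitude band, then sin/cos of each longitude band.
    """
    lat_terms = [(math.sin(w * phi), math.cos(w * phi)) for w in omegas]
    lon_terms = [(math.sin(w * lam), math.cos(w * lam)) for w in omegas]
    features: List[float] = []
    # All two-way products between latitude and longitude bands: 4 * S^2 terms.
    for sin_p, cos_p in lat_terms:
        for sin_l, cos_l in lon_terms:
            features += [cos_p * cos_l, cos_p * sin_l, sin_p * cos_l, sin_p * sin_l]
    # Single-axis terms: 2 * S for latitude plus 2 * S for longitude.
    for sin_p, cos_p in lat_terms:
        features += [sin_p, cos_p]
    for sin_l, cos_l in lon_terms:
        features += [sin_l, cos_l]
    return features


S = 8
omegas = [2.0 ** s for s in range(S)]  # illustrative frequencies only
vec = sphere_dfs_encode(math.radians(45.0), math.radians(-120.0), omegas)
print(len(vec), 4 * S ** 2 + 4 * S)
```

With $S = 8$ the cross terms alone contribute $4 \cdot 64 = 256$ features and the single-axis terms another $32$, so the quadratic part dominates quickly as $S$ grows.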

The frequencies are typically spaced geometrically between a minimum frequency $\omega_{\min}$ and a maximum frequency $\omega_{\max}$:

$$\omega_s = \omega_{\min} \left(\frac{\omega_{\max}}{\omega_{\min}}\right)^{s/(S-1)}, \quad s = 0,\ldots,S-1.$$

3. Theoretical Guarantees: Distance Preservation and Injectivity

Sphere2Vec encodings are constructed so that, at the coarsest scale (a single scale with unit frequency, $S = 1$ and $\omega_0 = 1$), the three-dimensional sphereC embedding produces inner products that are strictly decreasing in the true spherical distance:

$$\langle \mathrm{PE}_1(p), \mathrm{PE}_1(q) \rangle = \sin\phi_1\sin\phi_2 + \cos\phi_1\cos\phi_2\cos(\lambda_1 - \lambda_2) = \cos\left(\frac{d_s(p, q)}{R}\right),$$

and, because every $\mathrm{PE}_1$ vector has unit norm, it follows that

$$\left\|\mathrm{PE}_1(p) - \mathrm{PE}_1(q)\right\|_2 = 2 \sin\left(\frac{d_s(p, q)}{2R}\right)$$

and for small $d_s$ this is approximately $d_s/R$, i.e., linear in the geodesic distance. Multi-scale concatenations preserve injectivity: no two distinct points map to the same vector. Full proofs are provided in (Mai et al., 2022, Mai et al., 2023); these results do not hold for 2D “wrap” or planar encoders, nor for grid-cell-based encodings.
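Both identities are easy to check numerically. The sketch below uses hypothetical helper names (`pe1`, `central_angle`) and random test points on the unit sphere ($R = 1$), and compares the inner product and Euclidean distance of unit-frequency sphereC vectors against $\cos(d_s/R)$ and $2\sin(d_s/(2R))$.

```python
import math
import random


def pe1(phi: float, lam: float) -> tuple:
    """Single-scale, unit-frequency sphereC terms: the point's unit vector."""
    return (math.sin(phi), math.cos(phi) * math.cos(lam), math.cos(phi) * math.sin(lam))


def central_angle(p, q) -> float:
    """d_s / R from the spherical law of cosines (angles in radians)."""
    c = (math.sin(p[0]) * math.sin(q[0])
         + math.cos(p[0]) * math.cos(q[0]) * math.cos(p[1] - q[1]))
    return math.acos(max(-1.0, min(1.0, c)))


rng = random.Random(0)
max_dot_err = max_norm_err = 0.0
for _ in range(1000):
    # Random lat/lon pairs; not uniform on the sphere, which is fine for an identity check.
    p = (rng.uniform(-math.pi / 2, math.pi / 2), rng.uniform(-math.pi, math.pi))
    q = (rng.uniform(-math.pi / 2, math.pi / 2), rng.uniform(-math.pi, math.pi))
    a, b = pe1(*p), pe1(*q)
    theta = central_angle(p, q)
    dot = sum(x * y for x, y in zip(a, b))
    dist = math.dist(a, b)  # Euclidean norm of PE_1(p) - PE_1(q)
    max_dot_err = max(max_dot_err, abs(dot - math.cos(theta)))
    max_norm_err = max(max_norm_err, abs(dist - 2 * math.sin(theta / 2)))
print(max_dot_err, max_norm_err)
```

Replacing the unit frequency in `pe1` with, say, $\omega = 2$ makes the inner-product check fail for generic point pairs, consistent with the guarantee being stated only for the coarsest scale.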

4. Implementation and Computational Aspects

Sphere2Vec encodings are non-learned: each basis expansion is analytic, with $O(d)$ cost per point, where $d$ is the embedding dimension. Typical configurations are:

  • $S = 32$ for sphereC, sphereM, etc. ($d = 96$, $160$, $192$, $256$)
  • $S = 8$ for sphereDFS ($d = 4S^2 + 4S = 288$)

The embedding process consists of:

  1. Converting $\phi, \lambda$ to radians.
  2. Computing $\omega_s$ for $s \in [0, S-1]$.
  3. Generating the basis function terms (as per the encoding choice).
  4. Concatenating across scales.
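The four steps can be sketched for the sphereC variant as follows. The function names and the frequency range (`omega_min = 1`, `omega_max = 64`, 32 scales) are hypothetical choices for illustration, not published defaults; with `omega_min = 1` the first three features reproduce the unit-frequency embedding discussed in Section 3.

```python
import math
from typing import List


def frequencies(omega_min: float, omega_max: float, num_scales: int) -> List[float]:
    """Step 2: geometrically spaced frequency bands omega_0 .. omega_{S-1}."""
    if num_scales == 1:
        return [omega_min]
    ratio = omega_max / omega_min
    return [omega_min * ratio ** (s / (num_scales - 1)) for s in range(num_scales)]


def sphere_c_encode(lat_deg: float, lon_deg: float, omegas: List[float]) -> List[float]:
    """Steps 1, 3 and 4: sphereC terms for one point, concatenated over all scales."""
    phi, lam = math.radians(lat_deg), math.radians(lon_deg)  # Step 1
    features: List[float] = []
    for w in omegas:  # Steps 3-4: three terms per scale, appended in scale order
        features += [
            math.sin(w * phi),
            math.cos(w * phi) * math.cos(w * lam),
            math.cos(w * phi) * math.sin(w * lam),
        ]
    return features


omegas = frequencies(omega_min=1.0, omega_max=64.0, num_scales=32)
embedding = sphere_c_encode(69.6, 18.9, omegas)  # an arbitrary high-latitude point
print(len(embedding))  # 3 * S = 96
```

Sketching another variant only requires changing the per-scale term list inside the loop.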

Location embeddings are then integrated into downstream models, typically as input to an FFN or as an auxiliary “geo-prior” branch. No further normalization is needed, since every term is built from sines and cosines and therefore lies in $[-1, 1]$.
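How a fixed encoding feeds a downstream model can be sketched without any ML framework. Everything here is an assumption for illustration: the layer sizes, the untrained random weights, the stand-in encoding and image probabilities, the hypothetical function names, and the fusion rule $P(y \mid I, x) \propto P(y \mid I)\,P(y \mid x)$, which is one common way to use a location model as a geo-prior.

```python
import math
import random
from typing import List, Sequence


def relu(xs: Sequence[float]) -> List[float]:
    return [max(0.0, x) for x in xs]


def linear(xs: Sequence[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Dense layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, xs)) + b for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(rng: random.Random, n_in: int, n_out: int):
    """Untrained, randomly initialised weights; a real model would learn these."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def location_prior(encoding: Sequence[float], layers) -> List[float]:
    """Hypothetical location branch: FFN over the positional encoding -> P(y | x)."""
    h = list(encoding)
    for i, (w, b) in enumerate(layers):
        h = linear(h, w, b)
        if i < len(layers) - 1:
            h = relu(h)
    return softmax(h)


def combine_with_image(p_image: Sequence[float], p_loc: Sequence[float]) -> List[float]:
    """Assumed geo-prior fusion: P(y | I, x) proportional to P(y | I) * P(y | x)."""
    joint = [a * b for a, b in zip(p_image, p_loc)]
    total = sum(joint)
    return [j / total for j in joint]


rng = random.Random(42)
num_classes, d, hidden = 5, 96, 32
layers = [random_layer(rng, d, hidden), random_layer(rng, hidden, num_classes)]
encoding = [math.sin(0.1 * k) for k in range(d)]  # stand-in for a real sphereC vector
p_loc = location_prior(encoding, layers)
p_img = softmax([2.0, 1.0, 0.5, 0.1, -1.0])  # stand-in image-classifier output
posterior = combine_with_image(p_img, p_loc)
print([round(p, 3) for p in posterior])
```

Because the encoding itself has no parameters, only the FFN weights would be trained; the fusion step simply reweights the image classifier's output by the location model's class probabilities.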

5. Empirical Performance and Evaluation

Synthetic Benchmarks

On synthetic datasets defined by mixtures of von Mises–Fisher distributions sampled across the sphere (20 datasets with varying class and concentration structure), Sphere2Vec variants (notably sphereM+ and sphereC+) outperform grid, planar wrap, xyz (Euclidean $\mathbb{R}^3$), NeRF-style, and RBF/random Fourier feature baselines across all metrics. Absolute Top-1 accuracy gains reach up to 2%, and error rates drop by up to 30.8%. Gains are especially pronounced for class-conditional distributions concentrated at high latitudes or in sparse regions (Mai et al., 2023).
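A synthetic setup of this kind can be mimicked with a small sampler. This is a hypothetical reconstruction for illustration, not the authors' data generator: the component centers, concentrations $\kappa$, and weights are made up, and the sampler uses the standard closed-form inverse CDF that exists for the von Mises–Fisher distribution on the 2-sphere.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def to_xyz(lat_deg: float, lon_deg: float) -> Vec3:
    phi, lam = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(phi) * math.cos(lam), math.cos(phi) * math.sin(lam), math.sin(phi))


def to_latlon(v: Vec3) -> Tuple[float, float]:
    x, y, z = v
    return math.degrees(math.asin(max(-1.0, min(1.0, z)))), math.degrees(math.atan2(y, x))


def sample_vmf_s2(mu: Vec3, kappa: float, rng: random.Random) -> Vec3:
    """One draw from a von Mises-Fisher distribution on the 2-sphere (kappa > 0)."""
    u = 1.0 - rng.random()  # in (0, 1], keeps the log argument positive
    # Inverse CDF of w = <x, mu>, whose density on [-1, 1] is proportional to exp(kappa * w).
    w = 1.0 + math.log(u + (1.0 - u) * math.exp(-2.0 * kappa)) / kappa
    # Orthonormal basis (e1, e2) of the tangent plane at mu.
    helper = (1.0, 0.0, 0.0) if abs(mu[0]) < 0.9 else (0.0, 1.0, 0.0)
    dot = sum(h * m for h, m in zip(helper, mu))
    e1 = [h - dot * m for h, m in zip(helper, mu)]
    norm = math.sqrt(sum(c * c for c in e1))
    e1 = [c / norm for c in e1]
    e2 = [mu[1] * e1[2] - mu[2] * e1[1],
          mu[2] * e1[0] - mu[0] * e1[2],
          mu[0] * e1[1] - mu[1] * e1[0]]  # mu x e1
    t = rng.uniform(0.0, 2.0 * math.pi)  # uniform direction around mu
    r = math.sqrt(max(0.0, 1.0 - w * w))
    return tuple(w * m + r * (math.cos(t) * a + math.sin(t) * b)
                 for m, a, b in zip(mu, e1, e2))


def sample_mixture(components: Sequence[Tuple[float, float, float, float]], n: int,
                   rng: random.Random) -> List[Tuple[float, float, int]]:
    """components: (center_lat, center_lon, kappa, weight); returns (lat, lon, class_label)."""
    weights = [c[3] for c in components]
    out = []
    for _ in range(n):
        k = rng.choices(range(len(components)), weights=weights)[0]
        lat, lon, kappa, _ = components[k]
        out.append((*to_latlon(sample_vmf_s2(to_xyz(lat, lon), kappa, rng)), k))
    return out


rng = random.Random(7)
# Hypothetical three-class mixture, one class concentrated near the North Pole.
components = [(85.0, 30.0, 200.0, 0.3), (0.0, -60.0, 20.0, 0.4), (-40.0, 140.0, 50.0, 0.3)]
data = sample_mixture(components, n=500, rng=rng)
print(data[:3])
```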

Real-World Geospatial Prediction and Classification

Sphere2Vec has been evaluated on seven global-scale datasets, including fine-grained species recognition (BirdSnap, NABirds, iNat2017/2018), remote sensing (fMoW), and Flickr YFCC-GEO100. Summary results:

  • All Sphere2Vec variants systematically outperform 2D/3D baselines in location-only and image+location settings, with Mean Reciprocal Rank (MRR) and Top-1 accuracy gains of 0.3–1.0% and robust improvements in polar and sparse regimes.
  • Qualitative analyses (e.g., Arctic fox predicted distributions) confirm that Sphere2Vec produces sharply localized, geodesically accurate probability maps, compared to spread or overgeneralized baselines.
  • Latitude-band stratification demonstrates that the performance advantage is magnified in bands above $80^\circ$ latitude or below $-80^\circ$ (Mai et al., 2022, Mai et al., 2023).
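Per-band evaluation of the kind described above can be sketched as follows. The function names, the toy scores, and the band edges (chosen to isolate latitudes beyond ±80°) are hypothetical, and ties are resolved in favour of the true class, which is an assumption rather than a documented protocol.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def rank_of_true(scores: Sequence[float], label: int) -> int:
    """1-based rank of the true class (ties counted in its favour)."""
    true_score = scores[label]
    return 1 + sum(1 for s in scores if s > true_score)


def band_label(lat: float, edges: Sequence[float]) -> str:
    """Name of the half-open band [lo, hi) containing lat; the top edge is inclusive."""
    for lo, hi in zip(edges[:-1], edges[1:]):
        if lo <= lat < hi or (hi == edges[-1] and lat == hi):
            return f"{lo}..{hi}"
    raise ValueError(f"latitude {lat} outside {edges[0]}..{edges[-1]}")


def metrics_by_band(lats: Sequence[float], all_scores: Sequence[Sequence[float]],
                    labels: Sequence[int],
                    edges: Sequence[float] = (-90.0, -80.0, 80.0, 90.0),
                    ) -> Dict[str, Tuple[float, float, int]]:
    """Top-1 accuracy, MRR (mean reciprocal rank) and sample count per latitude band."""
    ranks: Dict[str, List[int]] = defaultdict(list)
    for lat, scores, y in zip(lats, all_scores, labels):
        ranks[band_label(lat, edges)].append(rank_of_true(scores, y))
    return {band: (sum(r == 1 for r in rs) / len(rs), sum(1.0 / r for r in rs) / len(rs), len(rs))
            for band, rs in ranks.items()}


lats = [85.0, 82.0, 10.0, -45.0, -85.0]
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6], [0.3, 0.4, 0.3]]
labels = [1, 2, 0, 2, 0]
for band, (top1, mrr, n) in sorted(metrics_by_band(lats, scores, labels).items()):
    print(band, n, round(top1, 3), round(mrr, 3))
```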

6. Limitations and Directions for Further Research

Sphere2Vec encodes points on a perfect sphere; as such, it does not directly account for the Earth's ellipsoidal shape (WGS84) or for altitude. The selection of scale parameters (minimum and maximum frequency and the number of scales $S$) affects locality and receptive field, but defaults are robust. The $O(S^2)$ dimensionality of the full DFS basis may impose memory and performance trade-offs at very high resolutions.

Potential improvements include:

  • Learning frequency bands $\omega_s$ end-to-end.
  • Extending encoders to ellipsoidal coordinates or incorporating vertical dimension.
  • Integrating with graph-based geographic kernels or positional encodings for global Earth system models.
  • Adapting the framework for sequence and trajectory modeling on the sphere, and for spatially structured transformers (Mai et al., 2023).

7. Broader Impact and Applicability

Sphere2Vec represents a principled solution to a longstanding limitation in geospatial AI: the reliable, distortion-free embedding of locations for machine learning on global-scale data. Its analytic construction ensures the preservation of key geometric invariants, leading to measurable gains in diverse applications, especially where spatial sparsity or polar distribution is significant. A plausible implication is that Sphere2Vec, by providing a universal geospatial positional encoding, stands to become an essential component in future foundation models for biodiversity prediction, climate modeling, and beyond (Mai et al., 2023).
