Optimal implementation of the Context Map aggregation function g_s
Determine the optimal learnable implementation of the permutation-invariant aggregation function g_s within the Context Map Φ_s. This function maps a set of context token embeddings C(s) ⊂ (ℝ^n)^{2k} to a nonzero direction vector v_context whose projectivization [v_context] lies on the exceptional divisor E_s ≅ ℙ^{n−1}_ℝ. Specifically, establish whether attention mechanisms, graph neural networks, or multilayer perceptrons deliver superior empirical performance on this aggregation task.
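As a concrete baseline for the MLP candidate, a Deep-Sets-style aggregator (tokenwise MLP followed by sum pooling and a second MLP) is permutation-invariant by construction and can be normalized so its output represents a point on ℙ^{n−1}. The sketch below is a minimal illustration, not the paper's implementation; the dimensions `n`, `k`, the hidden width, and the single-layer weights `W_phi`, `W_rho` are all hypothetical placeholders for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, hidden = 8, 3, 16  # hypothetical: embedding dim n, 2k context tokens

# Hypothetical stand-ins for learned weights: phi lifts each token,
# rho maps the pooled representation back to R^n.
W_phi = rng.standard_normal((n, hidden))
W_rho = rng.standard_normal((hidden, n))

def g_s(C):
    """Deep-Sets-style aggregation: v = rho(sum_i phi(c_i)).
    C has shape (2k, n); returns a unit direction vector in R^n."""
    h = np.tanh(C @ W_phi)      # phi applied tokenwise
    pooled = h.sum(axis=0)      # sum pooling => permutation invariance
    v = np.tanh(pooled) @ W_rho
    # Normalize so v is a well-defined representative of [v] on P^{n-1};
    # guard against the (measure-zero) case v = 0.
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

C = rng.standard_normal((2 * k, n))
v1 = g_s(C)
v2 = g_s(C[rng.permutation(2 * k)])  # same token set, shuffled order
assert np.allclose(v1, v2)           # invariant under permutation
```

Sum (or mean) pooling is what makes the map permutation-invariant; any candidate architecture for g_s, including attention or a GNN, must preserve this property.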
References
The specific instantiation of the learnable aggregation function $g_s$ within the $\Phi_s$ module, however, remains a question for future empirical investigation. Whether the optimal implementation is an attention mechanism, a graph neural network, or a simple multi-layer perceptron is beyond the scope of this purely theoretical work.
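For comparison with the MLP baseline, the attention candidate can be realized as attention pooling: a softmax-weighted sum of value-projected tokens against a learnable query. This too is permutation-invariant, since the weights permute with the tokens and the weighted sum is order-independent. The sketch below is illustrative only; the query vector `q` and projection `W_v` are hypothetical stand-ins for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 3  # hypothetical sizes

q = rng.standard_normal(n)        # hypothetical learnable query vector
W_v = rng.standard_normal((n, n)) # hypothetical value projection

def g_s_attn(C):
    """Attention pooling: softmax-weighted sum of value-projected tokens,
    normalized to a unit direction vector in R^n."""
    scores = C @ q / np.sqrt(n)                 # scaled dot-product scores
    w = np.exp(scores - scores.max())           # numerically stable softmax
    w /= w.sum()
    v = w @ (C @ W_v)                           # weighted sum of values
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

C = rng.standard_normal((2 * k, n))
assert np.allclose(g_s_attn(C), g_s_attn(C[::-1]))  # order-independent
```

A GNN candidate would follow the same pattern, with message passing over a (possibly complete) graph on the 2k tokens before a final permutation-invariant readout.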