Biological implementation of global Softmax-like attention

Determine whether biological cortical circuits can compute a global, instantaneous Softmax normalization across an entire sequence, as required by transformer attention mechanisms, and, if so, identify the specific neural mechanisms that implement this computation.

Background

Within transformer architectures, attention relies on a Softmax normalization that enforces competition among elements so that attention weights sum to one. The authors map transformer components onto cortical microcircuitry and argue that Values, Queries, and Keys could be instantiated by distinct laminar pathways and dendritic computations.
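For concreteness, here is a minimal sketch (illustrative NumPy code, not from the paper) of the Softmax step in scaled dot-product attention: exponentiation followed by row-wise division makes each query's attention weights non-negative and sum to one, which is what enforces global competition across all sequence positions.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Scaled dot-product attention with a global Softmax over the sequence.

    Q, K, V: (seq_len, d) arrays of queries, keys, and values.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (seq_len, seq_len) similarities
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    exp_scores = np.exp(scores)
    weights = exp_scores / exp_scores.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((5, 8)) for _ in range(3))
out, w = softmax_attention(Q, K, V)
assert np.allclose(w.sum(axis=-1), 1.0)  # global competition: each row sums to one
```

Note that the denominator pools over every position in the sequence at once; it is exactly this global, instantaneous pooling that is hard to map onto neural circuitry.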

However, fully realizing transformer attention also requires a Softmax-like competitive normalization. The authors explicitly note that a global, instantaneous Softmax across an entire sequence has no established biological implementation, though they suggest local lateral inhibition and divisive normalization as a plausible approximation.
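To illustrate why such circuits yield only an approximation, the sketch below (hypothetical code with an assumed neighborhood radius, not the authors' model) replaces the global Softmax denominator with divisive normalization pooled over a local neighborhood of keys, mimicking a lateral-inhibition motif. The resulting weights are normalized only within each local pool, so they no longer sum to one across the whole sequence.

```python
import numpy as np

def local_divisive_attention(scores, radius=2, sigma=1e-6):
    """Approximate Softmax via divisive normalization over a local pool.

    scores: (seq_len, seq_len) pre-normalization attention scores.
    radius: half-width of the lateral-inhibition neighborhood (assumed parameter).
    Each weight is divided by the pooled activity of nearby keys only,
    so the normalization is local rather than global.
    """
    exp_scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
    n = scores.shape[-1]
    weights = np.empty_like(exp_scores)
    for j in range(n):
        lo, hi = max(0, j - radius), min(n, j + radius + 1)
        pool = exp_scores[:, lo:hi].sum(axis=-1)
        weights[:, j] = exp_scores[:, j] / (sigma + pool)
    return weights

rng = np.random.default_rng(1)
scores = rng.standard_normal((6, 6))
w_local = local_divisive_attention(scores, radius=1)
w_global = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
print(w_local.sum(axis=-1))   # generally != 1: competition is only local
print(w_global.sum(axis=-1))  # exactly 1: global Softmax competition
```

The gap between the two outputs makes the open question concrete: local inhibitory pooling approaches the Softmax only as the pool widens, and no known circuit mechanism widens it to the entire sequence instantaneously.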

References

Biologically, the global, instantaneous calculation of a Softmax function across an entire sequence is unclear.

The Neuroscience of Transformers (arXiv:2603.15339, Koenig et al., 16 Mar 2026), Section 4: Learning and plasticity