- The paper establishes that, in real-valued settings, graph transformers correspond to propositional logic with the global modality (PL+G) and GPS-networks to graded modal logic with the global modality (GML+G).
- It shows that floating-point arithmetic instead yields absolute global counting (PL+GC and GML+GC), in contrast to the relative, ratio-based counting available in real-valued models.
- The work provides a rigorous framework linking neural model architectures to logical invariants, highlighting practical design and benchmarking implications in graph learning.
Overview
The paper "Expressive Power of Graph Transformers via Logic" (2508.01067) presents a detailed analysis of the capabilities of graph transformers (GTs) and hybrid GPS-networks by precisely characterizing their expressive power through logical frameworks. Two primary configurations are considered: models employing real-number representations and those using finite-precision floats. The work gives a uniform treatment of both soft-attention and average hard-attention mechanisms, eschewing positional/structural encodings in order to isolate and analyze the core model architectures. Logical expressiveness is studied both absolutely (with no background logic) and relative to first-order logic, i.e., restricted to FO-definable vertex properties.
Background and Motivation
Classical GNNs have an expressivity limitation largely attributable to their local message-passing paradigm: they fail to distinguish certain graph properties, notably those relying on global information or long-range interactions. Transformers, through attention, have been shown to mitigate some of these drawbacks, inspiring models such as GTs and GPS-networks that combine attention with message passing. However, the literature has lacked a precise logical analysis paralleling that for message-passing GNNs (e.g., their correspondence to various modal logics).
This work addresses: (i) the logical boundary of expressive power for GTs and GPSs with/without message passing; (ii) how numerical representations (reals vs. floats) fundamentally impact expressivity; and (iii) the relationship between attention types and logical modalities (counting/non-counting, local/global).
The paper's formulations are precise:
- Graph transformers (GTs): "Bag-of-vertices" transformers, ignoring explicit edge structure, realized as vertex classifiers.
- GPS-networks: Generalized hybrids incorporating both local message passing and global attention (as in [Rampásek et al., 2022]).
- Attention variants: Soft-attention (softmax) and average hard-attention.
- Naked architecture: No positional encoding, as the focus is solely on model-intrinsic properties.
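To make the two attention variants concrete, here is a minimal NumPy sketch of each aggregation rule (the scores and value vectors are placeholder choices for illustration, not the paper's construction):

```python
import numpy as np

def soft_attention(scores, values):
    """Softmax-weighted average of value vectors (soft attention)."""
    w = np.exp(scores - scores.max())  # numerically stabilized softmax
    w = w / w.sum()
    return w @ values

def average_hard_attention(scores, values):
    """Uniform average over the positions attaining the maximal score."""
    mask = scores == scores.max()
    return values[mask].mean(axis=0)

scores = np.array([1.0, 1.0, 0.0])
values = np.array([[1.0], [3.0], [10.0]])

soft = soft_attention(scores, values)          # every position contributes
hard = average_hard_attention(scores, values)  # only argmax positions: (1+3)/2 = 2
```

Soft attention always mixes in every vertex (with exponentially small weight), while average hard attention discards all non-maximal positions; this difference is what the paper's uniform treatment has to bridge.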
The analysis is conducted in both the real-number and float settings, with the float case carefully formalized along IEEE 754 lines, accounting for the non-associativity of rounded sums and for numerical underflow, while ensuring isomorphism invariance.
Logical Characterization of Expressive Power
Real-Valued Models
For models whose vector operations are realized over R:
- GPS-networks (including any mix of message passing and global attention) are, relative to FO, exactly as expressive as graded modal logic with the non-counting global modality (GML+G). That is, for FO-definable vertex properties, these models capture precisely those that are GML+G-definable.
- GTs (i.e., transformers without message passing or edge structure) are, relative to FO, as expressive as propositional logic with non-counting global modality (PL+G).
A crucial outcome is that even with global attention, real-based models cannot perform unrestricted global counting. Instead, they capture relative but not absolute counting: they can check, e.g., whether the number of p-labeled vertices exceeds the number of q-labeled vertices, but cannot uniformly express "there are at least k p-labeled vertices". This demarcation is formalized via a new bisimulation, global-ratio graded bisimilarity (∼G%), and a van Benthem/Rosen-style theorem: an FO-definable property is invariant under this bisimulation exactly when it is GML+G-definable.
As a concrete example, a single-layer soft-attention GT can realize the majority property ("at least half of the vertices are p-labeled"), highlighting how readily ratio-based properties are captured.
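The majority example can be sketched in a few lines: with constant attention scores, softmax reduces to uniform weights, so the attended value is exactly the fraction of p-labeled vertices, which is then thresholded at 1/2 (an illustrative reconstruction, not the paper's exact network):

```python
import numpy as np

def majority_p(labels_p):
    """One soft-attention step with constant scores: softmax over equal
    scores is the uniform distribution, so the attended value equals the
    fraction of p-labeled vertices; threshold at 1/2 for majority."""
    scores = np.zeros(len(labels_p))           # constant attention scores
    w = np.exp(scores) / np.exp(scores).sum()  # uniform weights 1/n
    frac_p = w @ labels_p                      # mean of the p-indicator
    return frac_p >= 0.5

majority_p(np.array([1.0, 1.0, 0.0]))       # True  (2/3 are p)
majority_p(np.array([1.0, 0.0, 0.0, 0.0]))  # False (1/4 are p)
```

Note that the same circuit works on graphs of any size, which is precisely the sense in which ratio-based (but not absolute-count) properties are uniformly accessible.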
Finite-Precision (Float) Models
When models implement floating-point arithmetic:
- GPS-networks [floats] (including full message passing) achieve exactly the expressivity of graded modal logic with the counting global modality (GML+GC), absolutely: every vertex property definable in GML+GC is realizable by such a model, and every property computable by such a model is GML+GC-definable.
- GTs [floats] correspond absolutely to propositional logic with counting global modality (PL+GC).
This marks a non-monotonic transition: float-based models lose the ability to express relative ratios but gain absolute global counting, precisely because of floating-point underflow/saturation behavior in sum and softmax computations. Saturation disrupts relative counting in large graphs, but it also makes "at least k" cardinality properties expressible: a saturating sum changes value exactly as the count crosses format-specific thresholds, which corresponds to the counting global modality.
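The saturation effect is easy to observe in any fixed-precision format. In IEEE 754 half precision, for instance, the spacing between representable numbers above 2048 is 2, so adding 1 to 2048 rounds back to 2048 and a running count freezes there (a NumPy sketch; the paper's float formalization is more general than this single format):

```python
import numpy as np

# Above 2048, float16 cannot represent odd integers: 2048 + 1 rounds to 2048.
x = np.float16(2048)
assert x + np.float16(1) == x

def float16_count(labels_p):
    """Accumulate the p-indicator in float16: exact up to 2048, then frozen."""
    s = np.float16(0)
    for v in labels_p:
        s = np.float16(s + np.float16(v))
    return s

small = float16_count([1] * 100)   # exact count: 100
big   = float16_count([1] * 5000)  # saturates at 2048
```

A model can therefore decide "at least k p-vertices" for any k below the saturation threshold, while ratio comparisons between two large, saturated counts become meaningless, mirroring the trade described above.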
Technical Approach and Proof Insights
A key technical tool is the construction and application of global-ratio graded bisimilarity (∼G%). This relation is shown to be both necessary and sufficient for capturing the invariance class of FO properties realized by real-based GTs/GPS-networks, and it drives the proof of the main van Benthem/Rosen-style theorem, extending classical results to this hybrid logical setting.
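The ratio-invariance underlying a relation like ∼G% can be illustrated concretely: a soft-attention readout with constant scores sees only label proportions, so replicating a graph's vertex multiset leaves the output unchanged (a toy demonstration of the invariance, not the paper's formal bisimulation):

```python
import numpy as np

def uniform_attention_readout(features):
    """Soft attention with constant scores: a uniform average of features."""
    scores = np.zeros(len(features))
    w = np.exp(scores) / np.exp(scores).sum()  # uniform weights 1/n
    return w @ features

# One-hot label features: two p-vertices, one q-vertex (2:1 ratio).
g_small = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
g_large = np.tile(g_small, (100, 1))  # 300 vertices, identical 2:1 ratio

uniform_attention_readout(g_small)  # -> [2/3, 1/3]
uniform_attention_readout(g_large)  # -> [2/3, 1/3], indistinguishable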
The logical-to-model translation (for realizing a GML+G or GML+GC formula as a GT/GPS) is based on the layer-by-layer simulation of logical connectives and global modalities, exploiting the interplay between skip-connections, aggregation, and attention. Conversely, model-to-logic translation proceeds by an induction over network layers, with invariance under the respective bisimulation relations serving as the backbone of the method.
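The layer-by-layer simulation of Boolean connectives is the standard ReLU-network trick: on {0,1}-valued truth features, conjunction, disjunction, and negation each take a single linear-plus-ReLU step (a generic sketch of the technique, not the paper's exact construction):

```python
import numpy as np

relu = lambda t: np.maximum(t, 0.0)

def AND(x, y): return relu(x + y - 1.0)        # 1 iff both inputs are 1
def OR(x, y):  return 1.0 - relu(1.0 - x - y)  # 1 iff at least one input is 1
def NOT(x):    return 1.0 - x                  # flips a {0,1} truth value

assert AND(1.0, 1.0) == 1.0 and AND(1.0, 0.0) == 0.0
assert OR(0.0, 1.0) == 1.0 and OR(0.0, 0.0) == 0.0
assert NOT(0.0) == 1.0
```

Stacking such steps tracks the parse tree of a formula layer by layer; the global modalities are then handled by the attention/readout stages rather than by these pointwise operations.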
In the float case, technical care is taken to model the effect of saturating sums/softmaxes (Proposition 4.2), with results extended modularly to GNNs with global readouts.
Implications and Open Questions
Practical Implications
- Architectural choices: Researchers and practitioners should recognize that the numerical format (float vs. real) fundamentally alters the logical boundaries of transformer-based models; claims about counting or ratio-based capabilities must be made relative to this format distinction.
- Design of positional encodings: The foundation laid for naked models clarifies exactly what is contributed by augmenting input with positional encodings, informing the minimal requirements for downstream applications.
- Benchmarking & expressivity: Float-based architectures (the norm in real-world implementations) have theoretical access to counting properties analogous to GNNs with counting readouts, demystifying some empirical observations.
Theoretical Impacts
- Logical delineation: This work precisely places GTs and hybrids between classical GNNs and more powerful counted-modality logics, and shows that hybrids with transformer layers are strictly more powerful than GNNs for both absolute and background-FO-restricted expressive power.
- Modal logic and neural computation: The correspondence established provides a rigorous mapping from neural architectures to fragments of modal logic, supporting ongoing research in logic-based explanations and analysis of neural models.
Unresolved Problems and Future Work
- Vertex vs. graph classification: Extending these results to graph-level classification is left open; the main technical difficulty lies in translating vertex-level properties into graph-level logical properties.
- Effect of positional encodings: While the paper sets a foundation, analysis of the combined expressive power with various encodings (e.g., Laplacian, centrality, homomorphism) is a prime avenue for further exploration.
- Completeness in real-valued models: It is open whether every real-valued GPS-network can be simulated by a GNN with counting global readout under certain tasks.
- Alternative float implementations: Examining the impact of different summing/rounding protocols (including hardware and software library idiosyncrasies) on expressivity is left for future theoretical and empirical study.
Conclusion
"Expressive Power of Graph Transformers via Logic" delivers a comprehensive, uniform, and technically sophisticated account of the logical boundaries of graph transformer models, grounding their distinction from, and relationship to, classical GNNs within modal logic with and without counting modalities. It establishes that the choice of numerical domain (reals or floats) and aggregation protocol is not a technical detail, but a cardinal determinant of expressivity, with profound implications for both theoretical analysis and practical model design in graph representation learning (2508.01067).