Timed Graph Relationformer (TGR) Layer
- The Timed Graph Relationformer (TGR) layer is a neural architecture that processes time-indexed, feature-annotated graphs by integrating local topological context with global set-level and relational information.
- It combines multi-head graph attention, DeepSets, and Relation Net outputs via a learned gating mechanism and incorporates Time2Vec temporal encoding to produce permutation-invariant representations.
- The TGR layer has been effectively applied to reinforcement learning scenarios like interactive swarm leader identification, demonstrating superior robustness and generalization over baseline GNN approaches.
The Timed Graph Relationformer (TGR) layer is a neural architecture for processing time-indexed, feature-annotated graphs. It is designed to generate informative, permutation-invariant global representations suitable for reinforcement learning with graph-structured observations. The TGR layer was introduced in the context of interactive Swarm Leader Identification (iSLI), where an agent must probe a robotic swarm to infer its leader, but its construction and data flow highlight a general approach to temporal graph representation learning (Bachoumas et al., 20 Dec 2025).
1. Data Flow and Architectural Modules
At each discrete time step $t$, the TGR layer processes an observation encoded as a directed graph $G_t = (X_t, A_t, E_t, t)$, where $X_t$ is the node feature matrix (for the $N$ swarm agents plus the prober), $A_t$ and $E_t$ are adjacency masks, and $t$ is the current timestep.
The TGR layer consists of the following modules, applied with a specific data flow:
- Multi-Head Graph Attention Transformer (GAT): Processes node features and adjacency information to produce updated node embeddings that integrate local topological context and edge weighting, outputting $H_t = [h_1, \dots, h_{N+1}]$.
- DeepSets (DS) Readout: Computes a permutation-invariant, set-level summary of node features by aggregating transformed node embeddings.
- Relation Net (RN) Readout: Aggregates all pairwise node interactions, incorporating both node features and edge attributes for a relational summary.
- Gating Fusion: Combines DS and RN outputs via an element-wise, learned gating mechanism.
- Time2Vec (T2V) Temporal Encoding: Encodes the absolute timestep as a high-dimensional periodic/linear feature.
The outputs of the Gating Fusion and T2V components are concatenated to produce the final TGR global representation $z^{\mathrm{TGR}}_t$.
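Of these modules, the multi-head graph attention stage has the most involved data flow. The following minimal PyTorch sketch shows one way an edge-weighted, masked attention block of this kind can be realized; the class name `EdgeWeightedGAT`, the scaled dot-product parameterization, the way edge weights enter the scores, and all dimensions are illustrative assumptions rather than the reference implementation.

```python
import torch
import torch.nn as nn

class EdgeWeightedGAT(nn.Module):
    """Minimal multi-head, edge-weighted, masked graph attention block (illustrative)."""

    def __init__(self, d_in: int, d_head: int, n_heads: int, slope: float = 0.2):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_head
        self.q = nn.Linear(d_in, n_heads * d_head)
        self.k = nn.Linear(d_in, n_heads * d_head)
        self.v = nn.Linear(d_in, n_heads * d_head)
        self.leaky = nn.LeakyReLU(slope)

    def forward(self, x, adj, w):
        # x: (N+1, d_in) node features; adj: (N+1, N+1) {0,1} mask; w: (N+1, N+1) edge weights.
        n = x.size(0)
        q = self.q(x).view(n, self.n_heads, self.d_head).transpose(0, 1)
        k = self.k(x).view(n, self.n_heads, self.d_head).transpose(0, 1)
        v = self.v(x).view(n, self.n_heads, self.d_head).transpose(0, 1)
        # Scaled dot-product scores, LeakyReLU, then modulation by edge weights.
        scores = self.leaky(q @ k.transpose(-2, -1) / self.d_head ** 0.5)
        scores = scores * w.unsqueeze(0)
        # Mask out non-edges before the per-neighbour softmax.
        scores = scores.masked_fill(adj.unsqueeze(0) == 0, float("-inf"))
        alpha = torch.softmax(scores, dim=-1)
        # Aggregate values per head and concatenate heads: (N+1, n_heads * d_head).
        return (alpha @ v).transpose(0, 1).reshape(n, -1)

# Example: 8 swarm agents + 1 prober, 16-dim raw features, 4 heads of width 32.
x = torch.randn(9, 16)
adj = (torch.rand(9, 9) > 0.5).float()
adj.fill_diagonal_(1.0)              # keep self-edges so no softmax row is empty
w = torch.rand(9, 9)
h = EdgeWeightedGAT(d_in=16, d_head=32, n_heads=4)(x, adj, w)
print(h.shape)                       # torch.Size([9, 128])
```

Self-edges are retained in the example so that every row of the adjacency mask has at least one valid neighbour before the softmax.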
2. Forward Pass and Mathematical Formulation
The TGR layer's forward pass is specified by the following sequence of operations:
- Graph Attention Transformer (GAT):
For each node and attention head, query, key, and value projections are computed; attention coefficients are obtained from a masked, LeakyReLU-activated softmax that incorporates edge weights, and the per-head outputs are concatenated into the updated node embeddings $h_i$ of $H_t$.
- DeepSets Global Read-Out:
$$s_t = \rho\Big(\sum_{i=1}^{N+1} \phi(h_i)\Big),$$
where $\rho$ and $\phi$ are MLPs.
- Relation Net Global Read-Out:
$$r_t = \sum_{i \neq j} g_\theta\big([\,h_i \,\Vert\, h_j \,\Vert\, e_{ij}\,]\big),$$
where $g_\theta$ is an MLP and the edge feature $e_{ij}$ captures information such as interaction counts.
- Learned Gating Fusion:
$$z_t = \sigma\big(W_g\, r_t + b_g\big) \odot s_t,$$
with $\sigma$ the elementwise sigmoid and $\odot$ elementwise multiplication.
- Time2Vec Temporal Encoding:
$$\tau(t)[k] = \begin{cases} \omega_k t + \varphi_k, & k = 0,\\ \sin(\omega_k t + \varphi_k), & 1 \le k \le d_{\mathrm{T2V}} - 1,\end{cases}$$
with learnable frequencies $\omega_k$ and phases $\varphi_k$.
- Final Output:
$$z^{\mathrm{TGR}}_t = \big[\, z_t \,\Vert\, \tau(t)\,\big],$$
yielding a vector in $\mathbb{R}^{d_z + d_{\mathrm{T2V}}}$.
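A compact PyTorch sketch of the read-out and temporal-encoding steps is given below. Hidden sizes follow the hyperparameters reported in Section 5, while the class names, the one-dimensional edge feature, and the exact MLP layout are illustrative assumptions.

```python
import torch
import torch.nn as nn

def mlp(d_in: int, d_hidden: int, d_out: int) -> nn.Sequential:
    # Two-hidden-layer MLP with LeakyReLU(0.2), matching the sizes in Section 5.
    return nn.Sequential(
        nn.Linear(d_in, d_hidden), nn.LeakyReLU(0.2),
        nn.Linear(d_hidden, d_hidden), nn.LeakyReLU(0.2),
        nn.Linear(d_hidden, d_out),
    )

class DeepSetsReadout(nn.Module):
    """s_t = rho(sum_i phi(h_i)) -- permutation-invariant set summary."""
    def __init__(self, d_node: int, d_out: int, d_hidden: int = 256):
        super().__init__()
        self.phi = mlp(d_node, d_hidden, d_hidden)
        self.rho = mlp(d_hidden, d_hidden, d_out)

    def forward(self, h: torch.Tensor) -> torch.Tensor:  # h: (N+1, d_node)
        return self.rho(self.phi(h).sum(dim=0))

class RelationNetReadout(nn.Module):
    """r_t = sum_{i != j} g([h_i || h_j || e_ij]) -- pairwise relational summary."""
    def __init__(self, d_node: int, d_edge: int, d_out: int, d_hidden: int = 256):
        super().__init__()
        self.g = mlp(2 * d_node + d_edge, d_hidden, d_out)

    def forward(self, h: torch.Tensor, e: torch.Tensor) -> torch.Tensor:
        # h: (N+1, d_node); e: (N+1, N+1, d_edge), e.g. pairwise interaction counts.
        n = h.size(0)
        hi = h.unsqueeze(1).expand(n, n, -1)            # h_i broadcast over j
        hj = h.unsqueeze(0).expand(n, n, -1)            # h_j broadcast over i
        pair = torch.cat([hi, hj, e], dim=-1)
        mask = 1.0 - torch.eye(n, device=h.device)      # drop i == j pairs
        return (self.g(pair) * mask.unsqueeze(-1)).sum(dim=(0, 1))

class Time2Vec(nn.Module):
    """tau(t)[0] = w_0 t + p_0; tau(t)[k] = sin(w_k t + p_k) for k >= 1."""
    def __init__(self, d_t2v: int = 64):
        super().__init__()
        self.w = nn.Parameter(torch.randn(d_t2v))
        self.p = nn.Parameter(torch.randn(d_t2v))

    def forward(self, t: torch.Tensor) -> torch.Tensor:  # t: scalar tensor
        linear = self.w[:1] * t + self.p[:1]
        periodic = torch.sin(self.w[1:] * t + self.p[1:])
        return torch.cat([linear, periodic])

# Illustrative shapes: 8 agents + 1 prober, 128-dim GAT embeddings, scalar edge feature.
h = torch.randn(9, 128)
e = torch.rand(9, 9, 1)
s_t = DeepSetsReadout(128, 256)(h)
r_t = RelationNetReadout(128, 1, 256)(h, e)
tau = Time2Vec(64)(torch.tensor(12.0))
```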
3. Gating Mechanism for Relational Fusion
The distinctive aspect of the TGR architecture is its gating fusion, which allows dynamic modulation between coarse set-level information (DS) and fine relational cues (RN) at each timestep. Each coordinate of the DS output $s_t$ is multiplied by a learned sigmoid gate computed from the RN output $r_t$. This enables the RN to selectively amplify or suppress set-based features in response to relational context, such as the concentration of prober-swarm interactions. The gating mechanism is critical for integrating aggregate and relational information adaptively as the probing policy interacts with the swarm (Bachoumas et al., 20 Dec 2025).
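A minimal sketch of this fusion, under the assumption that the gate is a learned linear map of the RN read-out passed through a sigmoid (the precise gate parameterization is not spelled out here), looks as follows:

```python
import torch
import torch.nn as nn

class GatingFusion(nn.Module):
    """Elementwise gate on the DS read-out, driven by the RN read-out (illustrative)."""
    def __init__(self, d: int):
        super().__init__()
        self.gate = nn.Linear(d, d)          # assumed: linear map of r_t, then sigmoid

    def forward(self, s_t: torch.Tensor, r_t: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(r_t))    # per-coordinate gate in (0, 1)
        return g * s_t                       # amplify or suppress set-level features

# Illustrative 256-dim read-outs and a 64-dim Time2Vec encoding.
s_t, r_t = torch.randn(256), torch.randn(256)
tau = torch.randn(64)
z_t = GatingFusion(256)(s_t, r_t)
z_tgr = torch.cat([z_t, tau])                # final TGR global representation
```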
4. Integration with Downstream Sequence Modeling and PPO
The sequence of TGR outputs $z^{\mathrm{TGR}}_{1:t}$ is linearly projected and provided as the input token sequence to an S5 encoder, a structured state-space model. The S5 encoder applies layer normalization, structured state-space updates, and residual connections internally; its recurrent hidden state summarizes past TGR-derived tokens. Two MLP heads, a policy (actor) head and a value (critic) head, map the S5 encoding to a categorical policy over base velocities and to value estimates.
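A minimal sketch of this downstream pipeline is shown below; `nn.GRU` is used purely as a stand-in for the S5 structured state-space encoder, and the projection width, head sizes, and action count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ActorCriticOverTGRTokens(nn.Module):
    """Token projection, sequence encoding, and policy/value heads (illustrative).

    nn.GRU stands in for the S5 structured state-space encoder; the projection
    width, head sizes, and the number of discrete actions are assumptions.
    """
    def __init__(self, d_tgr: int, d_model: int = 256, n_actions: int = 9):
        super().__init__()
        self.proj = nn.Linear(d_tgr, d_model)          # linear token projection
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)
        self.actor = nn.Sequential(nn.Linear(d_model, 256), nn.LeakyReLU(0.2),
                                   nn.Linear(256, n_actions))
        self.critic = nn.Sequential(nn.Linear(d_model, 256), nn.LeakyReLU(0.2),
                                    nn.Linear(256, 1))

    def forward(self, tgr_tokens: torch.Tensor):       # (batch, T, d_tgr)
        enc, _ = self.encoder(self.proj(tgr_tokens))   # recurrent summary of past tokens
        last = enc[:, -1]                              # encoding at the current step
        return self.actor(last), self.critic(last).squeeze(-1)

# Example: batch of 2 episodes, 16 timesteps, 320-dim TGR tokens (256 fused + 64 T2V).
logits, value = ActorCriticOverTGRTokens(d_tgr=320)(torch.randn(2, 16, 320))
```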
Gradients from the PPO objective (policy loss, value loss, and entropy bonus) flow through the actor and critic heads, through the S5 encoder, and into the TGR layer. All components, including the GAT, DS, RN, and T2V modules, are trained end-to-end to maximize the expected clipped surrogate objective (Bachoumas et al., 20 Dec 2025).
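The corresponding training objective is the standard PPO clipped surrogate; the sketch below uses the clip and entropy coefficients reported in Section 5, while the value-loss coefficient is an assumption.

```python
import torch
import torch.nn.functional as F

def ppo_loss(logp_new, logp_old, adv, values, returns, entropy,
             clip_eps=0.2, value_coef=0.5, entropy_coef=0.01):
    """Clipped PPO objective: policy loss + value loss - entropy bonus (to minimize)."""
    ratio = torch.exp(logp_new - logp_old)                       # pi_new / pi_old
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    policy_loss = -torch.min(unclipped, clipped).mean()          # clipped surrogate
    value_loss = F.mse_loss(values, returns)                     # critic regression
    return policy_loss + value_coef * value_loss - entropy_coef * entropy.mean()
```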
5. Implementation Details and Hyperparameters
The TGR layer's implementation was found to be robust across a range of graph sizes and swarm speeds. The following hyperparameter settings were used to reproduce results:
| Module | Specification | Key Parameters |
|---|---|---|
| GAT | Multi-head, edge-weighted attention | multiple heads, fixed dimension per head |
| DS (MLPs) | Coarse aggregation | 2 hidden layers, 256 units each |
| RN (MLPs) | Pairwise relational reasoning | 2 hidden layers, 256 units each |
| T2V | Temporal encoding | 64-dimensional (1 linear + 63 sinusoid) |
| Output dim | Global, permutation-invariant | concatenation of gated-fusion and T2V outputs |
| S5 encoder | State-space sequence | 4 layers, 256 hidden units |
| PPO | RL optimization | clip=0.2, entropy=0.01, lr=3e-4, batch=64, GAE advantage estimation |
Node- and edge-level features are supplied as raw input. Unless otherwise specified, MLPs use Xavier initialization and LeakyReLU activations (slope 0.2). Simulation operates at 20 Hz; the on-robot controller runs at 5 Hz.
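For reference, the stated settings can be collected into a small configuration sketch; only values given above are included, and the zero bias initialization is an assumption alongside the reported Xavier scheme.

```python
import torch.nn as nn

# Settings stated above (GAT head count/width and GAE parameters are not listed here).
TGR_CONFIG = {
    "deepsets_mlp": {"hidden_layers": 2, "hidden_units": 256},
    "relation_net_mlp": {"hidden_layers": 2, "hidden_units": 256},
    "time2vec_dim": 64,                       # 1 linear + 63 sinusoidal components
    "s5_encoder": {"layers": 4, "hidden_units": 256},
    "ppo": {"clip": 0.2, "entropy_coef": 0.01, "lr": 3e-4, "batch_size": 64},
    "control_rate_hz": {"simulation": 20, "on_robot": 5},
}

def init_linear(module: nn.Module) -> None:
    """Xavier weight initialization for linear layers; zero bias is an assumption."""
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)

# Usage: model.apply(init_linear); activations elsewhere use nn.LeakyReLU(0.2).
```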
6. Application to Swarm Leader Identification
The TGR layer serves as the core graph representation mechanism in the iSLI problem, enabling the learning of adversarial probing policies for leader detection under partially observable and dynamic conditions. It outperforms baseline GNN approaches by fusing topological, interactional, and temporal structure, generalizing across swarm sizes and dynamics, and supporting robust sim-to-real transfer. The architecture is particularly well-suited for reinforcement learning settings where relational and set-aggregate information must be adaptively balanced to support sequential decision making (Bachoumas et al., 20 Dec 2025).