SEAGAN: domain-Specific and Edge-Aware Graph Attention Network for Dynamic Plant Processes

Published 17 Jun 2026 in cs.LG | (2606.19623v1)

Abstract: Graph neural networks (GNNs) provide a flexible framework for learning from scientific data linked through physical, biological, or functional relationships. One promising domain is plant physiology, where measured responses often arise from multiple interacting processes whose exact separation remains difficult even with manual intervention. In plant physiology, a key example is the A-Ci curve, which relates net CO2 assimilation rate (Anet) to leaf intercellular CO2 concentration (Ci) and is used to estimate photosynthetic parameters in leaf and crop-canopy models. However, reliable estimation requires identifying the active biochemical limitation state at each curve point, which remains a major source of uncertainty. Here, we formulate limitation-state identification along A-Ci curves as a graph-based node classification problem, with curve points as nodes. Domain-specific graph representations are created using distance-based k-nearest-neighbor (kNN) and auxiliary-signal-guided (ASG) connectivity, with edge attributes encoding pairwise relations. The framework was evaluated against conventional learning baselines, graph-based architectures, and an automated fitting-based benchmark. Results on a large synthetic dataset with known ground-truth limitation states show that graph-based models improve classification, particularly near biochemical transition regions. The best-performing configuration, SEAGAN (domain-Specific and Edge-Aware Graph Attention Network for Dynamic Plant Processes), integrates process-aware node features, edge attributes, kNN connectivity, and graph attention with weighted cross-entropy loss, achieving an F1-score of 0.857 and an accuracy of 0.882. The results show that representing A-Ci curves as graphs improves biochemical limitation-state analysis, with edge-aware attention over local kNN neighborhoods providing the most effective strategy.

Summary

The paper introduces SEAGAN, a domain-specific GNN that leverages edge-aware attention to enhance node classification on photosynthetic response curves.
It compares kNN and auxiliary-signal-guided (ASG) graph construction, both informed by domain-derived auxiliary signals, for separating ambiguous biochemical regimes.
Quantitative results show that the GAT-kNN variant achieved a macro-F1 of 0.857 and an accuracy of 0.882, outperforming traditional and feature-based baselines.

SEAGAN: Domain-Specific, Edge-Aware Graph Attention for Node Classification in Dynamic Plant Physiology

Introduction

This work introduces SEAGAN (domain-Specific and Edge-Aware Graph Attention Network for Dynamic Plant Processes), a GNN architecture for node-wise classification of photosynthetic biochemical limitation states along C3 A–Ci response curves. Identifying these limitation regimes, namely Rubisco-limited ($A_c$), RuBP-regeneration-limited ($A_j$), and TPU-limited ($A_p$), is the central challenge in inferring photosynthetic parameters from gas exchange data. Traditional heuristic or automated fitting methods often yield unreliable or ambiguous assignments, particularly near biochemical transition zones, which destabilizes downstream mechanistic parameter estimation. The study recasts curve analysis as a relational learning problem that uses structured graph representations, domain-specific auxiliary signals, and edge-aware attention.

Domain-Specific Graph Construction Strategies

The paper formulates each A–Ci curve as a small graph where nodes correspond to discrete measurement points, and edges encode pairwise physiological or geometric relationships. Two graph construction frameworks are compared:

Distance-based kNN graphs: Each node connects to its four nearest neighbors in the $C_i$–$A_\text{net}$ space.
Auxiliary-Signal-Guided (ASG) graphs: Nodes are partitioned into functional groups by peak detection in derived auxiliary diagnostic signals. Nodes within a group are fully connected, and boundary links join adjacent groups across transitions.
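
The kNN strategy can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `knn_graph`, the min-max scaling of both axes, and the example curve values are hypothetical and not taken from the paper, which this summary describes only as linking each point to its four nearest neighbors.

```python
import math

def knn_graph(ci, anet, k=4):
    """Return directed edges (i, j) linking each point to its k nearest neighbors."""
    # Assumption: min-max scale both axes so Ci (hundreds) does not dominate Anet (tens).
    def scale(values):
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0
        return [(v - lo) / span for v in values]

    x, y = scale(ci), scale(anet)
    n = len(ci)
    edges = []
    for i in range(n):
        # Euclidean distance from point i to every other point, closest first.
        dists = sorted(
            (math.hypot(x[i] - x[j], y[i] - y[j]), j) for j in range(n) if j != i
        )
        edges.extend((i, j) for _, j in dists[:k])
    return edges

# Illustrative (made-up) A-Ci curve with ten measurement points.
ci = [50, 100, 150, 200, 300, 400, 600, 800, 1000, 1200]
anet = [1.2, 5.8, 10.1, 13.9, 19.5, 22.8, 25.9, 27.1, 27.6, 27.7]
edges = knn_graph(ci, anet, k=4)
```

Each node contributes exactly `k` outgoing edges, so the graph stays sparse and local regardless of where regime transitions fall on the curve.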

Auxiliary signals ($s_{Ac}$ and $s_{Aj}$), derived by normalizing measured $A_\text{net}$ with dimensionless process-specific response terms, provide increased local diagnostic resolution and structurally inform both node features and edge attributes.
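
This summary does not give the exact formulas for the auxiliary signals. The sketch below is one plausible reading: it assumes the dimensionless response terms are the $C_i$-dependent factors of the standard Farquhar-von Caemmerer-Berry (FvCB) rate equations, that respiration is ignored, and that commonly used 25 °C kinetic constants apply. All names and constants here are illustrative assumptions, not the authors' definitions.

```python
# Assumed 25 degC constants (commonly used values; not taken from the paper).
GAMMA_STAR = 42.75  # CO2 compensation point, umol mol-1
KC = 404.9          # Michaelis constant for CO2, umol mol-1
KO = 278.4          # Michaelis constant for O2, mmol mol-1
O2 = 210.0          # O2 concentration, mmol mol-1
KM = KC * (1.0 + O2 / KO)  # effective Michaelis constant

def response_ac(ci):
    """Dimensionless Rubisco-limited term: (Ci - G*) / (Ci + Km)."""
    return (ci - GAMMA_STAR) / (ci + KM)

def response_aj(ci):
    """Dimensionless RuBP-regeneration term: (Ci - G*) / (4 Ci + 8 G*)."""
    return (ci - GAMMA_STAR) / (4.0 * ci + 8.0 * GAMMA_STAR)

def auxiliary_signals(ci, anet):
    """Normalize Anet by each response term (undefined where Ci equals G*)."""
    s_ac = [a / response_ac(c) for c, a in zip(ci, anet)]
    s_aj = [a / response_aj(c) for c, a in zip(ci, anet)]
    return s_ac, s_aj

def edge_attributes(edges, s_ac, s_aj):
    """Edge attribute = pairwise difference of the auxiliary signals along (i, j)."""
    return [(s_ac[j] - s_ac[i], s_aj[j] - s_aj[i]) for i, j in edges]

# Purely Rubisco-limited toy points with Vcmax = 100 and no respiration.
ci = [100.0, 200.0, 300.0]
anet = [100.0 * response_ac(c) for c in ci]
s_ac, s_aj = auxiliary_signals(ci, anet)
attrs = edge_attributes([(0, 1), (1, 2)], s_ac, s_aj)
```

Under these assumptions, `s_ac` is flat wherever the curve is Rubisco-limited while `s_aj` keeps changing, which is the kind of local contrast that differences along edges can expose.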

Figure 2: The auxiliary signals $s_{Ac}$ and $s_{Aj}$ stratify local curve behavior, enabling refined grouping into limitation regimes.

Figure 1: Comparison of kNN (proximity-based) and ASG (biochemically-informed) graph connectivities for A–Ci curves.

Model Architectures and Training Framework

Three GNN families are evaluated: GCN, GAT (graph attention network with edge attributes), and Graph U-Net (hierarchical pooling/unpooling, including edge-aware variants). Each model takes node features that combine directly observed quantities with the auxiliary signals, uses either kNN or ASG connectivity, and receives edge attributes based on pairwise differences in the auxiliary signals. Node-wise multi-class classification is trained with cross-entropy loss, with balanced, weighted, and focal variants used to address label imbalance.
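
To make the weighted cross-entropy variant concrete, here is a minimal sketch. The inverse-frequency weighting scheme and the function names are assumptions chosen for illustration. The summary does not state how the paper sets its class weights.

```python
import math

def class_weights(labels, n_classes):
    """Assumed inverse-frequency weights: n_samples / (n_classes * count_c)."""
    counts = [0] * n_classes
    for y in labels:
        counts[y] += 1
    n = len(labels)
    return [n / (n_classes * c) if c else 0.0 for c in counts]

def weighted_cross_entropy(logits, labels, weights):
    """Weighted mean of -log softmax(logits)[label], normalized by total weight."""
    total, norm = 0.0, 0.0
    for z, y in zip(logits, labels):
        m = max(z)  # subtract the max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(v - m) for v in z))
        total += weights[y] * (log_sum - z[y])
        norm += weights[y]
    return total / norm

# Toy node labels: 0 = Ac, 1 = Aj, 2 = Ap (class 2 is the rarest).
labels = [0, 0, 0, 1, 1, 2]
weights = class_weights(labels, 3)
uniform_logits = [[0.0, 0.0, 0.0] for _ in labels]
loss = weighted_cross_entropy(uniform_logits, labels, weights)
```

Rare classes receive larger weights, so errors on sparsely populated regimes contribute more to the loss than they would under plain cross-entropy.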

The workflow generates 10,000 synthetic A–Ci curves spanning the physiological range, with parameters drawn by Latin hypercube sampling (LHS), then injects realistic noise and annotates each point with its ground-truth regime so that classification fidelity can be quantified. Feature-based machine-learning and neural baselines (random forest, SVM, XGBoost, and a feed-forward neural network) serve as controls.
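
The generation step can be illustrated with a small, hedged sketch. It assumes a simplified FvCB model in which the limiting rate is the minimum of the three process rates, Gaussian measurement noise, and made-up parameter ranges. The paper's actual ranges, noise model, and $C_i$ grid are not given in this summary, and the demo draws 5 curves rather than 10,000.

```python
import random

GAMMA_STAR = 42.75                      # assumed CO2 compensation point
KM = 404.9 * (1.0 + 210.0 / 278.4)      # assumed effective Michaelis constant

def latin_hypercube(n, bounds, rng):
    """Draw n samples with exactly one point per equal-width stratum per parameter."""
    columns = []
    for lo, hi in bounds:
        strata = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(strata)  # decouple the strata order across parameters
        columns.append([lo + u * (hi - lo) for u in strata])
    return list(zip(*columns))

def simulate_curve(vcmax, j, tpu, rd, ci_values, noise_sd, rng):
    """Return noisy Anet values and ground-truth labels (0 = Ac, 1 = Aj, 2 = Ap)."""
    anet, labels = [], []
    for ci in ci_values:
        rates = (
            vcmax * (ci - GAMMA_STAR) / (ci + KM),                  # Rubisco-limited
            j * (ci - GAMMA_STAR) / (4.0 * ci + 8.0 * GAMMA_STAR),  # RuBP-regeneration-limited
            3.0 * tpu,                                              # TPU-limited
        )
        label = rates.index(min(rates))  # the smallest rate is the active limitation
        labels.append(label)
        anet.append(rates[label] - rd + rng.gauss(0.0, noise_sd))
    return anet, labels

rng = random.Random(0)
# Hypothetical ranges for (Vcmax, J, TPU, Rd); not the paper's values.
bounds = [(20.0, 200.0), (40.0, 400.0), (3.0, 25.0), (0.0, 3.0)]
ci_grid = [50, 100, 150, 200, 300, 400, 600, 800, 1000, 1200, 1500]
curves = [simulate_curve(*p, ci_grid, 0.3, rng) for p in latin_hypercube(5, bounds, rng)]
```

Because the label is assigned before noise is added, every point carries an exact ground-truth regime, which is what makes node-wise classification metrics computable on synthetic data.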

Figure 3: End-to-end workflow: synthetic data generation, graph construction, model development, and evaluation.

Figure 4: Model families—GCN baseline, GAT with edge-aware attention, and Graph U-Net—used for limitation-state node classification.

Quantitative Evaluation and Comparative Results

Feature-based models failed to resolve node-wise regime assignments accurately, especially in transition regions. Introducing GCNs with local message passing gave a statistically significant improvement in macro-F1, which points to the benefit of relational context. GCNs nonetheless struggled with ambiguous nodes near regime boundaries, which limited their discrimination power.

SEAGAN, implemented as GAT-kNN with edge attributes and trained under weighted cross-entropy, produced the strongest results, with a macro-F1 of 0.857 and an accuracy of 0.882 (recall and precision were also reported). All GAT variants with learned edge-aware attention substantially outperformed both the GCN baseline and an automated fitting-based reference (PhoTorch), with statistically significant gains confirmed across 30 repeated trials.
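
For reference, macro-F1 and accuracy for a three-class node labeling can be computed as in the sketch below. The function name and the toy labels are illustrative. Macro-F1 is the unweighted mean of the per-class F1 scores, so each limitation state counts equally regardless of how many points it covers.

```python
def macro_f1_and_accuracy(y_true, y_pred, n_classes):
    """Return (macro-F1, accuracy) for integer class labels."""
    f1_scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2.0 * precision * recall / denom if denom else 0.0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return sum(f1_scores) / n_classes, accuracy

# Toy example: 0 = Ac, 1 = Aj, 2 = Ap.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
macro_f1, accuracy = macro_f1_and_accuracy(y_true, y_pred, 3)
```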

Figure 5: F1-score distributions across feature-based and GCN baselines demonstrate clear performance separation.

The edge attribution and ablation analyses (using GNNExplainer) demonstrated that adaptive attention focuses on relevant transition-spanning neighbors, enabling robust disambiguation that is not available to convolution-based or group-partitioned (ASG) models.
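
The idea of attention that adapts to edge attributes can be sketched as a single-head, GAT-style scoring step. This is a simplified illustration, not the paper's layer: it omits the learned linear transform of node features, uses hand-set attention vectors (`a_src`, `a_dst`, `a_edge` are hypothetical names), and assumes each (i, j) pair appears at most once.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def edge_aware_attention(h, edges, edge_attr, a_src, a_dst, a_edge):
    """Aggregate neighbor features with softmax weights that depend on edge attributes.

    h: list of node feature vectors; edges: (i, j) pairs where node i receives from j.
    """
    scores = {}
    for (i, j), e in zip(edges, edge_attr):
        s = leaky_relu(dot(a_src, h[i]) + dot(a_dst, h[j]) + dot(a_edge, e))
        scores.setdefault(i, []).append((j, s))

    out = [list(v) for v in h]  # nodes without incoming edges keep their features
    alphas = {}
    for i, pairs in scores.items():
        m = max(s for _, s in pairs)  # stabilize the softmax
        exps = [(j, math.exp(s - m)) for j, s in pairs]
        z = sum(w for _, w in exps)
        alphas[i] = {j: w / z for j, w in exps}
        out[i] = [
            sum(alphas[i][j] * h[j][d] for j in alphas[i]) for d in range(len(h[i]))
        ]
    return out, alphas

h = [[1.0], [2.0], [4.0]]
edges = [(0, 1), (0, 2)]
# With zero attention vectors, both neighbors get equal weight.
out_uniform, alpha_uniform = edge_aware_attention(
    h, edges, [[0.0], [0.0]], [0.0], [0.0], [0.0]
)
# A nonzero edge term shifts weight toward the neighbor with the larger edge attribute.
out_edge, alpha_edge = edge_aware_attention(
    h, edges, [[0.0], [2.0]], [0.0], [0.0], [1.0]
)
```

In contrast to a GCN, whose aggregation weights are fixed by graph structure, the weights here change with the edge attributes, which is the property the attribution analysis above associates with better handling of transition-spanning neighbors.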

Theoretical and Practical Implications

The central finding is empirical: representing physiological response curves as explicit graphs with process-specific node features, edge attributes, and learnable local attention markedly improved node-wise state segmentation over both fixed feature aggregation and purely group-partitioned ASG graphs. For small relational systems with ambiguous transition zones and limited data, the results indicate that flexible local aggregation (as in kNN edge-aware GATs) outperforms pre-imposed group clustering (ASG) and hierarchical pooling (Graph U-Net).

Practically, this enables more reliable, scalable, and automated biochemical regime assignment for photosynthetic parameter estimation, directly supporting high-throughput plant functional phenomics and reducing the dependence on subjective manual curation or brittle fitting heuristics.

Implications for Relational Machine Learning

From the AI perspective, SEAGAN demonstrates that domain-infused auxiliary signals, local geometric connectivities, and edge-conditioned attention yield architectures robust to noisy and ambiguous scientific measurements. This framework serves as a model for relational learning problems involving small, structured, and highly contextual scientific datasets, bridging the gap between physical process models and deep GNNs. The results support further research into adaptive graph construction, process-informed edge attributes, and uncertainty quantification for parameter estimation.

Conclusion

SEAGAN gives the strongest node-wise photosynthetic limitation-state classification on A–Ci curves among the methods compared, by combining domain-specific auxiliary features, edge-aware attention, and local kNN connectivity. kNN-based GATs outperformed both feature-based and convolutional graph baselines, as well as classical and contemporary fitting algorithms, which marks a significant advance in automated, scalable plant phenotyping. The architecture and methodology provide a template for future work in graph-based biochemical parameter inference, uncertainty propagation, and broader scientific-relational learning. Future extensions should integrate direct parameter estimation and expand to empirical datasets with more extreme measurement regimes.