Forecasting Unobserved Node States (FUNS)
- FUNS is a forecasting framework for predicting node states in graphs with missing temporal data using observed trajectories and auxiliary domain knowledge.
- It employs spatio-temporal models, data augmentation techniques, and learnable graph topology to enhance prediction accuracy.
- Empirical studies show significant improvements and robust uncertainty quantification across sensor networks, power grids, and biological systems.
Forecasting Unobserved Node States (FUNS) refers to the inductive prediction of future node-level attributes on graphs or networked dynamical systems for which the temporal data of many nodes is withheld during training. The fundamental objective is to leverage observed node trajectories, graph structure, and auxiliary domain knowledge to generalize forecasting models to nodes whose training data is entirely unavailable. FUNS formally subsumes imputation, uncertainty quantification, and functional observer design in domains such as sensor networks, power grids, biological systems, and spatio-temporal analytics.
1. Formalization of the FUNS Problem
Given a graph $G = (\mathcal{V}, \mathcal{E})$ with node set $\mathcal{V}$ and topology encoded by an adjacency matrix $A$, let the temporal feature tensor be $X \in \mathbb{R}^{N \times T \times F}$, where $N = |\mathcal{V}|$, $T$ is the input horizon, and $F$ is the feature dimension. Partition $\mathcal{V}$ into $\mathcal{V}^{\mathrm{obs}}$ (nodes with historical and future ground truth available during training) and $\mathcal{V}^{\mathrm{un}}$ (nodes with no future data during training).
A spatial-temporal graph neural network (STGNN) $f_\theta$ is trained on $\mathcal{V}^{\mathrm{obs}}$ to minimize
$$\mathcal{L}(\theta) = \sum_{v \in \mathcal{V}^{\mathrm{obs}}} \big\| f_\theta(X, A)_v - Y_v \big\|^2,$$
where $Y_v$ denotes the future trajectory of node $v$, with the ultimate goal of inferring
$$\hat{Y}_v = f_\theta(X, A)_v$$
for $v \in \mathcal{V}^{\mathrm{un}}$ at test time, with nodes in $\mathcal{V}^{\mathrm{un}}$ never exposed to their true futures during training (Lei et al., 2024).
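The inductive setup above can be sketched in a few lines. This is a minimal illustration, not the method of Lei et al.: the forecaster is a stand-in shared linear map, and all names and shapes are illustrative assumptions; the point is that the loss touches only observed nodes while the same shared parameters are applied to withheld nodes at test time.

```python
# Minimal sketch of the FUNS objective with standard notation X in R^{N x T x F}.
# The forecaster `f` (a shared linear map) is a purely illustrative stand-in
# for an STGNN; only its *shared, inductive* parameterization matters here.
import numpy as np

rng = np.random.default_rng(0)
N, T, F, H = 10, 8, 3, 2            # nodes, input horizon, features, forecast horizon
X = rng.normal(size=(N, T, F))      # histories for all nodes
Y = rng.normal(size=(N, H, F))      # futures; only observed rows may be used

obs = np.arange(6)                  # V_obs: nodes whose futures are available
unobs = np.arange(6, N)             # V_un: futures withheld during training

W = rng.normal(size=(T * F, H * F)) * 0.01   # shared (node-independent) weights

def f(x_nodes):
    """Forecast H future steps from T past steps with shared weights."""
    flat = x_nodes.reshape(len(x_nodes), -1)
    return (flat @ W).reshape(len(x_nodes), H, F)

# Training loss is computed on observed nodes only ...
loss = np.mean((f(X[obs]) - Y[obs]) ** 2)

# ... while at test time the same shared model is applied inductively
# to nodes whose true futures were never seen.
Y_hat_unobs = f(X[unobs])
```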
2. ST-FiT: Temporal Augmentation and Learnable Spatial Topology
The ST-FiT framework is a principled, model-agnostic wrapper for spatial-temporal forecasting that enhances inductive generalization to unobserved nodes through two core mechanisms (Lei et al., 2024):
Temporal Data Augmentation:
- MixUp: Synthetic time series $\tilde{x} = \lambda x_i + (1-\lambda)\,x_j$, with $\lambda \sim \mathrm{Beta}(\alpha, \alpha)$.
- Fourier Shift: Swap random frequency bins between the DFTs of $x_i$ and $x_j$; invert to create $\tilde{x}$.
- Auxiliary losses enforce consistency between model outputs on augmented inputs.
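The two augmentations can be sketched as follows; parameter names (`alpha`, `n_bins`) and the choice of the real FFT are assumptions for illustration, not specifics of ST-FiT.

```python
# Sketch of the two temporal augmentations: MixUp in the time domain
# and a frequency-bin swap ("Fourier shift") in the DFT domain.
import numpy as np

rng = np.random.default_rng(0)

def mixup(x_i, x_j, alpha=0.5):
    """MixUp: convex combination of two node time series."""
    lam = rng.beta(alpha, alpha)
    return lam * x_i + (1.0 - lam) * x_j

def fourier_shift(x_i, x_j, n_bins=2):
    """Swap a few random frequency bins between the DFTs of two series,
    then invert back to the time domain."""
    Xi, Xj = np.fft.rfft(x_i), np.fft.rfft(x_j)
    bins = rng.choice(len(Xi), size=n_bins, replace=False)
    Xi_new = Xi.copy()
    Xi_new[bins] = Xj[bins]            # take x_j's content in the chosen bins
    return np.fft.irfft(Xi_new, n=len(x_i))

x_i, x_j = rng.normal(size=24), rng.normal(size=24)
x_mix = mixup(x_i, x_j)
x_fs = fourier_shift(x_i, x_j)
```

A consistency loss would then penalize the distance between model outputs on `x_i` and on its augmented versions.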
Spatial Graph Topology Learning:
- Hidden node embeddings $h_v$ are computed.
- Pairwise similarity scores $s_{uv} = \mathrm{sim}(h_u, h_v)$ are generated and sparsified via a Gumbel-Softmax scheme to define a sparse, generalizable adjacency $\tilde{A}$.
- $\tilde{A}$ replaces the physical topology for spatial convolutions, enabling transferability.
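A minimal sketch of the topology-learning step, assuming an inner-product similarity and a binary Gumbel-softmax (Gumbel-sigmoid) relaxation; the temperature and threshold are illustrative choices, not values from the paper.

```python
# Learn a sparse adjacency from node embeddings via relaxed Bernoulli sampling.
import numpy as np

rng = np.random.default_rng(0)

def gumbel_sigmoid(logits, tau=0.5):
    """Binary Gumbel-softmax: differentiable relaxed edge selection."""
    u1 = rng.uniform(1e-12, 1.0, size=logits.shape)
    u2 = rng.uniform(1e-12, 1.0, size=logits.shape)
    g1, g2 = -np.log(-np.log(u1)), -np.log(-np.log(u2))
    return 1.0 / (1.0 + np.exp(-(logits + g1 - g2) / tau))

N, d = 6, 4
H = rng.normal(size=(N, d))               # hidden node embeddings h_v
logits = H @ H.T                          # pairwise similarity scores
np.fill_diagonal(logits, -np.inf)         # exclude self-loops

A_soft = gumbel_sigmoid(logits, tau=0.5)  # relaxed edge probabilities
A_learned = (A_soft > 0.5).astype(float)  # sparse learned adjacency
```

During training the relaxed `A_soft` keeps the sampling differentiable; the hard threshold is applied only at inference.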
Empirically, ST-FiT outperforms baselines (STGCN, STGODE) and matches or surpasses fine-tuned TransGTR on traffic networks, especially for nodes with no training histories. Ablation studies confirm the necessity of both augmentation and topology learning: removing either degrades MAE by 5–10%, disabling topology co-learning increases MAE by roughly 1.5, and replacing the learned adjacency with the identity graph worsens results (Lei et al., 2024).
3. Spatio-Temporal GNN Approaches and Data-Driven Induction
Generic spatio-temporal GNN architectures are frequently adapted for FUNS via random masking of observed nodes during training, shared parameterization, and injection of static node priors (Roth et al., 2022). The FUNS-Network prototype leverages attention-based GNNs with Graph-GRU temporal layers, static feature embeddings, and node masking such that the model must impute and forecast at unobserved locations exclusively from spatial relations and prior knowledge.
The training protocol involves time-windowed mini-batches, dynamic node masking (input vs. held-out optimization set), and loss computation only on the unobserved subset. On large-scale road networks (LuST), FUNS-N yields MSE improvements of up to 61% versus spatial+LSTM baselines, even under sparse sensor coverage (Roth et al., 2022).
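The dynamic-masking step of that protocol can be sketched as below. The `model` is a trivial spatial-averaging stand-in for the attention GNN + Graph-GRU, and the split sizes are arbitrary; only the masking-and-loss pattern is the point.

```python
# Sketch of dynamic node masking: hide some observed nodes' inputs each
# mini-batch and compute the loss only on that held-out subset, forcing
# the model to rely on spatial relations and static priors.
import numpy as np

rng = np.random.default_rng(1)
N, T, F = 12, 8, 2
X = rng.normal(size=(N, T, F))

observed = np.arange(8)                          # nodes with sensors
held_out = rng.choice(observed, size=3, replace=False)
input_nodes = np.setdiff1d(observed, held_out)   # nodes fed to the model

X_masked = X.copy()
X_masked[held_out] = 0.0                         # zero-mask held-out histories

def model(x_in):
    """Stand-in for an attention GNN + Graph-GRU: spatial average broadcast."""
    return x_in.mean(axis=0, keepdims=True).repeat(len(x_in), axis=0)

pred = model(X_masked)
# Loss only on the held-out (masked) subset, as in the protocol.
loss = np.mean((pred[held_out] - X[held_out]) ** 2)
```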
4. Physics-Informed and Bayesian Estimation for Unobserved Dynamics
When forecasting in physical or biological networks, physics-informed Gaussian Process Regression (PhI-GPR) exploits the governing stochastic differential equations (SDEs) to derive joint priors for all state variables, observed or unobserved (Tipireddy et al., 2018, Ma et al., 2020):
- Power-grid swing equations with Ornstein-Uhlenbeck wind power fluctuations are solved by Monte Carlo to generate prior means and covariances.
- The GP posterior for unobserved nodes follows standard Gaussian conditioning:
$$\mu_{u|o} = \mu_u + \Sigma_{uo}\Sigma_{oo}^{-1}(y_o - \mu_o), \qquad \Sigma_{u|o} = \Sigma_{uu} - \Sigma_{uo}\Sigma_{oo}^{-1}\Sigma_{ou}.$$
- Predictive uncertainty is quantified via the posterior covariance. Accuracy is controlled by input correlation and relaxation times.
PhI-GPR substantially extends forecast horizons and resilience to measurement sparsity and noise compared to data-driven GPR and ARIMA (Tipireddy et al., 2018, Ma et al., 2020). Its methodology generalizes to any coupled ODE/SDE network with partially measured variables.
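The conditioning step at the heart of PhI-GPR is ordinary Gaussian conditioning; a minimal numeric sketch follows, with a synthetic joint prior standing in for the Monte-Carlo mean/covariance that would come from solving the governing SDEs.

```python
# Standard Gaussian conditioning of unobserved (u) node states on
# measurements at observed (o) nodes, given a joint physics-informed prior.
import numpy as np

rng = np.random.default_rng(0)

n_o, n_u = 3, 2
mu = rng.normal(size=n_o + n_u)
L = rng.normal(size=(n_o + n_u, n_o + n_u))
Sigma = L @ L.T + 1e-3 * np.eye(n_o + n_u)   # SPD joint prior covariance

mu_o, mu_u = mu[:n_o], mu[n_o:]
S_oo = Sigma[:n_o, :n_o]
S_uo = Sigma[n_o:, :n_o]
S_uu = Sigma[n_o:, n_o:]

y_o = rng.normal(size=n_o)                   # measurements at observed nodes

K = S_uo @ np.linalg.inv(S_oo)               # gain mapping residuals to u
mu_post = mu_u + K @ (y_o - mu_o)            # posterior mean for unobserved nodes
Sigma_post = S_uu - K @ S_uo.T               # posterior covariance (uncertainty)
```

The diagonal of `Sigma_post` is the per-node predictive uncertainty referenced above.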
5. Data-Driven Observers and Functional State Estimation
The functional observer framework enables direct estimation of targeted state vectors without model identification, provided a rank-based functional observability criterion holds (Zhang et al., 7 Dec 2025):
- Offline input-output trajectories are assembled in block Hankel form; the resulting design matrix and output block admit a least-squares estimator of the observer parameters.
- Online, the functional observer updates recursively in the canonical form
$$\hat{z}_{k+1} = N\hat{z}_k + M y_k + H u_k, \qquad \hat{w}_k = D\hat{z}_k + E y_k,$$
where $\hat{w}_k$ is the estimate of the targeted state functional.
- Extensions employ Koopman embeddings for nonlinear systems, with data-driven Hankel construction in the lifted space.
Noise mitigation is addressed via correction of empirical covariances and SVD truncation. The approach achieves low relative root-mean-square error (RRMSE) in practical settings (water networks, power grids, neuronal systems), matching model-based observers without explicit parameter identification (Zhang et al., 7 Dec 2025).
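The online recursion can be sketched as below. The matrices `N, M, H, D, E` are hypothetical placeholders: in the data-driven setting they would come from the least-squares fit on the Hankel data rather than from an identified model.

```python
# Sketch of the canonical functional-observer recursion:
#   z_{k+1} = N z_k + M y_k + H u_k,   w_hat_k = D z_k + E y_k.
# All matrices here are random placeholders for the data-driven estimates.
import numpy as np

rng = np.random.default_rng(0)
nz, ny, nu, nw = 3, 2, 1, 2              # observer state, output, input, target dims

Nmat = 0.5 * np.eye(nz)                  # stable internal dynamics (assumption)
Mmat = rng.normal(size=(nz, ny))
Hmat = rng.normal(size=(nz, nu))
Dmat = rng.normal(size=(nw, nz))
Emat = rng.normal(size=(nw, ny))

z = np.zeros(nz)                         # observer state
for k in range(10):                      # online loop over streaming measurements
    y_k = rng.normal(size=ny)            # measured outputs
    u_k = rng.normal(size=nu)            # known inputs
    w_hat = Dmat @ z + Emat @ y_k        # estimate of the target functional
    z = Nmat @ z + Mmat @ y_k + Hmat @ u_k
```

No model identification happens online; only matrix-vector products per step.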
6. Deep Latent and Autoencoder Models for Feature Imputation
Node attribute generation frameworks (NANG) and graph feature autoencoders learn shared latent representations that jointly encode structure and (incomplete) attributes (Chen et al., 2019, Hasibi et al., 2020):
- NANG encodes attributes ($X$) and structure ($A$) into a shared latent $z$, employs cross-modal adversarial losses, and decodes to generate unobserved attributes.
- Autoencoder designs stack graph convolutions on $X$ and $A$, decode predicted features for unobserved nodes, and minimize only the reconstruction loss on observed entries.
- Temporal extensions involve spatio-temporal encoders (e.g., GCN+LSTM), dynamical priors, and adversarial alignment of time-evolved latent codes (Chen et al., 2019).
- Graph feature autoencoders outperform MLP and diffusion-based imputers (MAGIC) with lower MSE in omics and neuroscience data, especially under random and fixed-node holdout regimes (Hasibi et al., 2020).
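The masked-reconstruction pattern shared by these designs can be sketched as follows; the two-layer graph convolution, random weights, and masking rate are illustrative assumptions, not the published architectures.

```python
# Graph feature autoencoder sketch: propagate partially observed attributes
# over the graph and train against the reconstruction loss on observed
# entries only.
import numpy as np

rng = np.random.default_rng(0)
N, F = 8, 4
A = (rng.uniform(size=(N, N)) < 0.3).astype(float)
A = np.maximum(A, A.T)                    # symmetrize
np.fill_diagonal(A, 1.0)                  # add self-loops
A_norm = np.diag(1.0 / A.sum(1)) @ A      # row-normalized propagation

X = rng.normal(size=(N, F))               # true node attributes
mask = rng.uniform(size=(N, F)) < 0.7     # True where entries are observed
X_in = np.where(mask, X, 0.0)             # hide unobserved attributes

W = rng.normal(size=(F, F)) * 0.1
X_hat = A_norm @ np.tanh(A_norm @ X_in @ W)   # two-layer graph-conv decoder

# Reconstruction loss restricted to observed entries, as in the designs above.
loss = np.sum(((X_hat - X) * mask) ** 2) / mask.sum()
```

At inference, `X_hat` is read off at the unobserved entries (`~mask`), which never contribute to the training loss.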
7. Advanced Architectures and Uncertainty Quantification
Recent models such as DynaSTy (Banerji et al., 8 Jan 2026) and particle-flow RNNs (Pal et al., 2021) provide further extensions:
DynaSTy: Dynamic graphs with adaptive, edge-biased transformer encoders, masked node-time pretraining, scheduled sampling, and horizon-weighted losses. Consistent performance gains in MAE/RMSE are observed across financial, traffic, and neuroimaging datasets; ablation indicates strong contributions from edge biasing and pretraining.
Particle-Flow RNNs: Bayesian inference for nonlinear spatio-temporal state-space models is achieved by local particle-flow ODEs, mitigating weight degeneracy in high-dimensional latent spaces. Graph-convolutional GRU parameterizes the state transitions, enabling multi-step probabilistic forecasts on arbitrary networks with missing data. Proper scoring rules (CRPS, quantile loss) evaluate predictive distributions and interval coverage (Pal et al., 2021).
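For ensemble-style probabilistic forecasts like those above, CRPS has a simple empirical estimator, $\mathrm{CRPS} = \mathbb{E}|X - y| - \tfrac{1}{2}\mathbb{E}|X - X'|$; a minimal sketch (not tied to any particular model's output format):

```python
# Empirical CRPS for a scalar observation and a forecast ensemble,
# using CRPS = E|X - y| - 0.5 * E|X - X'|.
import numpy as np

def crps_ensemble(samples, y):
    """Lower is better; rewards both accuracy and calibrated spread."""
    samples = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(samples - y))
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term1 - term2

ens = np.random.default_rng(0).normal(size=200)   # forecast ensemble ~ N(0, 1)
score_good = crps_ensemble(ens, 0.0)              # observation near the ensemble
score_bad = crps_ensemble(ens, 5.0)               # badly biased forecast
```

A biased forecast (`score_bad`) scores strictly worse than a centered one (`score_good`), which is what makes CRPS a proper scoring rule for interval coverage.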
In summary, FUNS formalizes the general problem of forecasting node states in graphs where the majority of nodes possess no temporal training data. Solutions integrate graph structure, temporal augmentation, physics or domain priors, and advanced neural or Bayesian architectures, with demonstrable empirical performance and theoretical robustness across synthetic, sensor, biological, and industrial datasets (Lei et al., 2024, Roth et al., 2022, Zhang et al., 7 Dec 2025, Tipireddy et al., 2018, Ma et al., 2020, Chen et al., 2019, Hasibi et al., 2020, Banerji et al., 8 Jan 2026, Pal et al., 2021, Bravi et al., 2016).