
TrajGRU: Dynamic Nowcasting Model

Updated 26 February 2026
  • TrajGRU is a deep learning model that dynamically learns sparse, location-variant trajectories to capture complex motion patterns in spatiotemporal data.
  • The model employs an encoding–forecasting framework with a structure generator that predicts offsets for differentiable bilinear warping of hidden states.
  • When trained with balanced losses and online adaptation, TrajGRU outperforms conventional ConvGRU, achieving a better trade-off between parameter efficiency and nowcasting skill.

The Trajectory Gated Recurrent Unit (TrajGRU) is a deep recurrent neural network architecture designed to address location-variant spatiotemporal process modeling, with its most prominent application to high-resolution precipitation nowcasting using radar echo maps. TrajGRU extends the convolutional Gated Recurrent Unit (ConvGRU) by dynamically learning sparse, location-dependent recurrent connections—"trajectories"—thereby enabling the network to better represent complex motion patterns such as rotation and scaling, which are ubiquitous in meteorological data but not well-captured by location-invariant convolutional recurrence.

1. Problem Formulation and Model Context

Precipitation nowcasting, the short-range prediction of rainfall intensity from recent radar echoes, is formulated as a spatiotemporal sequence prediction problem. A radar volume scan is represented as a sequence $\{\mathcal I_t\}$ of CAPPI echo maps. Given the $J$ most recent frames, $\mathcal I_{t-J+1},\dots,\mathcal I_t$, the task is to predict the subsequent $K$ frames, $\hat{\mathcal I}_{t+1},\dots,\hat{\mathcal I}_{t+K}$. TrajGRU is implemented within an encoding–forecasting framework comprising $n$ stacked recurrent layers:

  • Encoder: Processes the $J$ input frames to produce a hierarchy of hidden states, $\mathcal H_t^1,\dots,\mathcal H_t^n$, at progressively coarser spatial scales via downsampling.
  • Forecaster: Unfolds these hidden states temporally, upsampling back to full resolution to generate the future frames.

This architecture generalizes previous models such as ConvLSTM and ConvGRU, which perform state-to-state updates via location-invariant convolutions, a limitation for capturing natural motion present in atmospheric sequences (Shi et al., 2017).

2. ConvGRU Recurrence and Limitations

In the ConvGRU model, the evolution of the hidden state $\mathcal H_t$ is determined by 2D convolutional operations applied identically at all spatial locations. The gates and update equations are:

$$
\begin{aligned}
\mathcal Z_t &= \sigma\bigl(\mathcal W_{xz}\ast\mathcal X_t + \mathcal W_{hz}\ast\mathcal H_{t-1}\bigr),\\
\mathcal R_t &= \sigma\bigl(\mathcal W_{xr}\ast\mathcal X_t + \mathcal W_{hr}\ast\mathcal H_{t-1}\bigr),\\
\mathcal H'_t &= f\bigl(\mathcal W_{xh}\ast\mathcal X_t + \mathcal R_t\circ(\mathcal W_{hh}\ast\mathcal H_{t-1})\bigr),\\
\mathcal H_t &= (1-\mathcal Z_t)\circ\mathcal H'_t + \mathcal Z_t\circ\mathcal H_{t-1},
\end{aligned}
$$

where $\ast$ denotes 2D convolution, $\circ$ is element-wise multiplication, $\sigma$ is the sigmoid function, and $f$ is typically leaky ReLU. Both input–hidden and hidden–hidden transformations are spatially invariant. This design prevents the model from adapting its connections to local motions and deformations present in phenomena such as precipitation (Shi et al., 2017).
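
To make the gating concrete, the following is a minimal NumPy sketch of one ConvGRU step (an illustration, not the paper's implementation). For brevity it uses $1\times1$ kernels, i.e., per-pixel linear maps, in place of full $K\times K$ convolutions, so only the recurrent gating structure is shown:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def conv1x1(W, t):
    # A 1x1 convolution is a per-pixel linear map over channels.
    # W: (C_out, C_in), t: (C_in, H, W) -> (C_out, H, W)
    return np.einsum('oc,chw->ohw', W, t)

def convgru_step(x, h_prev, p):
    """One ConvGRU update; p holds the six gate weight matrices."""
    z = sigmoid(conv1x1(p['Wxz'], x) + conv1x1(p['Whz'], h_prev))  # update gate
    r = sigmoid(conv1x1(p['Wxr'], x) + conv1x1(p['Whr'], h_prev))  # reset gate
    h_cand = leaky_relu(conv1x1(p['Wxh'], x) + r * conv1x1(p['Whh'], h_prev))
    return (1 - z) * h_cand + z * h_prev

# Toy example: 2 input channels, 3 hidden channels, a 4x4 grid.
rng = np.random.default_rng(0)
c_in, c_h, H, W = 2, 3, 4, 4
p = {k: rng.normal(scale=0.1, size=(c_h, c_in if k.startswith('Wx') else c_h))
     for k in ('Wxz', 'Whz', 'Wxr', 'Whr', 'Wxh', 'Whh')}
x = rng.normal(size=(c_in, H, W))
h = convgru_step(x, np.zeros((c_h, H, W)), p)
print(h.shape)  # (3, 4, 4)
```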

3. TrajGRU Architecture: Location-Variant Recurrence

TrajGRU generalizes ConvGRU by replacing the fixed hidden–hidden convolution $\mathcal W_{hh}\ast\mathcal H_{t-1}$ with a sparse, learned sampling of $\mathcal H_{t-1}$ at dynamically predicted offsets (trajectories) specific to each spatial position and timestep. The structure generator $\gamma$, a lightweight two-layer convolutional network, computes $L$ offset fields at each location:

$$
(\mathcal U_t,\mathcal V_t) = \gamma(\mathcal X_t,\,\mathcal H_{t-1}),
$$

with $\mathcal U_t, \mathcal V_t \in \mathbb{R}^{L \times H \times W}$. For each trajectory $l \in \{1,\dots,L\}$, the previous state is sampled at offset locations via differentiable bilinear warping:

$$
\mathrm{warp}(\mathcal H_{t-1},\,\mathcal U_{t,l},\,\mathcal V_{t,l}) = \mathcal H_{t-1}(x+\mathcal V_{t,l},\,y+\mathcal U_{t,l}).
$$

The updated gates and candidate state are then constructed as follows:

$$
\begin{aligned}
\mathcal Z_t &= \sigma\left(\mathcal W_{xz}\ast\mathcal X_t + \sum_{l=1}^{L}\mathcal W_{hz}^l \ast \mathrm{warp}(\mathcal H_{t-1},\,\mathcal U_{t,l},\,\mathcal V_{t,l})\right),\\
\mathcal R_t &= \sigma\left(\mathcal W_{xr}\ast\mathcal X_t + \sum_{l=1}^{L}\mathcal W_{hr}^l \ast \mathrm{warp}(\mathcal H_{t-1},\,\mathcal U_{t,l},\,\mathcal V_{t,l})\right),\\
\mathcal H'_t &= f\left(\mathcal W_{xh}\ast\mathcal X_t + \mathcal R_t \circ \left(\sum_{l=1}^{L}\mathcal W_{hh}^l \ast \mathrm{warp}(\mathcal H_{t-1},\,\mathcal U_{t,l},\,\mathcal V_{t,l})\right)\right),\\
\mathcal H_t &= (1-\mathcal Z_t)\circ\mathcal H'_t + \mathcal Z_t\circ\mathcal H_{t-1}.
\end{aligned}
$$

Key architectural elements include the use of $1 \times 1$ convolutions for the $\mathcal W^l$ parameters associated with each trajectory, a number of links $L$ typically much smaller than the $K^2$ connections of a full $K \times K$ ConvGRU kernel, and the structure generator $\gamma$ parameterized as a two-layer convolutional network (e.g., 32 channels, $5 \times 5$ kernels). This approach enables TrajGRU to adaptively align its recurrent connectivity with location-variant scene dynamics (Shi et al., 2017).
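
The warping operation itself is straightforward to sketch. Below is an illustrative NumPy implementation of bilinear warping and of the TrajGRU hidden-to-hidden term (the sum over trajectories of $1\times1$-convolved warped states). Function names are ours; a real model would use a framework's differentiable grid-sampling op (e.g., `torch.nn.functional.grid_sample`) so that gradients flow back into $\gamma$:

```python
import numpy as np

def bilinear_warp(h, u, v):
    """Sample feature map h at positions (y+u, x+v) with bilinear interpolation.
    h: (C, H, W); u, v: (H, W) vertical/horizontal offset fields."""
    C, H, W = h.shape
    yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing='ij')
    ys = np.clip(yy + u, 0, H - 1)
    xs = np.clip(xx + v, 0, W - 1)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, H - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, W - 1)
    wy, wx = ys - y0, xs - x0
    # Interpolate along x on the top and bottom rows, then along y.
    top = h[:, y0, x0] * (1 - wx) + h[:, y0, x1] * wx
    bot = h[:, y1, x0] * (1 - wx) + h[:, y1, x1] * wx
    return top * (1 - wy) + bot * wy

def traj_hidden_term(h_prev, offsets, W_h):
    """Sum over L trajectories: sum_l W_h[l] (1x1 conv) applied to the warped state.
    offsets: list of L (u, v) pairs; W_h: (L, C_out, C_in)."""
    return sum(np.einsum('oc,chw->ohw', W_h[l], bilinear_warp(h_prev, u, v))
               for l, (u, v) in enumerate(offsets))

# Zero offsets recover the identity warp:
rng = np.random.default_rng(1)
h = rng.normal(size=(3, 5, 5))
zero = np.zeros((5, 5))
print(np.allclose(bilinear_warp(h, zero, zero), h))  # True
```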

4. Training Loss and Objective Function

Effective training of precipitation nowcasting models must account for the inherent class imbalance in rainfall rates: light rainfall is frequent, while heavy rainfall, though rarer, is more critical operationally. Shi et al. (2017) therefore introduce a pixel-wise, value-dependent weighting scheme:

$$
w(x) = \begin{cases}
1 & x < 2, \\
2 & 2 \le x < 5, \\
5 & 5 \le x < 10, \\
10 & 10 \le x < 30, \\
30 & x \ge 30,
\end{cases}
$$

where $x$ is the rain rate in mm/h; pixels masked due to missing data or identified as outliers are assigned $w=0$. The loss over $N$ frames of size $H \times W$ combines two balanced regression objectives:

$$
\mathrm{B\text{-}MSE} = \frac{1}{N}\sum_{n,i,j} w_{n,i,j}\,(x_{n,i,j}-\hat x_{n,i,j})^2,
\qquad
\mathrm{B\text{-}MAE} = \frac{1}{N}\sum_{n,i,j} w_{n,i,j}\,\lvert x_{n,i,j}-\hat x_{n,i,j}\rvert.
$$

The final offline loss is $\mathrm{Loss} = \mathrm{B\text{-}MSE} + \mathrm{B\text{-}MAE}$. For online adaptation (incremental fine-tuning), the same objective is optimized via, e.g., AdaGrad (Shi et al., 2017).
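
The weighting and balanced losses are simple to implement; here is a NumPy sketch (helper names are ours):

```python
import numpy as np

# Rain-rate bin edges (mm/h) and the weight assigned to each bin.
THRESHOLDS = np.array([2.0, 5.0, 10.0, 30.0])
WEIGHTS = np.array([1.0, 2.0, 5.0, 10.0, 30.0])

def rain_weight(x, mask=None):
    """Piecewise weight w(x); masked (missing/outlier) pixels get w = 0."""
    w = WEIGHTS[np.searchsorted(THRESHOLDS, x, side='right')]
    if mask is not None:
        w = np.where(mask, w, 0.0)
    return w

def balanced_loss(x, x_hat, mask=None):
    """B-MSE + B-MAE, averaged over the N frames (leading axis of x)."""
    w = rain_weight(x, mask)
    n = x.shape[0]
    b_mse = np.sum(w * (x - x_hat) ** 2) / n
    b_mae = np.sum(w * np.abs(x - x_hat)) / n
    return b_mse + b_mae

# One 1x2 frame: the heavy-rain pixel (6 mm/h, weight 5) dominates the loss.
x = np.array([[[1.0, 6.0]]])
x_hat = np.array([[[1.0, 5.0]]])
print(balanced_loss(x, x_hat))  # 10.0
```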

5. HKO-7 Radar Benchmark and Preprocessing

The evaluation of TrajGRU and baselines employs the HKO-7 benchmark, constructed from Hong Kong Observatory CAPPI reflectivity (dBZ) radar mosaics at 2 km altitude, covering a 512 km $\times$ 512 km area (480 $\times$ 480 pixels) at 6-minute intervals. The dataset spans 2009–2015 with the following splits:

| Subset | Years | Days | Frames |
|---|---|---|---|
| Training | 2009–2014 | 812 | ~192,168 |
| Validation | 2009–2014 | 50 | ~11,736 |
| Testing | 2015 | 131 | ~31,350 |

Preprocessing involves masking outliers (e.g., ground clutter, sun spikes) using the Mahalanobis distance on per-pixel reflectivity histograms, then discarding reflectivity values outside $[1, 70]$ dBZ. Rain rates $R$ (mm/h) are related to reflectivity via the Marshall–Palmer $Z$–$R$ relationship $Z = a R^b$ with $a = 58.53$, $b = 1.56$, i.e.,

$$
\mathrm{dBZ} = 10\log_{10}(58.53) + 15.6\,\log_{10}(R).
$$

(Shi et al., 2017)
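
The dBZ-to-rain-rate conversion and its inverse follow directly from the $Z$–$R$ coefficients; an illustrative sketch:

```python
import numpy as np

A, B = 58.53, 1.56  # Marshall-Palmer coefficients: Z = A * R**B

def rain_to_dbz(r):
    """Rain rate (mm/h) -> reflectivity: dBZ = 10*log10(A) + 10*B*log10(R)."""
    return 10.0 * np.log10(A) + 10.0 * B * np.log10(r)

def dbz_to_rain(dbz):
    """Invert the Z-R relation: R = (Z / A)**(1/B), with Z = 10**(dBZ/10)."""
    return (10.0 ** (dbz / 10.0) / A) ** (1.0 / B)

# Round trip at 10 mm/h:
print(round(dbz_to_rain(rain_to_dbz(10.0)), 6))  # 10.0
```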

6. Evaluation Protocol, Metrics, and Benchmarks

Evaluation is conducted under two scenarios:

  • Offline: Each test sequence comprises the latest 5 input frames ($J=5$), and the model predicts the next 20 frames ($K=20$) with no adaptation.
  • Online: The system receives consecutive 5-frame segments; after each, fine-tuning on a buffer of recent frames (e.g., 25) is permitted before the next prediction.

The protocol employs both continuous and categorical metrics:

  • B-MSE, B-MAE: Balanced mean squared/absolute error as described above.
  • Categorical skill scores: At rainfall thresholds $\tau \in \{0.5, 2, 5, 10, 30\}$ mm/h, binarization at each pixel yields counts of TP, FP, FN, and TN. Metrics include:
    • Critical Success Index (CSI): $\mathrm{CSI} = \dfrac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}+\mathrm{FP}}$
    • Heidke Skill Score (HSS): $\mathrm{HSS} = \dfrac{\mathrm{TP}\cdot\mathrm{TN} - \mathrm{FP}\cdot\mathrm{FN}}{(\mathrm{TP}+\mathrm{FN})(\mathrm{FN}+\mathrm{TN}) + (\mathrm{TP}+\mathrm{FP})(\mathrm{FP}+\mathrm{TN})}$
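
Both scores follow directly from the contingency counts; a short NumPy sketch (the function name is ours, and it assumes at least one event and one non-event so the denominators are nonzero):

```python
import numpy as np

def skill_scores(truth, pred, tau):
    """Binarize truth/prediction at rain-rate threshold tau (mm/h),
    then compute CSI and HSS from the contingency-table counts."""
    t, p = truth >= tau, pred >= tau
    tp = np.sum(t & p)    # hits
    fp = np.sum(~t & p)   # false alarms
    fn = np.sum(t & ~p)   # misses
    tn = np.sum(~t & ~p)  # correct rejections
    csi = tp / (tp + fn + fp)
    hss = (tp * tn - fp * fn) / ((tp + fn) * (fn + tn) + (tp + fp) * (fp + tn))
    return csi, hss

# Toy example at the 2 mm/h threshold: 2 hits, 1 false alarm, 1 miss.
truth = np.array([5.0, 0.0, 5.0, 5.0])
pred = np.array([5.0, 5.0, 0.0, 5.0])
csi, hss = skill_scores(truth, pred, 2.0)
print(csi)  # 0.5
```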

The benchmark suite comprises a persistence (last-frame) baseline, optical-flow methods (ROVER with linear and nonlinear extrapolation variants), 2D and 3D CNNs, ConvGRU (with and without balanced losses), and TrajGRU.

Key empirical findings include:

  • Usage of balanced losses is essential to achieve high skill at infrequent but operationally important heavy-rain thresholds (10/30 mm/h); ConvGRU trained with standard MSE/MAE can underperform optical-flow-based baselines in these cases.
  • All deep models trained with balanced losses surpass optical-flow baselines across metrics.
  • TrajGRU attains the best trade-off between parameter efficiency and skill, with statistically significant gains over ConvGRU.
  • Online adaptation (fine-tuning) consistently enhances both CSI and HSS at all thresholds. (Shi et al., 2017)

7. Significance and Implications

TrajGRU provides an explicit mechanism for adaptively warping hidden states, thereby capturing complex, location-variant dynamics in meteorological data—capabilities inaccessible to previous location-invariant recurrent architectures. Its demonstration on the HKO-7 benchmark establishes both a state-of-the-art approach for radar-based precipitation nowcasting and a standardized evaluation pipeline, including dataset, loss formulation, and metrics, that form a foundation for subsequent research in deep learning-based spatiotemporal prediction. The experimental evidence substantiates the necessity of both location-variant model components and balanced training objectives for optimal skill in high-impact, imbalanced forecasting tasks (Shi et al., 2017).

References

Shi, X., Gao, Z., Lausen, L., Wang, H., Yeung, D.-Y., Wong, W.-K., & Woo, W.-C. (2017). Deep Learning for Precipitation Nowcasting: A Benchmark and A New Model. Advances in Neural Information Processing Systems 30 (NIPS 2017).
