SurGe: Diverse Surrogate Frameworks

Updated 4 July 2026

SurGe is a reused research label applied to diverse surrogate systems across coastal hazard prediction, 3D reconstruction, robotics co-design, diffusion-based inference, and sharing economy applications.
In coastal modelling, SurGe frameworks employ two-stage architectures with classifiers and regressors to emulate ADCIRC, achieving high accuracy (e.g., RMSE ~0.307–0.59 m) and significant computational speed-ups.
Other SurGe variants use specialized techniques such as point-map normal metrics, surrogate gradient guidance, and particle filtering, each tailored to its own domain.

SurGe is a reused research label rather than a single canonical framework. In arXiv literature it most often denotes storm-surge surrogate modelling, where machine-learning systems approximate high-fidelity coastal hydrodynamic simulation for peak or time-dependent inundation prediction. The same label, with capitalization variants such as SurGe, SurGE, and SURGE, also denotes a monocular 3D reconstruction model for point maps, a hybrid evolutionary method for legged-robot co-design, a path-space particle-filtering method for diffusion surrogates, and a sharing-economy resource-allocation scheme (Pachev et al., 2022, Pachev et al., 26 Mar 2026, Knaebel et al., 29 May 2026, Zhuang et al., 20 Jun 2026, Wei et al., 18 May 2026, Mohamadzadehoqaz et al., 31 Mar 2025). This multiplicity makes disambiguation essential: in coastal modelling, SurGe usually refers to surrogate storm-surge prediction; in other fields it is an unrelated acronym.

1. Terminological scope and naming variants

The label has been used for several technically unrelated systems.

Domain	Expansion or use	Representative source
Storm-surge modelling	Flexible surrogate framework for peak storm surge prediction	(Pachev et al., 2022)
Global coastal ML	Global, location-invariant peak storm surge prediction	(Pachev et al., 26 Mar 2026)
Monocular 3D reconstruction	“Improved Surface Geometry in Point Maps”	(Knaebel et al., 29 May 2026)
Legged-robot co-design	“Surrogate Gradient-guided Evolution”	(Zhuang et al., 20 Jun 2026)
Diffusion-based data assimilation	“Sequential Unbiased Resampling via Girsanov Estimation”	(Wei et al., 18 May 2026)
Sharing economy	“surge sourcing via hybrid supply”	(Mohamadzadehoqaz et al., 31 Mar 2025)

A common misconception is that SurGe denotes one cross-domain methodology. The literature instead shows a naming collision. The strongest concentration of usage is in coastal hazard modelling, where multiple papers develop surrogate models intended to replace or augment expensive storm-surge solvers such as ADCIRC. Elsewhere the acronym is field-specific and should be interpreted only within its local disciplinary context.

2. Peak storm-surge surrogate modelling

In coastal ML, SurGe was introduced as a flexible surrogate framework for peak storm surge prediction. The 2022 framework models maximum storm surge rather than the full time series, defining peak surge as

$\eta^{\max}_s(\mathbf{x})=\max_t \eta_s(\mathbf{x}, t),$

and then learning a pointwise surrogate formulation in which each training example is a storm-location pair $(s,\mathbf{x}_i)$ mapped to a single scalar, the peak surge $\eta^{\max}_s(\mathbf{x}_i)$ (Pachev et al., 2022). This reformulation reduces the output dimension to 1, reduces the number of required parameters, increases the number of training examples, and allows predictions at locations not seen in training.
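The reduction from time series to pointwise examples can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the function names, the dictionary layout, and the toy elevation values are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

def peak_surge(series: Dict[int, List[float]]) -> Dict[int, float]:
    """Reduce each node's elevation time series to its maximum over time."""
    return {node: max(values) for node, values in series.items()}

def pointwise_examples(
    storms: Dict[str, Dict[int, List[float]]]
) -> List[Tuple[Tuple[str, int], float]]:
    """Flatten {storm: {node: series}} into ((storm, node), peak) examples."""
    examples = []
    for storm_id, series in storms.items():
        for node, peak in peak_surge(series).items():
            examples.append(((storm_id, node), peak))
    return examples

# Toy data (hypothetical): two storms, two mesh nodes, four time steps each.
storms = {
    "storm_a": {0: [0.0, 0.4, 1.2, 0.7], 1: [0.0, 0.1, 0.3, 0.2]},
    "storm_b": {0: [0.2, 0.9, 0.5, 0.1], 1: [0.0, 0.0, 0.6, 0.4]},
}
examples = pointwise_examples(storms)
print(len(examples))  # one scalar-target example per storm-location pair
```

Each storm contributes one example per location, which is why the pointwise formulation multiplies the number of training examples while keeping the target one-dimensional.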

The framework is explicitly two-stage. Stage 1 performs inundation classification as a wet/dry binary task. Stage 2 predicts inundation level only for points classified as wet. The implementation evaluates both feed-forward neural networks and XGBoost. For the neural models, the paper defines three ReLU architectures: nn1 with 3 hidden layers $(256, 512, 256)$, nn2 with 5 hidden layers $(256, 512, 1024, 512, 256)$, and nn3 with 7 hidden layers $(256, 512, 1024, 2048, 1024, 512, 256)$. Classification uses sigmoid output and binary cross entropy; regression uses ReLU output and mean squared error.
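The two-stage inference logic can be sketched with stand-in models. In this sketch the names `two_stage_predict`, `wet_probability`, and `surge_level`, the 0.5 decision threshold, and the choice to report 0.0 at dry points are assumptions for illustration; the toy callables replace the trained networks or XGBoost models.

```python
import math
from typing import Callable, List, Sequence

Features = Sequence[float]

def two_stage_predict(
    points: List[Features],
    wet_probability: Callable[[Features], float],
    surge_level: Callable[[Features], float],
    threshold: float = 0.5,  # assumed decision threshold
) -> List[float]:
    """Stage 1 gates each point as wet/dry; stage 2 regresses wet points only."""
    predictions = []
    for x in points:
        if wet_probability(x) >= threshold:
            # ReLU-style output keeps the predicted level non-negative.
            predictions.append(max(0.0, surge_level(x)))
        else:
            predictions.append(0.0)  # assumption: dry points report zero
    return predictions

# Stand-in models (hypothetical): a sigmoid classifier and a linear regressor.
classifier = lambda x: 1.0 / (1.0 + math.exp(-x[0]))
regressor = lambda x: 2.0 * x[1]

points = [(-2.0, 1.0), (1.5, 0.8), (0.5, -3.0)]
predictions = two_stage_predict(points, classifier, regressor)
print(predictions)
```

The regressor is never consulted for the first point because the classifier marks it dry, which mirrors the gating described above.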

Feature construction is spatially informed rather than purely track-based. For Texas, the feature set includes landfall time and location, forcing statistics from 6 hours before to 6 hours after landfall, wind $x$, wind $y$, wind magnitude, pressure statistics, spatial neighborhood statistics, bathymetry statistics, distance to landfall, distance to coastline, and raw bathymetry, for a total of 135 features. For Alaska, where events are more heterogeneous and not limited to tropical cyclones, the pipeline uses wind, pressure, sea ice concentration, spatial neighborhood statistics, and tidal harmonic amplitudes for $M2, S2, N2, K2, O1, K1, P1, Q1$, yielding 172 features.

The reported Texas dataset contains 446 synthetic hurricane events on an ADCIRC mesh with 3,352,598 nodes and 6,675,517 triangular elements. The Alaska dataset contains 109 historical surge events. In Texas, the best classifier-regressor pair is nn3 classifier + nn3 regressor, with $R^2 = 0.873$ and RMSE = 0.307 m; the worst combination, XGBoost + XGBoost, yields $(s,\mathbf{x}_i)$ 0 and RMSE = 0.355 m. On real storms, the surrogate obtains $(s,\mathbf{x}_i)$ 1 and RMSE = 0.57 m for Hurricane Ike (2008), and $(s,\mathbf{x}_i)$ 2 and RMSE = 0.47 m for Hurricane Harvey (2017). For Typhoon Merbok (2022) in Alaska, the model reports $(s,\mathbf{x}_i)$ 3 and RMSE = 0.59 m. A central practical claim is computational efficiency: in the Texas validation, a single ADCIRC run for a real hurricane used 740 CPUs and about two hours, whereas the surrogate evaluates in seconds on a single CPU (Pachev et al., 2022).

A later global formulation extends peak-surge modelling beyond basin-specific systems. That model is trained on over 15,000 landfalling synthetic storms distributed across the North Atlantic, East Pacific, North Indian, South Indian, West Pacific, and South Pacific and uses ADCIRC on a global unstructured mesh with 12.8 million nodes and 24.9 million triangular elements (Pachev et al., 26 Mar 2026). Inputs and outputs are interpolated onto a 2.5° × 2.5° regular grid of 128 × 128 cells centered on landfall. Meteorological forcing is sampled every 3 hours from 24 hours before landfall to 12 hours after landfall, producing 13 time steps. With three forcing channels per step plus bathymetry and land mask, the input tensor is

$\mathbf{X} \in \mathbb{R}^{C \times H \times W},$

with $C = 13 \times 3 + 2 = 41$ and $H = W = 128$, and the network learns

$f_\theta : \mathbf{X} \mapsto \hat{\eta}^{\max} \in \mathbb{R}^{H \times W}.$
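The tensor dimensions implied by the sampling schedule above can be checked with simple arithmetic. The helper name `forcing_offsets` and the channels-first layout are assumptions for this sketch; the step counts, channel counts, and grid size come from the description.

```python
def forcing_offsets(start_h: int = -24, end_h: int = 12, step_h: int = 3) -> list:
    """Hours relative to landfall at which forcing is sampled (inclusive)."""
    return list(range(start_h, end_h + 1, step_h))

offsets = forcing_offsets()
n_steps = len(offsets)            # samples from -24 h to +12 h every 3 h
forcing_channels = n_steps * 3    # three forcing channels per time step
static_channels = 2               # bathymetry and land mask
channels = forcing_channels + static_channels
input_shape = (channels, 128, 128)  # assumed channels-first layout

print(n_steps, channels, input_shape)
```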

The main architecture is a 5-stage UNet (“UNet-5”). Reported RMSE values are 0.51 near land and 0.33 over all points, outperforming CNN, UNet-4, and SegNet baselines. The global model also outperforms local basin-specific models in every basin in the reported table, with improvements of about 10% in North Atlantic and 28% in North Indian. On 67 historical North Atlantic hurricanes from 2003–2023, with 188 gauge points for 30 storms, the observational comparison gives RMSE 0.22 m for ADCIRC and 0.37 m for UNet-5; when weighting each storm equally, the values are 0.48 m and 0.52 m, respectively (Pachev et al., 26 Mar 2026).
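The gap between the pooled and the storm-weighted observational RMSE reflects how gauge errors are aggregated. The sketch below contrasts the two aggregations on made-up errors; it assumes that "weighting each storm equally" means averaging per-storm RMSE values, which may differ from the exact procedure in the paper.

```python
import math
from typing import Dict, List

def rmse(errors: List[float]) -> float:
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def pooled_rmse(errors_by_storm: Dict[str, List[float]]) -> float:
    """All gauge errors in one pool: storms with many gauges dominate."""
    pooled = [e for errs in errors_by_storm.values() for e in errs]
    return rmse(pooled)

def storm_weighted_rmse(errors_by_storm: Dict[str, List[float]]) -> float:
    """Mean of per-storm RMSE: every storm counts once (assumed aggregation)."""
    per_storm = [rmse(errs) for errs in errors_by_storm.values()]
    return sum(per_storm) / len(per_storm)

# Hypothetical errors in metres: a well-gauged storm with small errors
# and a sparsely gauged storm with large errors.
errors_by_storm = {
    "many_gauges": [0.1] * 8,
    "few_gauges": [0.9, -0.9],
}
pooled = pooled_rmse(errors_by_storm)
weighted = storm_weighted_rmse(errors_by_storm)
print(round(pooled, 3), round(weighted, 3))
```

When poorly predicted storms have few gauges, the pooled figure understates their influence, so the storm-weighted value is larger, the same direction as in the reported comparison.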

3. Spatio-temporal surge forecasting and sequence modelling

Peak-surge SurGe frameworks deliberately discard temporal evolution. A separate but closely related line of work addresses this limitation by learning the full surge time series. One hierarchical approach combines a Convolutional Autoencoder (CAE) with Hierarchical Deep Neural Networks (HDNNs) to compress high-dimensional surge fields and forecast them across multiple time scales (Naeini et al., 2024). The CAE reshapes the surge field into a 24 × 24 × 1 matrix representing 578 save points and compresses it to a 4-dimensional latent vector using four Conv2D layers with filters 16, 32, 64, 128, max-pooling, flattening, and a dense layer of 4 neurons. Reconstruction performance is reported as Training RMSE: 0.017, Validation RMSE: 0.022, and Testing RMSE: 0.026.

The HDNN component trains 7 neural networks with latent-space step sizes

$(s,\mathbf{x}_i)$ 8

each with an input layer of 8 nodes, three hidden layers of 512 neurons, and an output layer of 4 nodes. The network with the largest time step predicts coarse future states first, and finer networks then fill intermediate states. The final system achieves RMSE = 0.050 and MAE = 0.023 on the testing set. In selected reconstructions, absolute differences are generally below 0.10 m, with only a few isolated peak locations reaching about 9% error. This suggests a shift from peak-only emulation toward long-horizon, high-dimensional spatio-temporal forecasting (Naeini et al., 2024).
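The coarse-to-fine scheduling idea can be sketched independently of the trained networks. In the sketch below the step sizes (4, 2, 1), the function names, and the toy steppers are placeholders chosen for illustration, not the values or models used in the paper; each stepper stands in for one network that advances a 4-dimensional latent state by its own time step.

```python
from typing import Callable, Dict, List, Tuple

Latent = Tuple[float, ...]

def hierarchical_rollout(
    z0: Latent,
    horizon: int,
    steppers: Dict[int, Callable[[Latent], Latent]],
) -> List[Latent]:
    """Fill times 0..horizon, coarsest step first, finer steps in between."""
    states = {0: z0}
    for step in sorted(steppers, reverse=True):
        frontier = sorted(states)
        while frontier:
            t = frontier.pop(0)
            nxt = t + step
            if nxt <= horizon and nxt not in states:
                states[nxt] = steppers[step](states[t])
                frontier.append(nxt)
    # Requires a step of size 1 so that every time index is filled.
    return [states[t] for t in range(horizon + 1)]

def make_stepper(step: int) -> Callable[[Latent], Latent]:
    # Toy dynamics (hypothetical): each latent component grows by 1 per time unit.
    return lambda z: tuple(v + step for v in z)

steppers = {step: make_stepper(step) for step in (4, 2, 1)}
trajectory = hierarchical_rollout((0.0, 0.0, 0.0, 0.0), horizon=8, steppers=steppers)
print([z[0] for z in trajectory])
```

Because distant states are reached with few coarse steps rather than many fine ones, fewer predictions are chained together, which is the usual motivation for this kind of hierarchy.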

A second sequence model, also named SurGe, recasts storm-surge forecasting as a video prediction problem on structured image grids (Zhao et al., 26 Jun 2025). The source data are 446 tropical cyclone simulations from the Texas FEMA synthetic storm-surge dataset, generated with ADCIRC on a mesh with 3,352,598 nodes and 6,675,517 triangular elements. Each simulation spans 5 days, with 60 output time steps at 2-hour intervals. Unstructured water elevation fields are rasterized onto $(s,\mathbf{x}_i)$ 9 grids and encoded as 3-channel RGB images. The value range for water elevation is

$\eta(\mathbf{x}_i)$ 0

chosen using the 80th percentile of the dataset’s surge distribution; wind and bathymetry channels are similarly normalized.

The forecasting model uses 3 stacked ConvLSTM layers with hidden channels $\eta(\mathbf{x}_i)$ 1 and a $\eta(\mathbf{x}_i)$ 2 convolutional decoder. Each input frame has 6 channels: 3 for RGB-encoded water elevation, 2 for wind velocity components, and 1 for static bathymetry. The autoregressive input is

$\eta(\mathbf{x}_i)$ 3

Clips are centered on the peak-surge frame, with 6 past frames used as context and 24 future frames predicted, corresponding to a 48-hour forecast horizon. The storm-level split yields 8,433 train/validation clips and 1,034 test clips.

Reported median $R^2$ values are roughly 0.945–0.989 for Galveston Bay, 0.987–0.997 for Corpus Christi, 0.979–0.995 for South Padre Island, and 0.969–0.994 for the combined model. Representative median RMSE values are about 0.035 for the hardest Galveston cases, 0.024 for Corpus Christi, 0.032 for South Padre, and 0.032 for the combined model. Out-of-distribution tests on Matagorda Bay and Baffin Bay retain median $R^2$ above 0.8 and 0.9, respectively. A reported limitation is extreme-value extrapolation: on Hurricane Ike (2008) in Galveston Bay, the model underpredicts strongly because observed surge exceeded the training range, with many values above 3 m and up to 5 m, while the normalization capped the training range at 2.5 m (Zhao et al., 26 Jun 2025).
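The extrapolation limitation follows directly from a capped normalization, as the sketch below shows. The 2.5 m cap is taken from the text; the lower bound of 0.0 m, the function names, and the simple linear scaling (in place of the actual RGB encoding) are assumptions for illustration.

```python
def encode(eta_m: float, lo: float = 0.0, hi: float = 2.5) -> float:
    """Clip elevation to [lo, hi] metres and scale to [0, 1]."""
    clipped = min(max(eta_m, lo), hi)
    return (clipped - lo) / (hi - lo)

def decode(u: float, lo: float = 0.0, hi: float = 2.5) -> float:
    """Map a normalized value back to metres."""
    return lo + u * (hi - lo)

# Values above the cap cannot be represented, so they decode to the cap.
round_trip = {eta: decode(encode(eta)) for eta in (1.0, 2.5, 3.0, 5.0)}
for eta, recovered in round_trip.items():
    print(eta, "->", recovered)
```

A model trained on targets encoded this way never sees a value above the cap, so surge of 3 m to 5 m is underpredicted regardless of how well the network fits the training data.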

4. SurGe in monocular 3D reconstruction

Outside coastal modelling, SurGe denotes a feedforward monocular 3D reconstruction model centered on point maps, in which each pixel is assigned a 3D point in camera coordinates (Knaebel et al., 29 May 2026). The stated motivation is that recent point-map models predict global 3D geometry well but still exhibit inaccurate local surface geometry, including ripples, blockiness, bent thin structures, and oscillatory surfaces, and that these artifacts are only weakly reflected in common pointwise metrics.

To expose such errors, the model introduces a point map normal metric based on local surface orientation induced by neighboring 3D predictions: $\eta(\mathbf{x}_i)$ 6 Normals are estimated from neighboring point differences and local cross products. The model also introduces a point gradient matching loss $\eta(\mathbf{x}_i)$ 7, which supervises depth-normalized 3D finite differences rather than scalar depth gradients alone. This loss is combined with global and local point-map losses as

$\eta(\mathbf{x}_i)$ 8
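Normal estimation from a point map and the resulting angular error can be sketched as follows. The forward-difference stencil, the function names, and the toy planar point maps are assumptions for illustration; the paper's exact neighborhood and weighting may differ.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]

def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v: Vec) -> Vec:
    n = math.sqrt(sum(c * c for c in v))
    return (v[0] / n, v[1] / n, v[2] / n)

def normal_at(pmap: List[List[Vec]], i: int, j: int) -> Vec:
    """Normal from the cross product of right and down neighbour differences."""
    dx = sub(pmap[i][j + 1], pmap[i][j])
    dy = sub(pmap[i + 1][j], pmap[i][j])
    return normalize(cross(dx, dy))

def angular_error_deg(n1: Vec, n2: Vec) -> float:
    dot = sum(a * b for a, b in zip(n1, n2))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

# Toy 2x2 point maps (hypothetical): a fronto-parallel plane and a tilted plane.
flat = [[(float(j), float(i), 2.0) for j in range(2)] for i in range(2)]
tilted = [[(float(j), float(i), 2.0 + j) for j in range(2)] for i in range(2)]

error = angular_error_deg(normal_at(flat, 0, 0), normal_at(tilted, 0, 0))
print(round(error, 3))
```

Because the normal depends on differences between neighbouring points, ripples or blockiness change it sharply even when each individual point is close to the ground truth, which is why such a metric exposes errors that pointwise metrics miss.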

Architecturally, the system retains a DINOv2 ViT-Large encoder and replaces conventional decoders with a Neighborhood Attention Decoder (NAD). NAD progressively upsamples features through five stages, with channel widths

$\eta(\mathbf{x}_i)$ 9

and uses Neighborhood Attention with local neighborhood size $(256, 512, 256)$ 0. The output parameterization predicts $(256, 512, 256)$ 1 and maps it to 3D as

$(256, 512, 256)$ 2

The evaluation is zero-shot on eight benchmarks: NYUv2, KITTI, ETH3D, iBims-1, GSO, Sintel, DDAD, DIODE. SurGe achieves the best average rank on the global point-map evaluation and is consistently best on the local point-map and point-map-normal metrics. Reported local point-map AbsRel values include 2.66 on ETH3D, 3.33 on iBims-1, 7.66 on Sintel, 6.29 on DDAD, and 4.63 on DIODE. Reported point-map normal angular errors include 18.3° on ETH3D, 16.5° on iBims-1, 10.5° on GSO, 24.5° on Sintel, and 12.0° on DIODE. Global affine-invariant AbsRel values include 3.31 on NYUv2, 4.80 on KITTI, 3.51 on ETH3D, 3.31 on iBims-1, 1.11 on GSO, 17.1 on Sintel, 9.05 on DDAD, and 4.88 on DIODE (Knaebel et al., 29 May 2026).

The explicit limitation is efficiency. NAD is reported to be about 1.30× slower than ConvStack-L with DINOv2-giant and 1.46× slower with DINOv2-Large, with a modest memory increase. The model is therefore positioned as an accuracy-oriented decoder rather than an efficiency-oriented one.

5. SurGE and SURGE in optimization, co-design, and diffusion-based inference

In robotics, SurGE stands for Surrogate Gradient-guided Evolution, a hybrid co-design method for legged robots with elastic mechanisms (Zhuang et al., 20 Jun 2026). The target setting is non-differentiable co-design under contact dynamics and mechanism engagement. The method combines CMA-ES with gradients computed through a differentiable surrogate pipeline consisting of a kinodynamic single-rigid-body (Kino-SRB) model and a design-aware control policy. The search distribution is a Gaussian

$(256, 512, 256)$ 3

and surrogate gradients are injected by a covariance-preconditioned mean shift,

$(256, 512, 256)$ 4

followed by cosine-annealed scaling

$(256, 512, 256)$ 5
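A covariance-preconditioned mean shift with cosine-annealed scaling can be sketched as below. The update form (mean minus scale times covariance times gradient, for a minimized objective), the schedule (half of the initial scale times one plus the cosine of pi times t over T), and all names and numbers are assumptions for illustration and may differ from the paper's expressions; sampling, ranking, and covariance adaptation in CMA-ES are omitted.

```python
import math
from typing import List

def cosine_scale(t: int, total: int, alpha0: float) -> float:
    """Assumed schedule: alpha0 at t = 0, decaying smoothly to 0 at t = total."""
    return alpha0 * 0.5 * (1.0 + math.cos(math.pi * t / total))

def shifted_mean(
    mean: List[float], cov: List[List[float]], grad: List[float], alpha: float
) -> List[float]:
    """Move the search mean along the covariance-preconditioned descent direction."""
    step = [sum(c * g for c, g in zip(row, grad)) for row in cov]
    return [m - alpha * s for m, s in zip(mean, step)]

# Hypothetical 2-D example.
mean = [1.0, 2.0]
cov = [[2.0, 0.0], [0.0, 0.5]]
surrogate_grad = [1.0, -2.0]

alpha_start = cosine_scale(0, 10, alpha0=0.1)
alpha_end = cosine_scale(10, 10, alpha0=0.1)
new_mean = shifted_mean(mean, cov, surrogate_grad, alpha_start)
print(new_mean, alpha_end)
```

Preconditioning by the covariance keeps the gradient step on the scale of the current search distribution, and annealing the scale lets the surrogate gradient steer early generations while leaving late generations to the evolutionary update.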

The reported application is a 4-DOF design space of a hopping robot with a unidirectional parallel spring. Design variables are spring stiffness $(256, 512, 256)$ 6 N/m, spring engagement length $(256, 512, 256)$ 7 m, rocker length $(256, 512, 256)$ 8 m, and crank length $(256, 512, 256)$ 9 m. Relative to vanilla CMA-ES, SurGE achieves 6 times lower cross-seed standard deviation and 18% tighter population concentration, while matching or improving the best objective. In hardware experiments on a 2D design subspace, starting from a hand-tuned design, the method reduces the design objective by 37.65% on hardware, and the improvement trend observed in simulation transfers consistently to the physical system (Zhuang et al., 20 Jun 2026).

A different all-caps variant, SURGE, is a training-free, approximation-free particle-filtering framework for data assimilation with diffusion-based surrogate models (Wei et al., 18 May 2026). Its central idea is to treat a guided diffusion sampler as a proposal distribution on path space, then correct proposal bias exactly with Girsanov importance weights and sequential Monte Carlo (SMC) resampling. The filtering target is

$(256, 512, 1024, 512, 256)$ 0

and the key claim is that guidance is proposal design rather than exact posterior sampling. The path-space importance weight for a guided trajectory is written as

$(256, 512, 1024, 512, 256)$ 1

The method avoids score, Hessian, and PDE evaluation. Empirically it improves multiple backbones. On Lorenz 1963, FlowDAS improves from RMSE 0.0545 to 0.0502 and $(256, 512, 1024, 512, 256)$ 2 from 0.0388 to 0.0363 when augmented with SURGE. On forced incompressible Navier–Stokes, in super-resolution, FlowDAS improves from KES-RE 0.401 and RMSE 1.018 to 0.317 and 0.851; in sparse recovery it improves from KES-RE 0.543 and RMSE 0.872 to 0.278 and 0.673. On SEVIR weather forecasting, FlowDAS improves from RMSE 0.0657, CSI@20 0.5779, CSI@40 0.4044 to RMSE 0.0513, CSI@20 0.6197, CSI@40 0.4541 (Wei et al., 18 May 2026).
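The sequential Monte Carlo correction step can be sketched generically. Here the log-weights are placeholders standing in for accumulated path-space (Girsanov) log importance weights, which this sketch does not compute; the function names and the use of systematic resampling are assumptions for illustration.

```python
import math
from typing import List

def normalize_log_weights(log_w: List[float]) -> List[float]:
    """Stable softmax of log-weights."""
    m = max(log_w)
    w = [math.exp(l - m) for l in log_w]
    total = sum(w)
    return [x / total for x in w]

def effective_sample_size(w: List[float]) -> float:
    return 1.0 / sum(x * x for x in w)

def systematic_resample(w: List[float], u: float) -> List[int]:
    """Systematic resampling with a single offset u in [0, 1)."""
    n = len(w)
    indices, cumulative, i = [], w[0], 0
    for k in range(n):
        position = (u + k) / n
        while position > cumulative and i < n - 1:
            i += 1
            cumulative += w[i]
        indices.append(i)
    return indices

# Placeholder log-weights for four particles (hypothetical values).
log_weights = [math.log(0.1), math.log(0.1), math.log(0.7), math.log(0.1)]
weights = normalize_log_weights(log_weights)
ess = effective_sample_size(weights)
ancestors = systematic_resample(weights, u=0.5)
print(round(ess, 3), ancestors)
```

Particles whose guided trajectories agree with the target receive large weights and are duplicated, while poorly weighted ones are dropped; a low effective sample size is a common trigger for resampling.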

These two variants share a structural motif: a biased or approximate differentiable object is not taken as a replacement for the underlying non-differentiable or posterior-correct objective, but as a mechanism for better search or proposal construction. That interpretation is inferential; the concrete algorithms remain unrelated.

6. SurGe as surge sourcing via hybrid supply

In the sharing-economy literature, SurGe stands for surge sourcing via hybrid supply, a scheme for handling synchronous peak demand without surge pricing (Mohamadzadehoqaz et al., 31 Mar 2025). The system partitions resources into a main shared pool $(256, 512, 1024, 512, 256)$ 3, an auxiliary prosumer-supplied pool $(256, 512, 1024, 512, 256)$ 4, and a reserve $(256, 512, 1024, 512, 256)$ 5 carved out of the main pool to protect prosumers if their own resources become unavailable. During a surge, consumers can access $(256, 512, 1024, 512, 256)$ 6 items, while prosumers retain access to $(256, 512, 1024, 512, 256)$ 7 reserved items.

Demand is modelled through three Bernoulli/binomial scenarios: non-surge $(256, 512, 1024, 512, 256)$ 8, surge $(256, 512, 1024, 512, 256)$ 9, and bad-behavior / contingency demand $(256, 512, 1024, 2048, 1024, 512, 256)$ 0. If $(256, 512, 1024, 2048, 1024, 512, 256)$ 1, quality of service is defined as

$(256, 512, 1024, 2048, 1024, 512, 256)$ 2

with $(256, 512, 1024, 2048, 1024, 512, 256)$ 3, $(256, 512, 1024, 2048, 1024, 512, 256)$ 4, and $(256, 512, 1024, 2048, 1024, 512, 256)$ 5. The reserve is claimed to remain small because $(256, 512, 1024, 2048, 1024, 512, 256)$ 6 is assumed much smaller than $(256, 512, 1024, 2048, 1024, 512, 256)$ 7, leading under a normal approximation to

$(256, 512, 1024, 2048, 1024, 512, 256)$ 8
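The effect of a small request probability on the required reserve can be illustrated with a binomial demand model. This sketch assumes that quality of service is the probability that binomial demand does not exceed the available items, and uses the standard normal-approximation quantile; the function names and the numbers (200 users, request probability 0.02, target 0.99) are hypothetical and not taken from the paper.

```python
import math
from statistics import NormalDist

def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(D <= k) for D ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def smallest_capacity(n: int, p: float, target: float) -> int:
    """Smallest capacity whose exact QoS meets the target."""
    for k in range(n + 1):
        if binomial_cdf(k, n, p) >= target:
            return k
    return n

def normal_capacity(n: int, p: float, target: float) -> float:
    """Normal approximation: mean plus z standard deviations."""
    z = NormalDist().inv_cdf(target)
    return n * p + z * math.sqrt(n * p * (1 - p))

n_users, p_request, qos_target = 200, 0.02, 0.99
exact = smallest_capacity(n_users, p_request, qos_target)
approx = normal_capacity(n_users, p_request, qos_target)
print(exact, round(approx, 2))
```

With a small request probability both the mean and the standard deviation of demand are small, so the capacity needed for a high QoS target is a small fraction of the user count.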

The paper formulates two optimization problems: a minimum-cost design over $(256, 512, 1024, 2048, 1024, 512, 256)$ 9 under QoS constraints, and a best-effort partitioning problem with fixed $x$ 0 and optimized $x$ 1. The first uses the cost form

$x$ 2

and is solved with SLSQP. A distributed AIMD algorithm is also proposed for privacy-preserving allocation.
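For readers unfamiliar with AIMD (additive increase, multiplicative decrease), a deterministic textbook version is sketched below. It is not the paper's algorithm: the synchronized backoff, the parameters, and the names are assumptions for illustration. Each agent only needs a one-bit capacity signal, not the other agents' shares, which is the property that makes AIMD attractive for privacy-preserving allocation.

```python
from typing import List

def aimd(
    shares: List[float],
    capacity: float,
    alpha: float = 1.0,   # additive increase per step (assumed)
    beta: float = 0.5,    # multiplicative decrease factor (assumed)
    steps: int = 200,
) -> List[float]:
    """Run synchronized AIMD driven by a shared capacity-exceeded signal."""
    shares = list(shares)
    for _ in range(steps):
        if sum(shares) > capacity:
            shares = [beta * s for s in shares]   # everyone backs off
        else:
            shares = [s + alpha for s in shares]  # everyone probes upward
    return shares

# Hypothetical unequal starting shares for three agents.
final_shares = aimd([0.0, 10.0, 30.0], capacity=60.0)
print([round(s, 3) for s in final_shares])
```

Additive increase preserves the differences between agents while each backoff halves them, so the shares converge toward an equal split without any agent revealing its own allocation.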

Two applications are given. For high-range car sharing for owners of small EVs, the QoS parameters are $x$ 3, $x$ 4, and $x$ 5; example optimized reserves are $x$ 6 or $x$ 7 for $x$ 8, and $x$ 9–126 for $y$ 0. For shared charging points for EV drivers, the estimated parameters are $y$ 1, $y$ 2, and a deposit-reduced $y$ 3; example reserves are $y$ 4 for $y$ 5 and $y$ 6 for $y$ 7. The stated conclusion is that hybrid supply can maintain high QoS for both consumers and prosumers without surge pricing and with a relatively small reserve (Mohamadzadehoqaz et al., 31 Mar 2025).

7. Conceptual patterns and disambiguation issues

Across fields, SurGe-labelled systems are generally surrogate-oriented or augmentation-oriented rather than direct replacements for the full underlying process. In coastal modelling, SurGe approximates ADCIRC for peak or time-dependent surge prediction (Pachev et al., 2022, Pachev et al., 26 Mar 2026, Zhao et al., 26 Jun 2025). In monocular 3D reconstruction, it augments point-map training and evaluation with surface-aware metrics and losses (Knaebel et al., 29 May 2026). In robotics, it augments evolutionary search with surrogate gradients (Zhuang et al., 20 Jun 2026). In diffusion-based inference, it augments guided proposals with exact path-space importance correction (Wei et al., 18 May 2026). In resource allocation, it augments a primary pool with prosumer supply and reserve protection (Mohamadzadehoqaz et al., 31 Mar 2025).

Several additional misconceptions arise from the reuse of the name. First, storm-surge SurGe is not always spatio-temporal: the 2022 framework predicts peak surge only, whereas later work on HDNNs and RGB-encoded ConvLSTM forecasting addresses the full time series (Pachev et al., 2022, Naeini et al., 2024, Zhao et al., 26 Jun 2025). Second, global generalization is not uniform across implementations: pointwise SurGe attains flexibility by predicting at storm-location pairs, whereas the global location-invariant model attains flexibility by centering a regular lat-lon grid on landfall and learning spatial fields with UNet (Pachev et al., 2022, Pachev et al., 26 Mar 2026). Third, accuracy claims are domain-specific: the best available metrics differ fundamentally across coastal forecasting, point-map reconstruction, robot co-design, and sequential Monte Carlo, so numerical comparisons across these SurGe variants are not meaningful.

The term therefore functions best as a disambiguated label tied to a paper, a field, and a capitalization pattern. In present arXiv usage, the dominant technical association is storm-surge surrogate modelling, but the acronym’s broader trajectory shows how the same short label has been independently repurposed for distinct problems in geophysical ML, computer vision, robotics, data assimilation, and sharing-economy systems.