Geometry-Aware Temporal Fusion: TSSTF Model
- The paper introduces TSSTF, a model that uses geometry-driven regularizers to integrate spatial detail with temporal dynamics for robust edge preservation.
- It employs Temporally-Guided Total Variation (TGTV) and a Temporally-Guided Edge Constraint (TGEC) within a constrained convex optimization framework, effectively mitigating noise and spectral variation.
- Quantitative evaluations demonstrate TSSTF’s superior performance, maintaining fine structures and improving metrics like PSNR and MSSIM over state-of-the-art methods.
Geometry-aware temporal fusion schemes are algorithmic frameworks that integrate information across both the spatial and temporal domains while explicitly modeling or enforcing geometric structure. In remote sensing, these methods jointly leverage the spatial detail and temporal dynamics of multi-temporal images, ensuring that geometric (structural) information is preserved and transferred robustly despite noise, spectral variation, and measurement degradation. A leading example is the Temporally-Similar Structure-Aware Spatiotemporal Fusion (TSSTF) model, developed specifically for satellite imagery, which employs geometry-driven constraints and regularization to produce high-fidelity, edge-preserving fused outputs even under challenging noise conditions (Isono et al., 15 Aug 2025).
1. Geometry-Aware Regularization and Constraints
At the foundation of TSSTF are two mechanisms: Temporally-Guided Total Variation (TGTV) and Temporally-Guided Edge Constraint (TGEC), both of which encapsulate geometry-awareness in the temporal fusion process.
TGTV defines an anisotropic regularization function that promotes piecewise smoothness of the reconstructed images while adaptively preserving spatial structures, based on guidance from a high-resolution (HR) reference image acquired at a nearby date. For each pixel $i$ and direction $d$, TGTV uses directional weights computed as

$$
w_{d,i} = \exp\!\big(-\theta\,\big|(\mathbf{D}_d \bar{\mathbf{G}})_i\big|\big),
$$

where $\bar{\mathbf{G}}$ is a median-filtered and channel-averaged guide image, $\theta$ is a sensitivity parameter, and $\mathbf{D}_d$ is the finite difference operator in direction $d \in \{1, 2, 3, 4\}$. The four spatial directions allow the regularizer to be highly responsive to local edge geometry.
TGTV regularization is then applied as a weighted mixed $\ell_{1,2}$-norm:

$$
\mathrm{TGTV}(\mathbf{X}) = \sum_{b=1}^{B}\sum_{i}\sqrt{\sum_{d=1}^{4} w_{d,i}^{2}\,\big|(\mathbf{D}_d \mathbf{X}_b)_i\big|^{2}},
$$

where $\mathbf{X}$ is the HR image estimate with $B$ bands, $(\mathbf{D}_d \mathbf{X}_b)_i$ is the finite difference at pixel $i$ in band $b$ along direction $d$, and $\mathbf{W} = (w_{d,i})$ is the matrix of guide-derived weights.
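The following is a minimal NumPy sketch of the weight computation and the TGTV value, assuming the exponential weight form reconstructed above; function names, the default `theta`, and the median-filter window are illustrative, not taken from the paper's reference implementation.

```python
import numpy as np
from scipy.ndimage import median_filter

# Four finite-difference directions: right, down, and the two diagonals.
SHIFTS = [(0, -1), (-1, 0), (-1, -1), (-1, 1)]

def finite_diff(img, shift):
    """Directional finite difference (circular boundary, for brevity)."""
    return np.roll(img, shift, axis=(0, 1)) - img

def tgtv_weights(hr_reference, theta=5.0, size=3):
    """Directional weights from a median-filtered, channel-averaged guide."""
    guide = hr_reference.mean(axis=-1)       # channel average, shape (H, W)
    guide = median_filter(guide, size=size)  # suppress residual noise
    # Small weight across strong guide edges, so smoothing relaxes there.
    return np.stack([np.exp(-theta * np.abs(finite_diff(guide, s)))
                     for s in SHIFTS])       # shape (4, H, W)

def tgtv(x, weights):
    """Weighted mixed l_{1,2}-norm: l2 over directions, sum over pixels/bands."""
    diffs = np.stack([finite_diff(x, s) for s in SHIFTS])  # (4, H, W, B)
    weighted = weights[..., None] * diffs
    return np.linalg.norm(weighted, axis=0).sum()
```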
TGEC, a geometry-driven constraint, enforces that the set of edge locations in the HR reference and HR target images remain consistent (allowing edge intensities to change, but not edge topology), under the assumption that spatial structure is temporally stable over short time intervals. Formally,

$$
\big\|\mathbf{D}\hat{\mathbf{X}}_{\mathrm{r}} - \mathbf{D}\hat{\mathbf{X}}_{\mathrm{t}}\big\|_{1,2} \le \eta,
$$

where $\hat{\mathbf{X}}_{\mathrm{r}}$ and $\hat{\mathbf{X}}_{\mathrm{t}}$ are the denoised/fused HR reference and target images, $\mathbf{D}$ stacks the four directional operators $\mathbf{D}_1, \ldots, \mathbf{D}_4$, $\|\cdot\|_{1,2}$ is ideally the mixed $\ell_{1,2}$-norm, and $\eta$ is a tunable threshold.
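A small sketch of the corresponding feasibility test, reusing `finite_diff` and `SHIFTS` from the TGTV sketch above; the $\ell_2$-over-directions, $\ell_1$-over-pixels mixed norm follows the reconstruction, and `eta` is the tunable threshold.

```python
def structural_gap(a, b):
    """Mixed l_{1,2} distance between the directional gradient fields."""
    d = np.stack([finite_diff(a, s) - finite_diff(b, s) for s in SHIFTS])
    return np.linalg.norm(d, axis=0).sum()

def tgec_satisfied(x_ref, x_tgt, eta):
    """True when reference and target edges agree up to the budget eta."""
    return structural_gap(x_ref, x_tgt) <= eta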
By tying both regularization and constraint weights directly to the geometric structure of a reference image (via spatial gradients), TSSTF ensures that geometry is not a mere by-product of fusion but is algorithmically foregrounded.
2. Spatiotemporal Optimization Framework
TSSTF frames the fusion process as a constrained convex optimization problem in which the objective includes only the geometry-aware TGTV regularizers, while all other requirements (edge consistency, brightness alignment, data fidelity, noise suppression) are enforced as constraints. The core formulation is

$$
\begin{aligned}
\min_{\hat{\mathbf{X}}_{\mathrm{r}},\,\hat{\mathbf{X}}_{\mathrm{t}},\,\mathbf{N}_{\mathrm{r}},\,\mathbf{N}_{\mathrm{t}}}\;\;
& \mathrm{TGTV}(\hat{\mathbf{X}}_{\mathrm{r}}) + \mathrm{TGTV}(\hat{\mathbf{X}}_{\mathrm{t}})\\
\text{s.t.}\;\;
& \big\|\mathbf{D}\hat{\mathbf{X}}_{\mathrm{r}} - \mathbf{D}\hat{\mathbf{X}}_{\mathrm{t}}\big\|_{1,2} \le \eta
&&\text{(edge consistency)}\\
& \big\|\mathbf{S}\mathbf{B}(\hat{\mathbf{X}}_{\mathrm{r}} - \hat{\mathbf{X}}_{\mathrm{t}}) - (\mathbf{Y}^{\mathrm{LR}}_{\mathrm{r}} - \mathbf{Y}^{\mathrm{LR}}_{\mathrm{t}})\big\|_{2} \le \varepsilon_{\mathrm{b}}
&&\text{(brightness alignment)}\\
& \big\|\mathbf{Y}^{\mathrm{HR}}_{\mathrm{r}} - \hat{\mathbf{X}}_{\mathrm{r}} - \mathbf{N}_{\mathrm{r}}\big\|_{2} \le \varepsilon_{\mathrm{r}},\;\;
\big\|\mathbf{Y}^{\mathrm{LR}}_{\mathrm{t}} - \mathbf{S}\mathbf{B}\hat{\mathbf{X}}_{\mathrm{t}} - \mathbf{N}_{\mathrm{t}}\big\|_{2} \le \varepsilon_{\mathrm{t}}
&&\text{(data fidelity)}\\
& \|\mathbf{N}_{\mathrm{r}}\|_{1} \le \alpha,\;\; \|\mathbf{N}_{\mathrm{t}}\|_{1} \le \alpha.
&&\text{(sparse-noise budgets)}
\end{aligned}
$$

Here, $\mathbf{Y}^{\mathrm{HR}}_{\mathrm{r}}$ and $\mathbf{Y}^{\mathrm{LR}}_{\mathrm{r}}, \mathbf{Y}^{\mathrm{LR}}_{\mathrm{t}}$ are the observed noisy HR and LR images, $\mathbf{S}$ and $\mathbf{B}$ are known sampling and blurring operators, $\mathbf{N}_{\mathrm{r}}, \mathbf{N}_{\mathrm{t}}$ are per-image sparse noise components, and $\eta$, $\varepsilon_{\mathrm{b}}$, $\varepsilon_{\mathrm{r}}$, $\varepsilon_{\mathrm{t}}$, $\alpha$ are user-prescribed bounds.
This design enables flexible, geometry-adaptive control over regularization strength and spatial structure preservation, while the remaining requirements are decoupled into separate constraints so that each bound can be tuned independently and stably.
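The following is a hedged CVXPY sketch of the constrained formulation above for single-band 2D images. For brevity, the weighted mixed $\ell_{1,2}$ TGTV is replaced by its anisotropic $\ell_1$ variant, sampling and blurring are folded into one known sparse matrix `SB`, and all names and bounds (`eta`, `eps_b`, `eps_r`, `eps_t`, `alpha`) are illustrative rather than the paper's reference code.

```python
import cvxpy as cp
import numpy as np

def grads(x):
    """Horizontal and vertical forward differences of a CVXPY expression."""
    return x[:, 1:] - x[:, :-1], x[1:, :] - x[:-1, :]

def tgtv_l1(x, wh, wv):
    """Anisotropic (l1) stand-in for the weighted TGTV regularizer."""
    gh, gv = grads(x)
    return cp.sum(cp.multiply(wh, cp.abs(gh))) + cp.sum(cp.multiply(wv, cp.abs(gv)))

def solve_fusion(y_hr_r, y_lr_r, y_lr_t, wh, wv, SB,
                 eta, eps_b, eps_r, eps_t, alpha):
    H, W = y_hr_r.shape
    X_r, X_t = cp.Variable((H, W)), cp.Variable((H, W))
    N_r, N_t = cp.Variable((H, W)), cp.Variable(y_lr_t.shape)
    gh_r, gv_r = grads(X_r)
    gh_t, gv_t = grads(X_t)
    # Column-major flattening must match how SB was assembled (cp.vec is 'F').
    flat = lambda a: a.flatten(order="F")
    cons = [
        cp.sum(cp.abs(gh_r - gh_t)) + cp.sum(cp.abs(gv_r - gv_t)) <= eta,     # TGEC
        cp.norm(SB @ cp.vec(X_r - X_t) - flat(y_lr_r - y_lr_t), 2) <= eps_b,  # brightness
        cp.norm(cp.vec(y_hr_r - X_r - N_r), 2) <= eps_r,                      # HR fidelity
        cp.norm(flat(y_lr_t) - SB @ cp.vec(X_t) - cp.vec(N_t), 2) <= eps_t,   # LR fidelity
        cp.norm1(cp.vec(N_r)) <= alpha, cp.norm1(cp.vec(N_t)) <= alpha,       # sparse noise
    ]
    prob = cp.Problem(cp.Minimize(tgtv_l1(X_r, wh, wv) + tgtv_l1(X_t, wh, wv)), cons)
    prob.solve()
    return X_r.value, X_t.value
```

A generic conic solver as invoked here only illustrates the constraint structure; the paper's own solver is the preconditioned primal-dual scheme described in Section 3.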
3. Algorithm Design and Parameter Robustness
The optimization problem is solved using a preconditioned primal-dual splitting method with Operator-Norm-based Variable-wise Diagonal Preconditioning (OVDP). In each iteration, auxiliary splitting variables are updated for each constraint and regularizer, and automatic stepsize selection is employed based on operator norms, mitigating the need for heuristic manual tuning.
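The following self-contained NumPy sketch shows one diagonally preconditioned primal-dual solver, applied to a weighted-TV denoising subproblem $\min_x \tfrac{1}{2}\|x - y\|^2 + \|W \odot Dx\|_1$. It uses the standard Pock-Chambolle diagonal preconditioner rather than the paper's OVDP scheme, but it illustrates the same principle: per-variable stepsizes derived from operator norms and row/column sums instead of hand-tuned scalars.

```python
import numpy as np

def dh(x):   # forward horizontal difference, zero at the right boundary
    return np.pad(np.diff(x, axis=1), ((0, 0), (0, 1)))

def dv(x):   # forward vertical difference, zero at the bottom boundary
    return np.pad(np.diff(x, axis=0), ((0, 1), (0, 0)))

def dht(u):  # adjoint of dh
    r = np.zeros_like(u)
    r[:, 1:] += u[:, :-1]
    r[:, :-1] -= u[:, :-1]
    return r

def dvt(u):  # adjoint of dv
    r = np.zeros_like(u)
    r[1:, :] += u[:-1, :]
    r[:-1, :] -= u[:-1, :]
    return r

def precond_pds(y, wh, wv, iters=200):
    """Weighted-TV denoising via diagonally preconditioned PDHG."""
    # Dual stepsize: reciprocal absolute row sum of D (two entries per row).
    sigma = 0.5
    # Primal stepsizes: reciprocal absolute column sums of D, i.e. one over
    # the number of finite differences each pixel participates in (<= 4).
    cnt = np.zeros_like(y)
    cnt[:, :-1] += 1; cnt[:, 1:] += 1   # horizontal differences
    cnt[:-1, :] += 1; cnt[1:, :] += 1   # vertical differences
    tau = 1.0 / cnt
    x = y.copy(); x_bar = y.copy()
    uh = np.zeros_like(y); uv = np.zeros_like(y)
    for _ in range(iters):
        # Dual ascent, then projection onto {|u| <= W} (prox of l1 conjugate).
        uh = np.clip(uh + sigma * dh(x_bar), -wh, wh)
        uv = np.clip(uv + sigma * dv(x_bar), -wv, wv)
        # Primal descent, prox of 0.5*||x - y||^2, then extrapolation.
        x_new = (x - tau * (dht(uh) + dvt(uv)) + tau * y) / (1.0 + tau)
        x_bar = 2.0 * x_new - x
        x = x_new
    return x
```

The per-pixel weights `wh`, `wv` play the role of the guide-derived TGTV weights, so smoothing is automatically relaxed across edges of the reference image.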
During optimization, the adaptive threshold for TGEC is dynamically updated by

$$
\eta^{(k)} = \gamma\,\big\|\mathbf{D}\mathbf{z}^{(k)}_{\mathrm{r}} - \mathbf{D}\mathbf{z}^{(k)}_{\mathrm{t}}\big\|_{1,2},
$$

where $\gamma$ is a coefficient, and $\mathbf{z}^{(k)}_{\mathrm{r}}, \mathbf{z}^{(k)}_{\mathrm{t}}$ denote auxiliary low-resolution variables tied to the current image estimates $\hat{\mathbf{X}}^{(k)}_{\mathrm{r}}, \hat{\mathbf{X}}^{(k)}_{\mathrm{t}}$, respectively.
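A hedged sketch of this per-iteration update matching the reconstructed rule; `gamma` and the auxiliary low-resolution pair `z_r`, `z_t` are illustrative names, and `structural_gap` is reused from the TGEC sketch in Section 1.

```python
def update_eta(gamma, z_r, z_t):
    """Rescale the TGEC budget from the structural gap of the current
    auxiliary low-resolution variables (mixed l_{1,2} over 4 directions)."""
    return gamma * structural_gap(z_r, z_t)
```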
The authors provide empirically validated parameter recommendations, covering the TGTV sensitivity parameter, the number of suppressed directions per pixel in TGTV, the update coefficient for the adaptive TGEC threshold, the choice of the mixed $\ell_{1,2}$-norm for the edge constraint, and specific thresholds for brightness and noise fidelity. These settings yield consistent, robust performance across varied remote sensing datasets and noise settings.
4. Performance Evaluation and Comparisons
Quantitative benchmarks indicate that TSSTF performs on par with or surpasses state-of-the-art spatiotemporal fusion methods under noise-free conditions, and convincingly outperforms them in realistic, noisy environments. Evaluated using metrics such as PSNR (Peak Signal-to-Noise Ratio) and MSSIM (Mean Structural Similarity Index Measure), TSSTF is particularly effective at retaining spatial detail—edges, fine structures, object boundaries—while suppressing noise-induced artifacts and reducing over-smoothing. Visual inspection further confirms superior preservation of geometric fidelity.
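The following is a short evaluation sketch using scikit-image's reference implementations of the two reported metrics; `fused` and `ground_truth` are assumed to be co-registered float arrays of shape (H, W, bands).

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(fused, ground_truth):
    """PSNR and mean SSIM (MSSIM) of a fused result against a reference."""
    rng = ground_truth.max() - ground_truth.min()
    psnr = peak_signal_noise_ratio(ground_truth, fused, data_range=rng)
    # structural_similarity averages local SSIM over the image; channel_axis
    # handles the spectral bands.
    mssim = structural_similarity(ground_truth, fused,
                                  data_range=rng, channel_axis=-1)
    return psnr, mssim
```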
A comparison of key attributes in state-of-the-art methods is shown below:
| Method | Handles Noise | Edge Preservation | Adaptive Parameters |
|---|---|---|---|
| STARFM, VIPSTF | Limited | Weak | Modest |
| RobOt, ROSTF | Moderate | Moderate | Procedure-specific |
| TSSTF | Strong | Strong | Explicit & robust |
TSSTF remains robust across a spectrum of regions and sensor sources due to its geometry-driven adaptability.
5. Geometry Awareness and Transferability
TSSTF is geometry-aware in both design and execution. By deriving TGTV weights and TGEC constraints from the guiding HR reference image, the method ensures that fusion output is spatially structured in accordance with real, observable geometry rather than just pixel-level intensity comparisons. Weighted smoothing is selectively relaxed at edges, preventing loss of geometric detail, while temporal edge alignment builds temporal coherence in structure.
The approach—regularizing and constraining via explicit geometry—contrasts with purely pixel-based methods that may blur or misalign spatial structures under noise or spectral shifts. This design is especially suited to remote sensing applications, where geometric fidelity in features like roads, field boundaries, and terrain edges is crucial for downstream analyses.
6. Algorithmic Implications and Future Prospects
The TSSTF architecture, combining adaptive geometry-aware regularization and constraints within a unified convex optimization framework, provides a model for future spatiotemporal fusion research. Its explicit geometric modeling distinguishes it from local-only or non-adaptive approaches and renders it extensible to scenarios involving additional modalities, higher dimensions, and more severe noise degradation.
Notably, the fine-tuning of parameter sets based on structural cues, and automatic stepsize adjustment mechanisms such as OVDP, present a paradigm for designing practically deployable spatiotemporal models that balance high performance with reproducibility.
7. Practical Relevance and Applications
TSSTF is applicable to remote sensing tasks requiring both spatial detail and temporal updating, such as:
- Monitoring land use and land cover change,
- Tracking agricultural dynamics,
- Disaster area assessment,
- Urban expansion analysis,
- Pre-processing for downstream classification or object detection in fused, multi-date datasets.
Its robust geometry-aware design promotes edge-preserving fusion and adaptation to diverse image types and collection conditions without extensive retuning, making it suitable for operational Earth observation workflows facing complex terrain, variable acquisition schedules, and sensor variability.
TSSTF represents a geometry-aware temporal fusion paradigm in remote sensing, demonstrating that incorporating explicit structural guidance and temporal edge consistency into both regularization and constraint design delivers denoised, detail-rich fused images that are robust to environmental and sensor-induced degradations (Isono et al., 15 Aug 2025).