
Time-Resolved Transformer (TRT)

Updated 14 October 2025
  • Time-Resolved Transformer (TRT) is a specialized architecture that uses dual spatio-temporal attention to extract both local detail and global context from transient, photon-limited data.
  • It employs parallel self-attention encoders and cross-attention decoders to fuse fine spatial patches with downsampled global features, ensuring accurate 3D reconstruction.
  • TRT enhances photon-efficient imaging by addressing low signal-to-background ratios in both LOS and NLOS settings through dedicated denoising and physics-inspired priors.

The Time-Resolved Transformer (TRT) encompasses a class of transformer architectures specifically adapted for modeling and processing temporal or transient data, particularly in domains where high temporal/spatio-temporal resolution and photon efficiency are paramount. TRT architectures are designed to exploit local and global correlations across time and space, enabling robust 3D reconstruction from time-resolved photon measurements in challenging environments with low quantum efficiency and high noise. The TRT framework extends canonical transformer models through two specialized attention modules—spatio-temporal self-attention encoders and spatio-temporal cross-attention decoders—and provides dedicated implementations for both line-of-sight (LOS) and non-line-of-sight (NLOS) imaging tasks (Li et al., 10 Oct 2025).

1. Architectural Overview

The core TRT architecture is composed of two parallel self-attention encoding branches, each engineered to capture complementary characteristics within transient measurement data, and corresponding cross-attention decoding branches to fuse extracted features for 3D scene reconstruction.

  • Local Spatio-Temporal Encoder: Partitions F_S into fine spatial patches, applies windowed spatial MSA (Wₛ-MSA), then windowed temporal MSA (Wₜ-MSA), followed by FFN refinement:

F_L = \mathrm{FFN}\left\{ W_t\text{-}\mathrm{MSA}\left\{ W_s\text{-}\mathrm{MSA}\left\{ F_S \right\} \right\} \right\}

  • Global Spatio-Temporal Encoder: Starts with spatial downsampling of F_S, applies full spatial MSA (Fₛ-MSA), then full temporal MSA (Fₜ-MSA), followed by FFN refinement:

F_G = \mathrm{FFN}\left\{ F_t\text{-}\mathrm{MSA}\left\{ F_s\text{-}\mathrm{MSA}\left\{ F_S^\downarrow \right\} \right\} \right\}

The local encoder preserves fine spatial/temporal continuity, crucial for photon-sparse, high-detail regions. The global encoder provides longer-range scene context, benefitting reconstructions in scenes with complex geometries or extensive occlusions.
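The factorized spatial-then-temporal attention pattern can be sketched as follows. This is a minimal numpy illustration of the F_L = FFN(Wₜ-MSA(Wₛ-MSA(F_S))) pipeline, not the paper's implementation: it uses full (unwindowed) single-head attention with identity Q/K/V projections and a toy random-weight FFN, purely to show how the (T, H, W, C) volume is reshaped so each axis is attended in turn.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    """Single-head scaled dot-product self-attention over axis -2.

    tokens: (..., N, C) array of N tokens with C channels.
    Learned Q/K/V projections are omitted for brevity (identity maps).
    """
    c = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(c)
    return softmax(scores, axis=-1) @ tokens

def local_encoder_step(F_S):
    """Sketch of F_L = FFN(Wt-MSA(Ws-MSA(F_S))) on a (T, H, W, C) volume.

    Spatial MSA attends over the H*W axis per time bin; temporal MSA
    attends over T per spatial location. Real windowed MSA would first
    partition the axes into local windows; that step is elided here.
    """
    T, H, W, C = F_S.shape
    # Spatial attention: tokens are spatial positions, batched over time bins.
    x = F_S.reshape(T, H * W, C)
    x = self_attention(x).reshape(T, H, W, C)
    # Temporal attention: tokens are time bins, batched over spatial positions.
    x = x.transpose(1, 2, 0, 3).reshape(H * W, T, C)
    x = self_attention(x).reshape(H, W, T, C).transpose(2, 0, 1, 3)
    # FFN: pointwise 2-layer MLP (random weights stand in for learned ones).
    rng = np.random.default_rng(0)
    W1, W2 = rng.normal(size=(C, 2 * C)), rng.normal(size=(2 * C, C))
    return np.maximum(x @ W1, 0) @ W2
```

The global branch would apply the same pattern to a spatially downsampled F_S with full-resolution attention along each axis.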

2. Spatio-Temporal Cross-Attention Decoders

The fusion of local and global features leverages dual cross-attention decoding branches, utilizing spatio-temporal cross-attention (STCA):

  • Local Feature Decoder:

F_L^* = \mathrm{FFN}\left\{ \mathrm{STCA}\left[ Q = F_G^\uparrow,\; K = V = F_L \right] \right\}

Here, F_G is upsampled to match the local feature scale; global context modulates local feature refinement.

  • Global Feature Decoder:

F_G^* = \mathrm{FFN}\left\{ \mathrm{STCA}\left[ Q = F_L,\; K = V = F_G^\uparrow \right] \right\}

Local continuity guides global representations, maintaining robustness to sparse observations.

The STCA mechanism reshapes tokens to simultaneously apply spatial and temporal attention via matrix multiplications, thus modeling both axes within noisy, high-dimensional transient datasets.
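The reshape-then-matmul pattern behind STCA can be illustrated in a few lines of numpy. The factorization below (spatial cross-attention per time bin, then temporal cross-attention per spatial location) is an assumption for illustration; the paper's exact token layout and projections may differ, and learned Q/K/V maps are again replaced by identities.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, kv):
    """Scaled dot-product cross-attention: queries q against keys/values kv.

    q: (..., Nq, C), kv: (..., Nk, C); identity projections for brevity.
    """
    c = q.shape[-1]
    scores = q @ np.swapaxes(kv, -1, -2) / np.sqrt(c)
    return softmax(scores, axis=-1) @ kv

def stca(F_q, F_kv):
    """Spatio-temporal cross-attention sketch on (T, H, W, C) volumes.

    For the local decoder, F_q would be the upsampled global features
    F_G (queries) and F_kv the local features F_L (keys/values); the
    roles swap for the global decoder.
    """
    T, H, W, C = F_q.shape
    # Spatial cross-attention: H*W tokens, batched over T time bins.
    q = F_q.reshape(T, H * W, C)
    kv = F_kv.reshape(T, H * W, C)
    x = cross_attention(q, kv).reshape(T, H, W, C)
    # Temporal cross-attention: T tokens, batched over H*W positions.
    q = x.transpose(1, 2, 0, 3).reshape(H * W, T, C)
    kv = F_kv.transpose(1, 2, 0, 3).reshape(H * W, T, C)
    return cross_attention(q, kv).reshape(H, W, T, C).transpose(2, 0, 1, 3)
```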

3. Addressing Photon-Efficient Imaging Challenges

Photon-efficient imaging systems acquire highly sparse and noisy time-resolved data due to detector limitations (quantum efficiency, dark counts) and ambient interference. TRT directly tackles these by:

  • Segmenting and patching data for local feature extraction (which mitigates photon sparsity at fine resolution).
  • Employing downsampling and global attention to exploit scene-wide geometric similarities and counteract high noise.
  • Using dual attention designs to integrate both local continuity (e.g., inter-pixel similarity) and global structure (e.g., symmetry or repetitive patterns in the scene), supporting inference even with very low signal-to-background ratios.

By preserving the native spatio-temporal structure, TRT maintains computational tractability for large datasets and enables improved generalization to novel or complex environments.
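The patching and downsampling operations that feed the local and global branches are standard tensor rearrangements; a minimal numpy sketch on a (T, H, W) photon-count histogram volume is below. Patch size and pooling choices here are illustrative assumptions, not values from the paper.

```python
import numpy as np

def extract_patches(volume, p):
    """Split a (T, H, W) transient volume into non-overlapping p x p
    spatial patches, keeping the full time axis within each patch.

    Returns shape (H//p * W//p, T, p, p); H and W must be divisible by p.
    Each patch preserves fine inter-pixel continuity for the local branch.
    """
    T, H, W = volume.shape
    v = volume.reshape(T, H // p, p, W // p, p)
    return v.transpose(1, 3, 0, 2, 4).reshape(-1, T, p, p)

def spatial_downsample(volume, s):
    """Average-pool a (T, H, W) volume by factor s in space, giving the
    coarse scene-wide view consumed by the global branch."""
    T, H, W = volume.shape
    return volume.reshape(T, H // s, s, W // s, s).mean(axis=(2, 4))
```

Averaging over pixels also has a mild denoising effect, which is one reason a downsampled global view remains informative at very low SBR.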

4. Task-Specific Implementations: TRT-LOS and TRT-NLOS

TRT is instantiated in two principal forms:

  • TRT-LOS: Optimized for line-of-sight imaging, this implementation incorporates a feature extraction module (interlaced/dilated 3D convolutions), followed by TRT blocks, and a fusion module that utilizes pixel-shuffle operations for high-fidelity volumetric recovery. Evaluations on a new synthetic LOS dataset (256×256 resolution, variable SBR) and real measurements demonstrate superiority over filtered back-projection and prior deep learning models in reconstruction accuracy and low-SBR robustness.
  • TRT-NLOS: Tailored for scenes where targets are occluded and reconstruction relies on indirect light paths. It adds a lightweight denoiser (3D convolutions, with separable variants) at the front end and enhances feature extraction with physics-inspired priors (frequency-wavenumber migration transforms). Testing on synthetic datasets (ShapeNet-based, “Seen” and “Unseen” splits) and on new real-world acquisitions with customized confocal sensors yields state-of-the-art performance in both intensity and depth estimates compared to existing methods (FK, LFE, NLOST).
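The pixel-shuffle operation used in the TRT-LOS fusion module is the standard sub-pixel rearrangement from super-resolution; a minimal 2D numpy version is sketched below (the paper applies it within a 3D volumetric fusion head, which this single-frame sketch does not reproduce).

```python
import numpy as np

def pixel_shuffle2d(x, r):
    """Rearrange (C*r*r, H, W) -> (C, H*r, W*r): each group of r*r
    channels is scattered into an r x r spatial neighborhood, trading
    channel depth for spatial resolution without interpolation."""
    Crr, H, W = x.shape
    C = Crr // (r * r)
    x = x.reshape(C, r, r, H, W)
    x = x.transpose(0, 3, 1, 4, 2)  # (C, H, r, W, r)
    return x.reshape(C, H * r, W * r)
```

For example, four 1x1 feature maps with values 0..3 shuffle into one 2x2 map laid out row-major.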

5. Mathematical Formulations

The following table organizes core mathematical formulas from the TRT framework:

| Component | Formula | Description |
| --- | --- | --- |
| Local Encoder | F_L = \mathrm{FFN}\{ W_t\text{-}\mathrm{MSA}\{ W_s\text{-}\mathrm{MSA}\{ F_S \}\}\} | Patch-based spatio-temporal encoding |
| Global Encoder | F_G = \mathrm{FFN}\{ F_t\text{-}\mathrm{MSA}\{ F_s\text{-}\mathrm{MSA}\{ F_S^\downarrow \}\}\} | Downsampled global correlation |
| Local Cross-Attention Decoder | F_L^* = \mathrm{FFN}\{ \mathrm{STCA}[Q = F_G^\uparrow,\, K = V = F_L] \} | Global queries fused with local keys/values |
| Global Cross-Attention Decoder | F_G^* = \mathrm{FFN}\{ \mathrm{STCA}[Q = F_L,\, K = V = F_G^\uparrow] \} | Local queries over global information |

These formulae define the precise operations for both feature extraction and fusion, ensuring mathematical transparency.

6. Datasets for Benchmarking and Generalization

Two datasets underpin the experimental evaluation of TRT:

  • Synthetic High-Resolution LOS Dataset: Generated with up to 256×256 spatial resolution, incorporating transient measurements across various SBR regimes. This resource enables rigorous training and comparative testing in photon-limited circumstances.
  • Real-World NLOS Dataset: Acquired using a custom confocal imaging apparatus, including diverse scene types and noise levels. Augments existing public datasets, increasing the diversity and realism for benchmark studies and advancement in transient imaging research.

7. Scientific Impact and Future Directions

TRT’s dual attention mechanism—spatio-temporal self-attention combined with specialized cross-attention fusion—enables robust and accurate reconstruction from sparse, noisy transient data. By outperforming prior methods in both simulated and real imaging, TRT strengthens the foundation for future transient imaging systems in both academic and industrial contexts.

A plausible implication is that further refinements in local/global encoding balance, scalable tokenization strategies, and integration with hardware-aware deployment protocols may extend TRT’s applications beyond LOS/NLOS imaging to other domains where time-resolved measurement is critical (e.g., LIDAR, remote sensing, dynamic medical imaging).

In summary, the Time-Resolved Transformer advances 3D photon-efficient imaging by enabling robust feature extraction and fusion from high-dimensional, low-SBR transient measurements. It provides a flexible and generalizable architecture adaptable to both direct and indirect sensing problems, supported by new benchmark datasets for comprehensive evaluation (Li et al., 10 Oct 2025).
