SDT-Net: Space Debris & Medical Segmentation

Updated 28 January 2026

Two separate papers present unrelated architectures that share the name SDT-Net: one for space debris tracking and one for medical image segmentation, each with a specialized network design.
For space debris, SDT-Net leverages a DLA-34 backbone with RoI feature enhancement and offset regression to achieve high MOTA, HOTA, and IDF1 scores in cluttered environments.
In medical segmentation, SDT-Net employs a dual-teacher framework with dynamic switching and hierarchical consistency to generate accurate, anatomically plausible segmentations.

SDT-Net refers to two distinct neural frameworks introduced in recent literature: (1) a deep learning-based tracking-by-detection architecture for space debris tracking in optical imagery (Zhuang et al., 3 Jun 2025), and (2) a dual-teacher, single-student network for scribble-supervised medical image segmentation (Nguyen et al., 21 Jan 2026). The first targets data association in cluttered scenes; the second targets learning from weak supervision. Both rely on task-specific network design and loss function engineering.

1. SDT-Net for Space Debris Tracking: Architecture and Methodology

SDT-Net adopts a tracking-by-detection paradigm, inspired by CenterTrack, and is architected in three major stages: feature representation using a DLA-34 backbone with RoI Feature Enhancement (RoI-FE), an endpoint+embedding detection head, and an offset-based tracking head. The framework systematically integrates previous and current frame information with heatmap priors to robustly localize and temporally associate linear debris cues in complex skylight backgrounds.

Input Fusion and Backbone:

At each time step $t$, the network ingests the current frame $I_t$, the previous frame $I_{t-1}$, and the previous endpoints heatmap $H_{t-1}$. Each input is processed by an identical 3×3 convolution module with batch normalization and ReLU, and the results are summed: $F_{\mathrm{input}} = F_t^{(\mathrm{in})} + F_{t-1}^{(\mathrm{in})} + F_H^{(\mathrm{in})}$, with $C_0=64$ channels. This is passed to a DLA-34 encoder/decoder, producing multi-scale features and ultimately a high-resolution feature map $F_b \in \mathbb{R}^{H \times W \times 64}$.

RoI-FE Segmentation:

To suppress background clutter, a lightweight segmentation head outputs mask logits $\hat M$, which are converted to probabilities and multiplied elementwise with $F_b$ to yield $F_{\mathrm{en}}$. The segmentation head is supervised by pixelwise binary cross-entropy.
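The gating step above can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation: it assumes the logits are mapped to probabilities with a sigmoid (the text only says they are probability-mapped), stores features as nested lists in H×W×C order, and uses the hypothetical names `roi_feature_enhance` and `bce_loss`.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def roi_feature_enhance(features, mask_logits):
    """Gate an H x W x C feature map with per-pixel mask probabilities.

    Assumption: probabilities come from a sigmoid over the mask logits.
    """
    enhanced = []
    for feat_row, logit_row in zip(features, mask_logits):
        enhanced.append(
            [[f * sigmoid(z) for f in pixel] for pixel, z in zip(feat_row, logit_row)]
        )
    return enhanced

def bce_loss(mask_logits, target):
    """Mean pixelwise binary cross-entropy between logits and a 0/1 mask."""
    eps = 1e-7
    total, count = 0.0, 0
    for logit_row, target_row in zip(mask_logits, target):
        for z, y in zip(logit_row, target_row):
            p = min(max(sigmoid(z), eps), 1.0 - eps)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count

# One row, two pixels, two channels: the first pixel is half-suppressed,
# the second passes through almost unchanged.
features = [[[1.0, 2.0], [3.0, 4.0]]]
mask_logits = [[0.0, 100.0]]
print(roi_feature_enhance(features, mask_logits))
```

Pixels with strongly negative logits are scaled toward zero, which is how background structure is attenuated before the detection heads see the features.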

Detection Head:

From $F_{\mathrm{en}}$, three parallel convolutional heads predict (i) a two-channel endpoints heatmap $\hat H$, optimized via focal loss, and (ii) left and right embeddings $\hat E_\ell$ and $\hat E_r$, with embedding losses $\mathcal{L}_{\mathrm{same}}$ and $\mathcal{L}_{\mathrm{diff}}$ enforcing within-object cohesion and between-object separation.
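The summary does not give the exact form of $\mathcal{L}_{\mathrm{same}}$ and $\mathcal{L}_{\mathrm{diff}}$. The sketch below assumes a common pull/push formulation from associative-embedding detectors: scalar embeddings, a squared pull toward each object's mean, and a hinge push with a hypothetical `margin`. It only illustrates what "cohesion" and "separation" could mean numerically.

```python
def pull_push_losses(left, right, margin=1.0):
    """Return (same, diff) for scalar endpoint embeddings.

    left[i], right[i]: embeddings of the two endpoints of object i.
    Assumed form: pull each pair to its mean, push different means
    at least `margin` apart. This is not taken from the paper.
    """
    n = len(left)
    centers = [(l + r) / 2.0 for l, r in zip(left, right)]

    # Within-object cohesion: both endpoints should sit near their mean.
    same = sum(
        (l - c) ** 2 + (r - c) ** 2 for l, r, c in zip(left, right, centers)
    ) / n

    # Between-object separation: penalize means closer than the margin.
    diff = 0.0
    if n > 1:
        pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
        diff = sum(
            max(0.0, margin - abs(centers[i] - centers[j])) for i, j in pairs
        ) / len(pairs)
    return same, diff

# Two well-separated objects with consistent endpoints: both terms vanish.
print(pull_push_losses([0.0, 2.0], [0.0, 2.0]))
```

At inference, endpoints whose embeddings are close would be grouped into one streak; the two loss terms are what make that grouping rule reliable.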

Tracking Head:

An offset regression head predicts pixel-level displacements $\hat O_\ell^t$ and $\hat O_r^t$, which are used to compute object-to-object similarity and support temporal association via nearest-neighbor matching. The offset loss is the $L_1$ distance between predicted and true displacements.
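A minimal sketch of offset-based association follows. It assumes, in the style of CenterTrack, that each predicted offset points from the current endpoint back to its position in the previous frame, and it uses greedy nearest-neighbor matching with a hypothetical distance gate `max_dist`. The data layout and the function name `associate` are illustrative only.

```python
import math

def associate(prev_tracks, detections, max_dist=10.0):
    """Greedy nearest-neighbor association of detections to previous tracks.

    prev_tracks: {track_id: (left_xy, right_xy)} from frame t-1.
    detections:  list of dicts with 'left', 'right', 'off_left', 'off_right'.
    Returns {detection_index: track_id}; unmatched detections get new ids.
    """
    def shift(point, offset):
        return (point[0] + offset[0], point[1] + offset[1])

    candidates = []
    for d_idx, det in enumerate(detections):
        # Project both endpoints back into the previous frame.
        proj_left = shift(det["left"], det["off_left"])
        proj_right = shift(det["right"], det["off_right"])
        for track_id, (t_left, t_right) in prev_tracks.items():
            dist = 0.5 * (math.dist(proj_left, t_left) + math.dist(proj_right, t_right))
            if dist <= max_dist:
                candidates.append((dist, d_idx, track_id))

    assignment, used_tracks = {}, set()
    for dist, d_idx, track_id in sorted(candidates):
        if d_idx in assignment or track_id in used_tracks:
            continue
        assignment[d_idx] = track_id
        used_tracks.add(track_id)

    # Start a new track for every detection left unmatched.
    next_id = max(prev_tracks, default=-1) + 1
    for d_idx in range(len(detections)):
        if d_idx not in assignment:
            assignment[d_idx] = next_id
            next_id += 1
    return assignment

prev_tracks = {0: ((10, 10), (20, 10)), 1: ((50, 50), (60, 50))}
detections = [
    {"left": (13, 10), "right": (23, 10), "off_left": (-3, 0), "off_right": (-3, 0)},
    {"left": (100, 100), "right": (110, 100), "off_left": (0, 0), "off_right": (0, 0)},
]
print(associate(prev_tracks, detections))
```

Greedy matching is the simplest choice here; an optimal assignment solver could be swapped in without changing the distance definition.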

2. SDT-Net for Scribble-Supervised Medical Segmentation: Framework, Supervision, and Dynamics

SDT-Net in medical segmentation is a dual-teacher/single-student architecture addressing the high ambiguity of sparse scribble supervision (Nguyen et al., 21 Jan 2026). It includes:

One student segmentation network (e.g., UNet), learned with a combination of scribble loss, pseudo-label loss, and feature consistency loss.
Two teacher networks, updated as exponential moving averages (EMA) of the student.
A Dynamic Teacher Switching (DTS) module dynamically selecting the teacher with lower scribble CE loss.
A Pick Reliable Pixels (PRP) mechanism filtering high-confidence teacher predictions for pseudo-labeling.
A Hierarchical Consistency (HiCo) module enforcing multi-level alignment between student and teacher features.

Dynamic Teacher Switching:

For each training batch, the scribble loss for teachers $T_1$ and $T_2$ is computed on annotated pixels, and the better teacher $T^*$ is chosen per-batch. This strategy reduces confirmation bias and guides the student with more reliable pseudo-labels.
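The per-batch selection rule can be sketched as follows. The example assumes flattened pixels, per-pixel class probabilities already produced by each teacher, and a hypothetical `IGNORE` label for unannotated pixels; tie-breaking in favor of the first teacher is also an assumption.

```python
import math

IGNORE = -1  # hypothetical marker for pixels without a scribble label

def scribble_ce(probs, scribble):
    """Mean cross-entropy over annotated pixels only.

    probs:    list of per-pixel class-probability lists.
    scribble: list of class indices, or IGNORE where no scribble exists.
    """
    losses = [
        -math.log(max(p[y], 1e-7))
        for p, y in zip(probs, scribble)
        if y != IGNORE
    ]
    return sum(losses) / len(losses) if losses else 0.0

def select_teacher(probs_t1, probs_t2, scribble):
    """Return (index, loss) of the teacher with the lower scribble loss."""
    loss_1 = scribble_ce(probs_t1, scribble)
    loss_2 = scribble_ce(probs_t2, scribble)
    return (0, loss_1) if loss_1 <= loss_2 else (1, loss_2)

scribble = [0, IGNORE, 1]
teacher_1 = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
teacher_2 = [[0.6, 0.4], [0.1, 0.9], [0.5, 0.5]]
index, loss = select_teacher(teacher_1, teacher_2, scribble)
print(index, round(loss, 4))
```

Because the comparison uses only annotated pixels, the choice is anchored to ground truth rather than to either teacher's own pseudo-labels.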

PRP and HiCo:

Only pixels where $T^*$'s softmax confidence exceeds the threshold $\tau$ are used for the pseudo-label loss, which is the mean of cross-entropy and Dice losses. HiCo enforces $L_1$ and cosine-similarity consistency at both low-level and high-level feature maps.
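The sketch below illustrates both mechanisms on flattened inputs. Several details are assumptions, not values from the paper: the default `tau=0.9`, the soft Dice form restricted to the kept pixels, and the way the $L_1$ and cosine terms are added in `hico_term`. All function names are hypothetical.

```python
import math

def reliable_pseudo_labels(teacher_probs, tau=0.9):
    """Return (labels, mask); mask[i] is True where confidence exceeds tau."""
    labels, mask = [], []
    for p in teacher_probs:
        confidence = max(p)
        labels.append(p.index(confidence))
        mask.append(confidence > tau)
    return labels, mask

def pseudo_label_loss(student_probs, labels, mask, num_classes, eps=1e-6):
    """Mean of cross-entropy and soft Dice loss over reliable pixels only."""
    kept = [(p, y) for p, y, m in zip(student_probs, labels, mask) if m]
    if not kept:
        return 0.0
    ce = sum(-math.log(max(p[y], 1e-7)) for p, y in kept) / len(kept)
    dice_terms = []
    for c in range(num_classes):
        intersection = sum(p[c] for p, y in kept if y == c)
        denominator = sum(p[c] for p, _ in kept) + sum(1 for _, y in kept if y == c)
        dice_terms.append(1.0 - (2.0 * intersection + eps) / (denominator + eps))
    dice = sum(dice_terms) / num_classes
    return 0.5 * (ce + dice)

def hico_term(student_feat, teacher_feat):
    """L1 distance plus (1 - cosine similarity) for one flattened feature map."""
    l1 = sum(abs(s - t) for s, t in zip(student_feat, teacher_feat)) / len(student_feat)
    dot = sum(s * t for s, t in zip(student_feat, teacher_feat))
    norm_s = math.sqrt(sum(s * s for s in student_feat))
    norm_t = math.sqrt(sum(t * t for t in teacher_feat))
    cosine = dot / max(norm_s * norm_t, 1e-8)
    return l1 + (1.0 - cosine)

teacher_probs = [[0.95, 0.05], [0.6, 0.4], [0.02, 0.98]]
labels, mask = reliable_pseudo_labels(teacher_probs)
student_probs = [[0.6, 0.4], [0.5, 0.5], [0.3, 0.7]]
print(mask, round(pseudo_label_loss(student_probs, labels, mask, num_classes=2), 4))
```

In a hierarchical setup, `hico_term` would be evaluated once on a low-level and once on a high-level feature map and the two values summed.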

3. Datasets, Simulation, and Evaluation Protocols

Space Debris Tracking (SDTD):

The Space Debris Tracking Dataset (SDTD) underpins the space debris SDT-Net. SDTD is constructed from 16,040 wide-field ZTF backgrounds with synthetic debris generated via a pipeline that models debris trajectories, line widths, brightness, and Gaussian PSF-convolved streaks. SDTD contains 18,040 video sequences (62,562 frames, ≈250,000 synthetic debris), split into training, sparse test, and dense test. Each frame is annotated with exact endpoint coordinates and tracking IDs.
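To make the streak synthesis concrete, here is a simplified rendering sketch. It is not the SDTD pipeline: it approximates a PSF-convolved streak by giving each pixel a Gaussian falloff in its distance to the line segment, and the parameters `brightness` and `sigma` are hypothetical stand-ins for the brightness and line-width settings the text mentions.

```python
import math

def point_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point (px, py) to segment (a, b)."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def render_streak(background, start, end, brightness=100.0, sigma=1.0):
    """Add a Gaussian-profile streak from `start` to `end` (x, y) to a copy
    of `background` (list of rows). The endpoints double as the annotation."""
    image = [row[:] for row in background]
    for y, row in enumerate(image):
        for x in range(len(row)):
            d = point_segment_distance(x, y, start[0], start[1], end[0], end[1])
            row[x] += brightness * math.exp(-(d * d) / (2.0 * sigma * sigma))
    return image

background = [[0.0] * 5 for _ in range(5)]
image = render_streak(background, start=(0, 2), end=(4, 2))
print([round(v, 1) for v in image[2]])  # row along the streak
print([round(row[2], 1) for row in image])  # cross-section through it
```

Because the streak is placed synthetically, its endpoint coordinates are known exactly, which is what allows every frame to carry precise endpoint and identity labels.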

Medical Segmentation:

For SDT-Net in medical segmentation, evaluation uses ACDC (cardiac MRI, 70/15/15 split) and MSCMRseg (25/5/15 split) datasets, with standard scribble annotation protocols.

4. Loss Functions, Optimization, and Training Details

Space Debris SDT-Net:

The overall loss is a weighted sum: $\mathcal{L} = \lambda_{\mathrm{seg}}\mathcal{L}_{\mathrm{seg}} + \lambda_{\mathrm{hm}}\mathcal{L}_{\mathrm{hm}} + \lambda_{\mathrm{emb}}(\mathcal{L}_{\mathrm{same}}+\mathcal{L}_{\mathrm{diff}}) + \lambda_{\mathrm{off}}\mathcal{L}_{\mathrm{off}}$, with empirically selected hyperparameters ($\lambda_{\mathrm{seg}}=1.0$, $\lambda_{\mathrm{hm}}=10.0$, $\lambda_{\mathrm{emb}}=1.0$, $\lambda_{\mathrm{off}}=0.1$). Implementation uses MMDetection/PyTorch on 8 NVIDIA 4090 GPUs. Training spans 60 epochs, with a learning rate of $3\times10^{-3}$ decayed by a factor of 0.1 at epoch 20, and a batch size of 2 on $1524\times1524$ crops. Data augmentation includes intensity jitter and random rotations.
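The weighted sum and the step schedule translate directly into code. The weights and learning-rate values below are the ones quoted above; the function names and the zero-based epoch indexing are assumptions made for this sketch.

```python
# Loss weights as quoted in the text.
LAMBDAS = {"seg": 1.0, "hm": 10.0, "emb": 1.0, "off": 0.1}

def total_loss(l_seg, l_hm, l_same, l_diff, l_off, weights=LAMBDAS):
    """Weighted sum of segmentation, heatmap, embedding, and offset losses."""
    return (
        weights["seg"] * l_seg
        + weights["hm"] * l_hm
        + weights["emb"] * (l_same + l_diff)
        + weights["off"] * l_off
    )

def learning_rate(epoch, base_lr=3e-3, decay_epoch=20, gamma=0.1):
    """Single-step decay: multiply by gamma from `decay_epoch` onward.

    Assumes epochs are counted from zero.
    """
    return base_lr * (gamma if epoch >= decay_epoch else 1.0)

print(total_loss(1.0, 1.0, 1.0, 1.0, 1.0))
print(learning_rate(0), learning_rate(20))
```

With unit losses the heatmap term contributes 10 of the total 13.1, which shows how strongly the weighting prioritizes endpoint localization.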

Medical Segmentation SDT-Net:

Supervision is via a joint loss: $\mathcal{L}_{\mathrm{Total}} = \mathcal{L}_{\mathrm{Scribble}} + \mathcal{L}_{\mathrm{Pseudo}} + \mathcal{L}_{\mathrm{HiCo}}$. The selected teacher is EMA-updated with $\alpha=0.999$. Training uses SGD (lr=0.01, momentum=0.9, weight decay $1\times10^{-4}$) with batch size 8, 30k iterations, and 12k warmup steps on the respective datasets.
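The EMA step can be sketched with parameters stored in plain dictionaries. The value $\alpha=0.999$ comes from the text; representing weights as `{name: float}` and the helper names are assumptions for illustration. Following the statement above, only the selected teacher is updated in a given step.

```python
def ema_update(teacher, student, alpha=0.999):
    """Return new teacher weights: alpha * teacher + (1 - alpha) * student."""
    return {
        name: alpha * teacher[name] + (1.0 - alpha) * student[name]
        for name in teacher
    }

def update_selected_teacher(teachers, student, selected, alpha=0.999):
    """Apply the EMA update to the selected teacher only; leave the other as is."""
    updated = list(teachers)
    updated[selected] = ema_update(teachers[selected], student, alpha)
    return updated

teachers = [{"w": 1.0}, {"w": 1.0}]
student = {"w": 0.0}
teachers = update_selected_teacher(teachers, student, selected=1)
print(teachers)
```

With $\alpha$ this close to 1, a teacher moves only 0.1% of the way toward the student per step, so it acts as a smoothed, slowly changing copy of the student.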

5. Quantitative Performance and Empirical Findings

Space Debris Tracking Results:

SDT-Net reports the following tracking results on the two SDTD test splits:

"Debris Test" (≤2 debris/frame): IDF1 91.8%, MOTA 87.7%, HOTA 87.8%, IDS 169.
"Dense Debris Test" (≥3 debris/frame): IDF1 80.6%, MOTA 70.3%, HOTA 73.6%, IDS 1070.

SDT-Net exceeds CenterTrack and OCSORT by +10.6% and +5.5% MOTA on sparse/dense, respectively. On Antarctic real data (31 videos, 1,895 frames), SDT-Net delivers MOTA 73.2% (CenterTrack: 64.7%, OCSORT: 69.3%). In a user study (5 sequences, 333 frames), MOTA is ≈70.6%.
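For readers unfamiliar with the headline metric, MOTA follows the standard CLEAR MOT definition: one minus the ratio of all errors (misses, false positives, identity switches) to the number of ground-truth objects. The sketch below uses made-up counts purely to show the arithmetic; they are not taken from the paper.

```python
def mota(false_negatives, false_positives, id_switches, num_ground_truth):
    """Multiple Object Tracking Accuracy (CLEAR MOT definition).

    All counts are summed over every frame of the evaluated sequences.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth

# Illustrative counts only.
print(mota(false_negatives=10, false_positives=5, id_switches=5, num_ground_truth=100))
```

Because identity switches enter the numerator directly, the IDS counts listed above affect MOTA, although misses and false positives usually dominate it.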

Qualitative analysis highlights the efficacy of RoI-FE in suppressing confounds (clouds, moon-glow, star flares) and the offset-based tracking head's robustness in dense and partially occluded settings.

Medical Segmentation Results:

On ACDC, SDT-Net achieves mean Dice 90.8% (LV 93.5, MYO 89.3, RV 89.6), outperforming DMPLS, ScribbleVC, ScribFormer, AIL, and HELPNet. On MSCMRseg, mean Dice is 90.0% (LV 93.1, MYO 85.0, RV 88.8). Ablation studies show substantial improvement from joint DTS, PRP, and HiCo; single-teacher plain pseudo-labeling attains only 69.0%. Qualitative assessment demonstrates sharper, anatomically plausible boundaries and preservation of topology.
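The Dice scores above measure overlap between predicted and reference masks, computed per structure and then averaged. A minimal sketch on flattened binary masks follows; treating two empty masks as a perfect match is a convention assumed here, and the toy masks are illustrative only.

```python
def dice_score(pred, target):
    """Dice coefficient for two flattened binary masks (values 0 or 1)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        return 1.0  # assumed convention: both masks empty counts as a match
    return 2.0 * intersection / total

def mean_dice(per_class_scores):
    """Unweighted mean over per-structure Dice scores."""
    return sum(per_class_scores) / len(per_class_scores)

# Toy masks: one of two predicted pixels overlaps the single target pixel.
print(round(dice_score([1, 1, 0, 0], [1, 0, 0, 0]), 4))

# Per-structure ACDC scores quoted above (LV, MYO, RV) average to the stated mean.
print(round(mean_dice([93.5, 89.3, 89.6]), 1))
```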

6. Comparative Analysis and Significance

SDT-Net for both applications advances the capability to reason about objects in low signal-to-noise, weakly supervised, and dynamically changing domains. In space surveillance, the combination of DLA-34 backbone, RoI-FE masking, two-endpoint detection/embedding, and offset-based temporal association enables robust debris tracking in cluttered, high-variability backgrounds, addressing fundamental limitations in traditional signal processing (Zhuang et al., 3 Jun 2025). For scribble-supervised segmentation, dual-teacher architecture mitigates pseudo-label noise and confirmation bias, while hierarchical feature alignment yields anatomically consistent segmentations under sparsity constraints (Nguyen et al., 21 Jan 2026).

A plausible implication is that SDT-Net's architectural and supervision innovations generalize to other domains where sparse or ambiguous evidence must be resolved through both instance-level and feature-level consistency constraints.

7. Future Directions and Applications

For space debris, the imminent release of SDTD and SDT-Net code supports reproducibility and benchmarking for future optical debris tracking research. For medical segmentation, the dual-teacher dynamic and feature alignment mechanisms in SDT-Net suggest wider applicability in weakly- and semi-supervised regimes across modalities and tasks. Both formulations of SDT-Net exemplify principled design in leveraging hybrid detection, tracking, and representation learning for real-world problems, with performance validated on large-scale challenging benchmarks and real-world deployment scenarios (Zhuang et al., 3 Jun 2025, Nguyen et al., 21 Jan 2026).

References (2)

High Performance Space Debris Tracking in Complex Skylight Backgrounds with a Large-Scale Dataset (2025)

Scribble-Supervised Medical Image Segmentation with Dynamic Teacher Switching and Hierarchical Consistency (2026)
