
FreqDINO: Frequency-Guided Ultrasound Segmentation

Updated 19 December 2025
  • FreqDINO is a frequency-guided segmentation framework that enhances boundary localization in ultrasound images using advanced transformer representations.
  • The system integrates multi-scale frequency extraction and boundary-guided feature refinement to mitigate speckle noise and improve structural accuracy.
  • Quantitative results reveal improvements in Dice scores and reduced Hausdorff Distance, validating its effectiveness over baseline models.

FreqDINO is a frequency-guided segmentation framework designed for generalized, boundary-aware ultrasound image segmentation, combining state-of-the-art visual transformer representations with explicitly frequency-driven mechanisms to enhance boundary localization and structural accuracy in challenging medical imaging scenarios. The method addresses modality-specific degradation, notably speckle noise and boundary artifacts that impair performance when using vision transformers pretrained on natural images. Central to FreqDINO is the integration of multi-scale frequency extraction, boundary feature alignment, and frequency-guided boundary refinement within a unified deep learning architecture (Zhang et al., 12 Dec 2025).

1. Architectural Foundations and Motivations

FreqDINO builds upon the DINOv3 visual transformer, leveraging its strong feature extraction abilities but introducing domain-specific enhancements to improve sensitivity to ultrasound-specific boundary challenges. The motivation is predicated on the observation that models pre-trained on natural images lack effective mechanisms to distinguish high-frequency boundary details from modality-specific noise, resulting in smoothed or imprecise segmentation borders. FreqDINO introduces frequency-guided modules—specifically, Multi-scale Frequency Extraction and Alignment (MFEA), the Frequency-Guided Boundary Refinement (FGBR) module, and a Multi-task Boundary-Guided Decoder (MBGD)—to explicitly enhance boundary perception and enforce structural consistency in the final segmentation (Zhang et al., 12 Dec 2025).

2. Multi-scale Frequency Extraction and Alignment (MFEA)

The MFEA component separates the backbone spatial features into low-frequency structure and multi-scale high-frequency boundary representations to enable frequency-disentangled processing. The process is initiated by applying a Haar wavelet transform to the spatial feature map $\mathcal{F}_{\rm spatial}$ produced by the DINOv3 encoder and adapters. The wavelet decomposition produces four subbands: $\mathcal{F}_{LL}$, $\mathcal{F}_{LH}$, $\mathcal{F}_{HL}$, and $\mathcal{F}_{HH}$. Fine-scale boundary features $\mathcal{F}_{H_f}$ are obtained by concatenating $(LH, HL, HH)$ and reducing with a $1 \times 1$ convolution $\phi_H$, while coarse-scale features $\mathcal{F}_{H_c}$ are generated via further down-up sampling, a second Haar transform, and a similar reduction procedure. Both $\mathcal{F}_{H_f}$ and $\mathcal{F}_{H_c}$ are tensors with shape $B \times C' \times H_1 \times W_1$, where $B$ is the batch size, $C'$ the channel width, and $(H_1, W_1)$ the spatial resolution (Zhang et al., 12 Dec 2025).
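
A minimal PyTorch sketch of this decomposition is given below. The Haar subband sign convention, the use of average pooling plus bilinear interpolation for the down-up sampling step, and the channel widths are illustrative assumptions not pinned down by the summary above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def haar_dwt(x):
    """Single-level 2D Haar transform via 2x2 block sums/differences.
    Returns (LL, LH, HL, HH), each at half the input resolution.
    Sign conventions for the detail subbands vary across libraries."""
    a = x[..., 0::2, 0::2]  # top-left of each 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2
    lh = (a + b - c - d) / 2
    hl = (a - b + c - d) / 2
    hh = (a - b - c + d) / 2
    return ll, lh, hl, hh

class MFEA(nn.Module):
    """Sketch of Multi-scale Frequency Extraction and Alignment (MFEA)."""
    def __init__(self, c_in, c_prime):
        super().__init__()
        # 1x1 reductions (phi_H and its coarse-scale counterpart)
        self.phi_hf = nn.Conv2d(3 * c_in, c_prime, kernel_size=1)
        self.phi_hc = nn.Conv2d(3 * c_in, c_prime, kernel_size=1)

    def forward(self, f_spatial):
        # Fine scale: Haar transform of the backbone feature map
        ll, lh, hl, hh = haar_dwt(f_spatial)
        f_hf = self.phi_hf(torch.cat([lh, hl, hh], dim=1))
        # Coarse scale: down-up sampling, second Haar transform, reduction
        down = F.avg_pool2d(f_spatial, kernel_size=2)
        up = F.interpolate(down, scale_factor=2, mode="bilinear",
                           align_corners=False)
        _, lh2, hl2, hh2 = haar_dwt(up)
        f_hc = self.phi_hc(torch.cat([lh2, hl2, hh2], dim=1))
        # ll: low-frequency structure; f_hf, f_hc: B x C' x H1 x W1
        return ll, f_hf, f_hc
```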

3. Frequency-Guided Boundary Refinement (FGBR) Module

At the core of FreqDINO is the FGBR module, which exploits frequency-extracted features to enforce boundary sensitivity:

  1. Boundary Prototype Extraction: The two high-frequency maps $(\mathcal{F}_{H_f}, \mathcal{F}_{H_c})$ are concatenated across channels to form $\mathcal{F}_H \in \mathbb{R}^{B \times 2C' \times H_1 \times W_1}$. A stack of two $1 \times 1$ convolutional layers with ReLU activations is applied (first mapping $2C' \to D_1$ channels, then $D_1 \to 64$), followed by global average pooling across spatial dimensions, yielding a batch of 64-dimensional boundary prototypes $\mathbf{P} \in \mathbb{R}^{B \times 64}$.
  2. Boundary-Guided Feature Refinement: Enhanced spatial features $\mathcal{F}_{\rm enh}$ (from MFEA) are reshaped for attention as $Q \in \mathbb{R}^{B \times (H_1 W_1) \times C}$ and projected to query vectors. The boundary prototype is linearly projected to obtain key/value tensors for an 8-head scaled dot-product attention. The attention output is reshaped and added (with residual scale $\omega = 0.2$) back to $\mathcal{F}_{\rm enh}$, forming $\mathcal{F}_{\rm refined}$ (Zhang et al., 12 Dec 2025).

The FGBR module thus fuses frequency-derived boundary statistics with spatial detail, directly influencing learned segmentation boundaries.
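
A compact sketch of FGBR under the description above is shown below. The residual scale $\omega = 0.2$ and the 8-head attention follow the stated specification; the enhanced-feature channel width and the exact form of the prototype-to-key/value projection are assumptions.

```python
import torch
import torch.nn as nn

class FGBR(nn.Module):
    """Sketch of Frequency-Guided Boundary Refinement (FGBR).
    c_prime: channel width of the high-frequency maps; d1: hidden width
    of the prototype extractor; c_enh: width of the enhanced features."""
    def __init__(self, c_prime, d1, c_enh, num_heads=8, omega=0.2):
        super().__init__()
        # Boundary prototype extractor: two 1x1 convs with ReLU, then GAP
        self.proto = nn.Sequential(
            nn.Conv2d(2 * c_prime, d1, 1), nn.ReLU(inplace=True),
            nn.Conv2d(d1, 64, 1), nn.ReLU(inplace=True),
        )
        self.to_kv = nn.Linear(64, 2 * c_enh)  # prototype -> key/value
        self.attn = nn.MultiheadAttention(c_enh, num_heads, batch_first=True)
        self.omega = omega

    def forward(self, f_hf, f_hc, f_enh):
        b, c, h, w = f_enh.shape
        f_h = torch.cat([f_hf, f_hc], dim=1)        # B x 2C' x H1 x W1
        p = self.proto(f_h).mean(dim=(2, 3))        # B x 64 prototypes
        k, v = self.to_kv(p).chunk(2, dim=-1)       # B x C each
        q = f_enh.flatten(2).transpose(1, 2)        # B x (H1*W1) x C
        out, _ = self.attn(q, k.unsqueeze(1), v.unsqueeze(1))
        out = out.transpose(1, 2).reshape(b, c, h, w)
        return f_enh + self.omega * out             # residual, omega = 0.2
```

Note that `c_enh` must be divisible by the head count; per the implementation notes in Section 7, 8 heads at 128 dimensions each would make `c_enh` = 1024.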

4. Multi-Task Boundary-Guided Decoder (MBGD) and Integrated Pipeline

$\mathcal{F}_{\rm refined}$ enters the MBGD, which upscales features and computes both semantic segmentation masks and explicit boundary maps:

  • The decoder applies four transposed-convolution upsampling blocks ("UpBlocks") to produce a high-resolution $\mathcal{F}_{\rm shared}$.
  • A $1 \times 1$ convolution produces preliminary boundary logits $\mathcal{M}_{\rm boundary}$, transformed into a soft mask via sigmoid and refined with a $3 \times 3$ convolution for the final boundary output.
  • The semantic mask head takes as input the concatenation of $\mathcal{F}_{\rm shared}$ and the boundary prediction, followed by a $1 \times 1$ convolution.

The pipeline sequence is: Input → DINOv3 encoder → MFEA → FGBR → MBGD → semantic & boundary predictions (Zhang et al., 12 Dec 2025).
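
The decoder logic can be sketched as follows; the per-stage channel schedule and the UpBlock internals (transposed convolution, normalization, ReLU) are assumptions beyond what the summary states.

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """Transposed-convolution upsampling block (an assumed minimal form)."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.up = nn.Sequential(
            nn.ConvTranspose2d(c_in, c_out, kernel_size=2, stride=2),
            nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.up(x)

class MBGD(nn.Module):
    """Sketch of the Multi-task Boundary-Guided Decoder (MBGD)."""
    def __init__(self, c_in, num_classes=1):
        super().__init__()
        chans = [c_in, c_in // 2, c_in // 4, c_in // 8, c_in // 16]
        self.upblocks = nn.Sequential(
            *[UpBlock(chans[i], chans[i + 1]) for i in range(4)])
        c_shared = chans[-1]
        self.boundary_logits = nn.Conv2d(c_shared, 1, 1)      # 1x1 boundary head
        self.boundary_refine = nn.Conv2d(1, 1, 3, padding=1)  # 3x3 refinement
        self.mask_head = nn.Conv2d(c_shared + 1, num_classes, 1)

    def forward(self, f_refined):
        f_shared = self.upblocks(f_refined)            # 4x transposed-conv upsampling
        soft = torch.sigmoid(self.boundary_logits(f_shared))  # soft boundary mask
        boundary = self.boundary_refine(soft)          # final boundary output
        # Semantic mask conditioned on the boundary prediction
        mask = self.mask_head(torch.cat([f_shared, boundary], dim=1))
        return mask, boundary
```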

5. Quantitative Performance and Ablation

Experimental results underscore the contribution of the FGBR module within FreqDINO. On ultrasound segmentation benchmarks:

  • The base DINOv3 + adapters yields Dice = 82.35%, HD = 47.59 mm.
  • Adding MFEA alone improves to Dice = 84.17%, HD = 44.59 mm.
  • Adding FGBR atop MFEA further yields Dice = 85.13% (+0.96), HD = 43.02 mm (–1.57 mm).
  • The full FreqDINO (MFEA + FGBR + MBGD) records Dice = 86.52%, HD = 39.63 mm.

This demonstrates that FGBR provides a measurable boost in boundary accuracy and overall segmentation agreement relative to frequency feature extraction alone (Zhang et al., 12 Dec 2025).

6. Related Frequency-Guided Approaches

FreqDINO's FGBR concept is related to frequency-guided boundary refinement mechanisms appearing across scientific domains, with notable analogs:

  • In axisymmetric droplet simulations, a signal processing approach uses Fourier-domain envelope analysis of curvature to guide mesh refinement, delivering robust and parametric grid adaptation for capturing singularity formation (Koga, 2019).
  • Temporal action detection in video leverages frequency decoupling to suppress low-frequency background and amplify atomic (high-frequency) segment boundaries, with analogous modules for frequency-guided action boundary localization (Zhu et al., 1 Apr 2025).

A plausible implication is that the frequency-guided signal processing paradigm is establishing a methodological connection between computational physics, video understanding, and medical image analysis, where boundary localization under noise and class imbalance is critical.

7. Implementation Considerations and Reproducibility

Implementation of FreqDINO's FGBR should adhere to the specifications described: a minimal prototype extractor (two $1 \times 1$ convolutions with ReLU), standard multi-head attention with 8 heads and 128 dimensions per head, and lightweight residual integration with $\omega = 0.2$. The architecture relies on standard PyTorch MultiheadAttention primitives and basic convolutional units. The code for FreqDINO is available at https://github.com/MingLang-FD/FreqDINO (Zhang et al., 12 Dec 2025).
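
As a concrete reading of the attention specification, 8 heads at 128 dimensions per head imply an embedding width of 8 × 128 = 1024:

```python
import torch.nn as nn

# 8 heads x 128 dims per head -> embed_dim = 1024 (derived, not stated directly)
attn = nn.MultiheadAttention(embed_dim=1024, num_heads=8, batch_first=True)
```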

In summary, FreqDINO combines frequency decomposition, boundary prototype learning, and attention-driven feature refinement to deliver state-of-the-art segmentation, particularly excelling in boundary-sensitive, high-noise imaging contexts characteristic of ultrasound.
