
MUSSE-Net: Unsupervised Strain Estimation

Updated 26 November 2025
  • The paper introduces MUSSE-Net’s multi-stage architecture, combining a USSE-Net backbone with a residual refinement stage to achieve precise, unsupervised strain estimation in ultrasound elastography.
  • It leverages advanced attention-based feature fusion (CACFF and TCA) and sequential ConvLSTM decoding to enhance temporal consistency and suppress decorrelation noise.
  • Experimental evaluations demonstrate significantly improved SNR, CNR, and sharper lesion boundaries on both simulated and in vivo datasets compared to baseline methods.

MUSSE-Net (Multi-Stage Residual-Aware Unsupervised Strain Estimation Network) is an end-to-end unsupervised deep learning framework specifically developed for robust, consistent strain estimation in quasi-static ultrasound strain elastography (USE). USE is a noninvasive imaging technique for assessing tissue mechanical properties, but its clinical utility remains limited by decorrelation noise, absence of ground truth, and instability of strain estimates across deformation levels. MUSSE-Net addresses these challenges via a multi-stage architecture that leverages advanced attention-based feature fusion and sequential modeling, yielding state-of-the-art results in both simulation and in vivo datasets (Joarder et al., 19 Nov 2025).

1. Network Architecture

MUSSE-Net consists of two principal components: an initial USSE-Net backbone and a subsequent residual refinement stage. The USSE-Net module is a single-stage, multi-stream encoder–decoder design featuring attention mechanisms and sequential decoding to produce dense displacement fields and axial strain maps. The secondary residual stage further refines these estimates by predicting displacement residuals following warping of the post-compression frame.

USSE-Net Backbone

Each USSE-Net stage processes one pre-compression RF frame $I_{\mathrm{pre}}$ and a sequence of $T$ post-compression RF frames $\{I^t_{\mathrm{post}}\}_{t=1}^T$, with each input tensor of size $1 \times H \times W$. For each pair $(I_{\mathrm{pre}}, I^t_{\mathrm{post}})$, the network predicts a two-channel displacement field $D^t = (D^t_x, D^t_y)$, with $D^t_y$ representing the axial displacement. Axial strain maps $z^t$ are then computed via least-squares differentiation of $D^t_y$.

USSE-Net is structured around:

CACFF Multi-Stream Encoder

At each down-sampling level $l$, CACFF deploys three parallel branches:

  • Two unimodal streams with shared weights extract motion-specific features from $I_{\mathrm{pre}}$ and $I^t_{\mathrm{post}}$ through four residual down-sampling blocks (each block: two 3×3 convolutions + ReLU + skip).
  • A “mid” stream fuses $[I_{\mathrm{pre}}, I^t_{\mathrm{post}}]$ and the unimodal features, with CACFF blocks producing contextual, complementary cross-frame feature representations.
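The residual down-sampling block in the unimodal streams can be sketched as follows. This is a minimal single-channel NumPy illustration, not the paper's implementation: the real network uses multi-channel learned convolutions, and the 2×2 average-pool downsampling here is an assumption.

```python
import numpy as np

def conv3x3(x, w):
    """3x3 convolution with zero padding ('same' output size), single channel."""
    h, wd = x.shape
    xp = np.pad(x, 1)
    out = np.zeros_like(x, dtype=float)
    for i in range(h):
        for j in range(wd):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * w)
    return out

def residual_down_block(x, w1, w2):
    """Residual down-sampling block sketch: two 3x3 convs + ReLU + skip,
    then 2x downsampling (average pooling is an illustrative choice)."""
    y = np.maximum(conv3x3(x, w1), 0.0)        # conv + ReLU
    y = np.maximum(conv3x3(y, w2), 0.0) + x    # conv + ReLU + skip connection
    h, wd = y.shape
    return y.reshape(h // 2, 2, wd // 2, 2).mean(axis=(1, 3))
```

Stacking four such blocks halves the spatial resolution four times, producing the coarse feature maps consumed by the bottleneck.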

Tri-Cross Attention (TCA) Bottleneck

At the coarsest spatial scale, TCA fuses feature maps from the three streams ($f^{t,L}_{\mathrm{pre}}, f^{t,L}_{\mathrm{post}}, f^{t,L}_{\mathrm{mid}}$), projecting each to query, key, and value tensors. TCA computes global pairwise correlations between all pairs (e.g., $A_{\mathrm{pre}\to\mathrm{mid}} = \mathrm{softmax}(Q_{\mathrm{mid}} K_{\mathrm{pre}}^\mathsf{T})$), concatenates the outputs, projects back to the original dimension, and additively fuses the result into the mid-stream value, capturing long-range dependencies, reducing decorrelation noise, and suppressing lateral artifacts.
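The fusion step above can be sketched in NumPy. This is a single-head form over flattened spatial positions, which is an assumption; head count, normalization, and projection details may differ in the paper.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def tri_cross_attention(f_pre, f_post, f_mid, Wq, Wk, Wv, Wo):
    """Tri-Cross Attention sketch: the mid-stream query attends to the
    pre- and post-stream keys; the attended values are concatenated,
    projected back to the feature dimension, and additively fused into
    the mid-stream value. Features are (N, C) with N flattened spatial
    positions."""
    q_mid = f_mid @ Wq
    attended = []
    for f in (f_pre, f_post):
        k, v = f @ Wk, f @ Wv
        attn = softmax(q_mid @ k.T / np.sqrt(k.shape[1]))  # e.g. A_{pre->mid}
        attended.append(attn @ v)
    fused = np.concatenate(attended, axis=1) @ Wo          # project 2C -> C
    return f_mid @ Wv + fused                              # additive fusion
```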

CAF Sequential ConvLSTM Decoder

Decoding unfolds in reverse across the $L$ levels, with a ConvLSTM enforcing temporal consistency across post-compression frames. At each level $l$ and time $t$:

  1. The upsampled hidden state $\uparrow h^t_{l-1}$ is combined via attention over the encoder’s skip features.
  2. Attention weights $\alpha = \mathrm{softmax}(W_a[\uparrow h^t_{l-1};\,\mathrm{skips}]+b_a)$ form a fused skip $\widetilde f^t_l = \sum_i \alpha_i\,\mathrm{skips}_i$.
  3. The concatenated vector is processed through a 3×3 convolution + ReLU and passed, together with $h^{t-1}_l$, to the ConvLSTM, yielding the updated hidden state $h^t_l$.

A final 3×3 convolution maps $h^t_L \to D^t$. Displacements across all levels are aggregated and differentiated to obtain strain.
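The attention-weighted skip fusion in step 2 can be sketched as follows. The per-skip scalar scoring form used here is an assumption; the paper's attention may operate per-pixel or per-channel.

```python
import numpy as np

def fuse_skips(hidden_up, skips, Wa, ba):
    """Attention-weighted skip fusion sketch for the CAF decoder.

    A score is computed for each encoder skip feature from the upsampled
    hidden state concatenated with that skip; a softmax over skips yields
    weights alpha, and the fused skip is the alpha-weighted sum."""
    scores = np.array([
        float(np.concatenate([hidden_up.ravel(), s.ravel()]) @ Wa + ba)
        for s in skips
    ])
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()                 # softmax over skips
    return sum(a * s for a, s in zip(alpha, skips))
```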

Residual Refinement Stage

MUSSE-Net stacks two USSE-Net stages (optimal $M_{\mathrm{opt}} = 2$). At stage $m$, the original $I_{\mathrm{pre}}$ and a warped post-frame $I^{t,m-1}_{\mathrm{post}}$ are provided (with $I^{t,0}_{\mathrm{post}} = I^t_{\mathrm{post}}$ for $m=1$). The network predicts a residual displacement $D^{t,m}_{\mathrm{res}}$ and computes the refined displacement $D^{t,m} = D^{t,m-1} + D^{t,m}_{\mathrm{res}}$, updating the warped image accordingly. Warping is preceded by 4× upsampling and followed by downsampling for sub-pixel accuracy. Only the current stage is updated during training; earlier stages are frozen.
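The residual accumulation loop can be sketched as follows. The predictor and warp functions are placeholders for the USSE-Net stages and the sub-pixel warping described above; the 4× up/downsampling around warping is omitted from this sketch.

```python
import numpy as np

def multi_stage_refine(pre, post, stages, warp):
    """Multi-stage residual refinement sketch.

    `stages` is a list of M displacement predictors; stage m receives the
    original pre-frame and the post-frame warped with the accumulated
    displacement, predicts a residual D_res, and displacements are summed:
    D^{t,m} = D^{t,m-1} + D_res^{t,m}."""
    disp = np.zeros(pre.shape + (2,))           # accumulated displacement
    warped = post                               # I_post^{t,0} = I_post^t
    for predict in stages:
        residual = predict(pre, warped)         # stage-m residual field
        disp = disp + residual                  # refined displacement
        warped = warp(post, disp)               # re-warp the original post-frame
    return disp
```

Note that each stage re-warps the original post-frame with the full accumulated displacement, rather than warping an already-warped image, which avoids compounding interpolation error.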

2. Unsupervised Loss Functions

MUSSE-Net is trained entirely without ground-truth displacement, leveraging three unsupervised, physically motivated loss terms:

  • Photometric (Similarity) Loss: Based on local normalized cross-correlation (LNCC) between pre-compression and warped post-compression frames.
  • Displacement Smoothness Loss: Penalizes second-order spatial gradients to enforce tissue continuity.
  • Temporal Consistency Loss: Minimizes LNCC difference between consecutive predicted strain maps, stabilizing estimates across deformation.

The total loss is given by

$$L_{\mathrm{total}} = \alpha\,L_{\mathrm{sim}} + \beta\,L_{\mathrm{con}} + \gamma\,L_{\mathrm{smooth}}$$

with empirically chosen weights $\alpha=1.0$, $\beta=0.2$, $\gamma=0.3$. These losses are applied identically in both stages, with each residual stage trained independently.
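The loss terms can be sketched in NumPy as follows. The non-overlapping box windows in the LNCC and the exact sign conventions of the similarity and consistency terms are assumptions; the paper's implementation details may differ.

```python
import numpy as np

def lncc(a, b, win=7, eps=1e-8):
    """Local normalized cross-correlation over non-overlapping box windows
    (window shape and stride are illustrative choices)."""
    h = (a.shape[0] // win) * win
    w = (a.shape[1] // win) * win
    A = a[:h, :w].reshape(h // win, win, w // win, win).astype(float)
    B = b[:h, :w].reshape(h // win, win, w // win, win).astype(float)
    A = A - A.mean(axis=(1, 3), keepdims=True)          # zero-mean per window
    B = B - B.mean(axis=(1, 3), keepdims=True)
    num = (A * B).sum(axis=(1, 3))
    den = np.sqrt((A ** 2).sum(axis=(1, 3)) * (B ** 2).sum(axis=(1, 3))) + eps
    return float((num / den).mean())

def smoothness_loss(d):
    """Second-order spatial gradient penalty on a displacement component."""
    dxx = d[2:, :] - 2 * d[1:-1, :] + d[:-2, :]
    dyy = d[:, 2:] - 2 * d[:, 1:-1] + d[:, :-2]
    return float((dxx ** 2).mean() + (dyy ** 2).mean())

def total_loss(l_sim, l_con, l_smooth, alpha=1.0, beta=0.2, gamma=0.3):
    """L_total = alpha*L_sim + beta*L_con + gamma*L_smooth."""
    return alpha * l_sim + beta * l_con + gamma * l_smooth
```

A linear displacement ramp (constant strain) incurs zero smoothness penalty, which is exactly the tissue-continuity prior the second-order term encodes.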

3. Strain Calculation

Axial strain is computed from the axial displacement field $D_y(x, y)$ using a least-squares strain estimator (LSQSE). The estimator solves

$$z(x, y) = \arg\min_z \int \left( D_y(x, y) - \int_0^x z(s, y)\,ds \right)^2 dx$$

Discretized, this yields $z = (M^\mathsf{T}M)^{-1}M^\mathsf{T}D_y$, where $M$ is a finite-difference integration matrix. The network implements this as a convolutional least-squares layer for efficient strain map generation.
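A common sliding-window realization of the LSQSE fits a line to the axial displacement within each window and takes the slope as the local strain; the normal-equation solution $(M^\mathsf{T}M)^{-1}M^\mathsf{T}$ is precomputed once. The window length here is an illustrative choice, not the paper's value.

```python
import numpy as np

def lsq_strain(d_axial, window=5):
    """Least-squares strain estimator (LSQSE) sketch.

    Fits displacement = slope*x + intercept within a sliding axial window
    on each A-line; the slope is the local axial strain."""
    h, w = d_axial.shape
    half = window // 2
    x = np.arange(window, dtype=float)
    M = np.stack([x, np.ones(window)], axis=1)   # design matrix [x, 1]
    pinv = np.linalg.inv(M.T @ M) @ M.T          # (M^T M)^{-1} M^T, 2 x window
    strain = np.zeros_like(d_axial, dtype=float)
    for i in range(half, h - half):
        seg = d_axial[i - half:i + half + 1, :]  # window x W block
        strain[i, :] = (pinv @ seg)[0]           # slope row = strain
    return strain
```

Because the per-window fit is a fixed linear map, it is equivalent to convolving each A-line with the slope row of `pinv`, which is how a network can implement it as a convolutional layer.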

4. Experimental Protocols and Quantitative Evaluation

Datasets

  • Field II simulation: 23 phantoms with inclusions (18–23 kPa) and backgrounds (40–60 kPa) across 10 strain levels (0.5–4.5 %) and 10 scatterer realizations; 19 phantoms for training, 4 for validation and testing; $T=9$ post-frames per reference.
  • Public in vivo dataset: 310 sequences of 19–127 frames each; 6 consecutive frames used for sequential decoding; 20 sequences held out for testing.
  • Private BUET dataset: 23 subjects for training, 5 for testing, acquired with a 10 MHz probe (40 MHz sampling); includes a tissue-mimicking phantom; $T=5$ post-frames per reference.

Networks were trained with the Adam optimizer (learning rate $1 \times 10^{-3}$), stepwise learning-rate scheduling, and batch size 1, for 150 epochs (stage 1) plus 100 epochs (residual stage).

Metrics

| Metric | Formula / Interpretation |
| --- | --- |
| Target SNR | $\mathrm{SNR}_t = \bar s_t/\sigma_t$ (mean/std of strain in lesion) |
| Background SNR | $\mathrm{SNR}_{bg} = \bar s_b/\sigma_b$ (mean/std of strain in background) |
| Contrast-to-Noise Ratio (CNR) | $\sqrt{2(\bar s_b-\bar s_t)^2 / (\sigma_b^2+\sigma_t^2)}$ |
| Elastographic SNR | $\mathrm{SNR}_e = \bar s/\sigma$ (global mean/std of the strain map) |
| NRMSE | $100\times \sqrt{\tfrac{1}{N}\sum_i(w^{\mathrm{GT}}_i - w^\theta_i)^2} \,/\, \sqrt{\tfrac{1}{N}\sum_i(w^{\mathrm{GT}}_i)^2}$ |
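The table's formulas translate directly into NumPy; each function takes strain values from the relevant region (lesion, background, or whole map):

```python
import numpy as np

def snr(region):
    """SNR of a strain region: mean divided by standard deviation."""
    return float(region.mean() / region.std())

def cnr(target, background):
    """Contrast-to-noise ratio between lesion and background strain."""
    return float(np.sqrt(2 * (background.mean() - target.mean()) ** 2
                         / (background.var() + target.var())))

def nrmse(gt, pred):
    """Normalized RMSE in percent between ground-truth and predicted fields."""
    return float(100 * np.sqrt(np.mean((gt - pred) ** 2))
                 / np.sqrt(np.mean(gt ** 2)))
```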

Main Results

| Model | $\mathrm{SNR}_t$ | $\mathrm{SNR}_{bg}$ | CNR | NRMSE | $\mathrm{SNR}_e$ |
| --- | --- | --- | --- | --- | --- |
| USENet (baseline) | 13.66 ± 1.75 | 48.15 ± 7.27 | 20.98 ± 3.62 | 29.35 ± 0.77 % | 4.43 ± 0.35 |
| ReUSENet (ConvLSTM) | 14.64 ± 2.23 | 68.43 ± 18.36 | 30.33 ± 8.06 | 2.03 ± 0.04 % | 7.50 ± 0.58 |
| USSE-Net | 16.69 ± 3.52 | 102.25 ± 33.36 | 43.11 ± 14.15 | 1.12 ± 0.12 % | 9.16 ± 0.80 |
| MUSSE-Net | 24.54 ± 3.66 | 132.76 ± 45.63 | 59.81 ± 20.38 | 1.31 ± 0.06 % | 9.73 ± 1.08 |

For the public in vivo test set, mean $\mathrm{SNR}_e$ values were 0.81 (USENet), 0.96 (ReUSENet), 0.97 (USSE-Net), and 0.99 (MUSSE-Net). On the private BUET dataset and phantom, MUSSE-Net demonstrated consistently sharper lesion boundaries, higher contrast, and suppressed decorrelation noise relative to previous methods. Stability analyses confirm that MUSSE-Net maintains metrics such as $\mathrm{SNR}_t$, $\mathrm{SNR}_{bg}$, and CNR even at high strain (4.5 %), where earlier models degrade.

5. Clinical Significance and Interpretability

The integration of CACFF (for contextual fusion), TCA (for cross-modal attention), and CAF–ConvLSTM (for temporal coherence) enables MUSSE-Net to generate axial strain maps closely reflecting true tissue mechanics. Lesion areas appear with well-defined, reproducible boundaries and contrast, while background noise and artifacts are significantly diminished. The temporal consistency loss ensures that strain estimates remain stable under varying deformations, a requirement for clinical USE applications such as breast lesion assessment and liver fibrosis staging.

6. Limitations and Future Directions

Despite its superior quantitative and qualitative performance, MUSSE-Net exhibits increased training and inference costs due to its multi-stage construction and use of ConvLSTM cells, and currently requires batch size 1 for training. Future research aims include developing lightweight variants, leveraging network pruning, exploring alternative temporal modeling strategies to permit larger batch sizes, and undertaking broader validation across multi-center clinical datasets. These directions are intended to address scalability and generalizability for widespread clinical adoption (Joarder et al., 19 Nov 2025).
