LesiOnTime: Temporal Lesion Analysis in Medical Imaging

Updated 3 July 2026

LesiOnTime is a temporal lesion analysis framework that integrates spatial and clinical data from multiple timepoints to ensure robust segmentation and tracking.
It employs architectures like dual-encoder U-Nets, temporal attention blocks, and transformer-based fusion to handle variability in serial imaging studies.
By reducing segmentation errors from registration misalignments, LesiOnTime enhances clinical workflow efficiency and supports accurate disease monitoring.

LesiOnTime refers to a class of computational methods and architectures designed to segment, track, or quantify lesions in medical imaging scans with explicit temporal awareness, such that the analysis of lesions across serial studies (“on time”) is robust, reliable, and suitable for clinical workflows, including real-time use. The LesiOnTime paradigm contrasts with traditional cascaded single-timepoint approaches by integrating temporal modeling at the algorithmic or architectural level, directly leveraging spatial and clinical information spanning multiple timepoints to optimize lesion segmentation or detection fidelity, correspondence stability, and biologically meaningful temporal consistency. This entry synthesizes technical innovations, architectures, evaluation findings, and ongoing limitations for LesiOnTime systems, drawing on their use in oncology, neurology, and broader computational medical imaging.

1. Rationale for Temporal Integration in Lesion Analysis

Lesion analysis in longitudinal imaging is foundational to disease monitoring, particularly in oncology and neurology. Manual lesion measurement, matching, and quantification are laborious and subject to high inter- and intra-observer variability. Automated tools using single-timepoint models, such as nnU-Net-based segmenters, fail to account for temporal shifts and introduce critical failure cascades: segmentation quality drops post-registration, and lesion correspondence becomes unreliable due to compounded errors. For example, in longitudinal melanoma CT studies, the median Dice similarity coefficient (DSC) is 0.83 at baseline but drops to 0.62 at follow-up, with 30% of lesions scoring DSC < 0.2 after misregistration-induced centroid displacement (Rocholl et al., 25 Jul 2025).

A central limitation is the assumption that the lesion center is accurately known and sits at the geometric center of the input. In practice, registration errors between scans exceed 10 mm for more than 20% of lesions, which degrades accuracy sharply. These observations motivate a shift to integrated, temporally-aware modeling where segmentation and correspondence are mutually informed by the temporal sequence, and architectural priors explicitly encode changes over time (Rocholl et al., 25 Jul 2025, Kamran et al., 1 Aug 2025, Rokuss et al., 2024).

2. Core Architectures and Temporal Fusion Blocks

LesiOnTime systems implement several recurring architectural primitives designed for robust temporal integration:

Dual-Encoder U-Nets: Networks ingest and process current and prior (or multiple) timepoint volumes via paired or shared encoders. Temporal feature fusion typically occurs at corresponding stages in the feature hierarchy (Kamran et al., 1 Aug 2025, Rokuss et al., 2024).
Temporal Prior Attention (TPA) Block: At every skip connection, features from prior and current scans are concatenated, pooled, and weighted via an Attention Weight Generator (AWG). Weighted residuals, instance-normalized, are added back, allowing the network to modulate its reliance on historical context and focus on emerging or regressing lesions. The formula is:

$\tilde k_t^m = k_t^m \odot \mathrm{InstNorm}(w_1\,k_t^m - w_2\,k_{t-1}^m) + k_t^m$

where $k_t^m$ and $k_{t-1}^m$ are feature maps, $w_1, w_2$ are attention weights (Kamran et al., 1 Aug 2025).
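The TPA update above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the real Attention Weight Generator is a learned module, whereas `attention_weights` below is a hypothetical, parameter-free stand-in that pools each feature map and applies a softmax to obtain two scalar weights. The function names and the `(C, D, H, W)` layout are likewise assumptions made for illustration.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each channel of a (C, D, H, W) feature map over its spatial axes."""
    axes = tuple(range(1, x.ndim))
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def attention_weights(k_t, k_prev):
    """Hypothetical stand-in for the AWG (the real one is learned).

    Global-average-pools each feature map to a scalar and applies a
    softmax, giving two non-negative weights that sum to 1.
    """
    pooled = np.array([k_t.mean(), k_prev.mean()])
    e = np.exp(pooled - pooled.max())
    return e / e.sum()

def tpa_block(k_t, k_prev):
    """k_t * InstNorm(w1 * k_t - w2 * k_prev) + k_t, as in the formula above."""
    w1, w2 = attention_weights(k_t, k_prev)
    gate = instance_norm(w1 * k_t - w2 * k_prev)
    return k_t * gate + k_t

# Toy skip-connection features for the current and prior scan.
rng = np.random.default_rng(0)
current = rng.normal(size=(2, 3, 4, 4))
prior = rng.normal(size=(2, 3, 4, 4))
fused = tpa_block(current, prior)
print(fused.shape)
```

The residual term `+ k_t` means the block can only modulate, never discard, the current-scan features, which matches the description of the network adjusting its reliance on historical context.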

Difference Weighting Block: Incorporated into skip connections, this block computes the instance-normalized difference between current and prior features and applies it as a multiplicative attention mask, followed by a residual addition:

$x_c' = x_c + x_c \odot \mathrm{InstNorm}(x_c - x_p)$

This mechanism yields statistically significant improvements in both volume-based (Dice, HD95) and lesion-level (F₁) metrics, directing network attention to evolving lesion activity (Rokuss et al., 2024).
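As the two skip-connection formulas are written in this entry, they are closely related. Setting both attention weights in the TPA update to one gives

$\tilde k_t^m \big|_{w_1 = w_2 = 1} = k_t^m \odot \mathrm{InstNorm}(k_t^m - k_{t-1}^m) + k_t^m$

which has the same form as the Difference Weighting update, with $k_t^m$ in the role of $x_c$ and $k_{t-1}^m$ in the role of $x_p$. This is an observation about the formulas as stated here, not a claim about how either paper derives its block. A simple sanity check follows from the same expression: if the current and prior features are identical, then $x_c - x_p = 0$, and assuming an instance normalization without a learned bias term, $\mathrm{InstNorm}(0) = 0$, so $x_c' = x_c$ and the skip features pass through unchanged. The mask therefore only acts where the two timepoints differ.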

Transformer-Based Fusion: The Transformer Lesion Tracker (TLT) utilizes cross-attention between sparse, lesion-centric template features and full search volume representations, with anatomical alignment imposed by an explicit registration-based attention mask (Tang et al., 2022). Cross-attention operates as:

$\mathrm{Attention}_i(Q, K, V) = \mathrm{softmax}\left(\frac{QW_i^Q (K W_i^K)^\top}{\sqrt{d_k}} + M_A\right)(V W_i^V)$

where $M_A$ is an anatomical bias derived from affine registration alignment.
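The role of the additive bias $M_A$ can be seen in a single-head NumPy sketch of the attention formula above. How TLT actually constructs $M_A$ from the affine registration is not detailed in this entry, so the binary window used in the usage lines (a large negative value for keys assumed to lie outside the anatomically plausible region) is a hypothetical choice, as are the function name, the token counts, and the random projection matrices.

```python
import numpy as np

def masked_cross_attention(q, k, v, w_q, w_k, w_v, mask):
    """Single-head cross-attention with an additive bias.

    q:    (n_q, d_model) template (lesion-centric) tokens
    k, v: (n_k, d_model) search-volume tokens
    mask: (n_q, n_k) additive bias M_A; large negative entries suppress a key
    Returns the attended values and the attention weights.
    """
    qp, kp, vp = q @ w_q, k @ w_k, v @ w_v
    d_k = kp.shape[-1]
    scores = qp @ kp.T / np.sqrt(d_k) + mask
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ vp, weights

rng = np.random.default_rng(0)
d_model, d_k = 8, 4
template = rng.normal(size=(2, d_model))   # 2 sparse template tokens
search = rng.normal(size=(5, d_model))     # 5 search-volume tokens
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

# Hypothetical mask: pretend registration says keys 3 and 4 are implausible.
m_a = np.zeros((2, 5))
m_a[:, 3:] = -1e9

out, attn = masked_cross_attention(template, search, search, w_q, w_k, w_v, m_a)
print(attn.round(3))
```

Because the bias is added before the softmax, masked keys receive effectively zero weight while the remaining weights still sum to one per query.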

Stochastic Priors in Generative Models: TimeLesSeg synthesizes prior lesion masks through morphological stochastic processes (including erosion, dilation, or dropping), covering plausible biological progression scenarios without requiring real paired data (Caselles-Ballester et al., 8 May 2026).
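The idea of synthesizing a prior mask from the current one can be sketched as follows. This is a simplified 2D illustration under stated assumptions, not the TimeLesSeg procedure: a 4-neighbour structuring element is assumed, "dropping" is reduced to removing the whole mask rather than individual lesions, the three operations are sampled uniformly, and all function names are hypothetical.

```python
import numpy as np

def dilate(mask):
    """Binary dilation with a 4-neighbour cross (2D, zero-padded border)."""
    p = np.pad(mask, 1, constant_values=False)
    return p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1] | p[1:-1, :-2] | p[1:-1, 2:]

def erode(mask):
    """Binary erosion with a 4-neighbour cross (2D, zero-padded border)."""
    p = np.pad(mask, 1, constant_values=False)
    return p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]

def synthesize_prior(mask, rng):
    """Draw one synthetic 'prior' mask from the current lesion mask."""
    op = rng.integers(3)
    if op == 0:
        return erode(mask)        # smaller prior: lesion appears to have grown
    if op == 1:
        return dilate(mask)       # larger prior: lesion appears to have shrunk
    return np.zeros_like(mask)    # empty prior: lesion appears to be new

current = np.zeros((7, 7), dtype=bool)
current[2:5, 2:5] = True          # a 3x3 toy lesion
prior = synthesize_prior(current, np.random.default_rng(0))
print(int(current.sum()), int(prior.sum()))
```

Pairing each current mask with several such draws exposes a model to growth, shrinkage, and new-lesion scenarios without needing real paired scans, which is the motivation given above.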

3. Loss Functions and Clinical Prior Integration

Temporal consistency and clinical priors are reinforced in LesiOnTime frameworks through novel loss formulations:

BI-RADS Consistency Regularization (BCR) Loss: For breast DCE-MRI, latent feature representations at each encoder layer are regularized according to corresponding BI-RADS scores at $t-1$ and $t$. If the radiologist’s clinical assessment remains unchanged, features are forced closer in latent space:

$\mathcal{L}_{\mathrm{feat}}^{(m)} = \frac{\tanh(\|\,k_t^{(m)} - k_{t-1}^{(m)}\|_2^2)}{|b_t - b_{t-1}| + \epsilon}$

with overall BCR loss aggregated across layers. This links imaging progression to clinical annotations, embedding domain knowledge and constraining model drift (Kamran et al., 1 Aug 2025).
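A small numerical sketch makes the behaviour of the per-layer term concrete. The entry states only that the loss is "aggregated across layers", so summing over layers is an assumption here, as are the value of $\epsilon$ and the function names.

```python
import numpy as np

def bcr_layer_loss(k_t, k_prev, b_t, b_prev, eps=1e-6):
    """tanh(||k_t - k_prev||^2) / (|b_t - b_prev| + eps) for one encoder layer."""
    sq_dist = np.sum((k_t - k_prev) ** 2)
    return np.tanh(sq_dist) / (abs(b_t - b_prev) + eps)

def bcr_loss(feats_t, feats_prev, b_t, b_prev, eps=1e-6):
    """Sum of per-layer terms (summation is an assumption of this sketch)."""
    return sum(
        bcr_layer_loss(a, b, b_t, b_prev, eps)
        for a, b in zip(feats_t, feats_prev)
    )

# Two toy encoder layers whose features differ between timepoints.
feats_prev = [np.zeros((2, 3)), np.zeros(4)]
feats_t = [np.full((2, 3), 0.5), np.full(4, 0.1)]

unchanged = bcr_loss(feats_t, feats_prev, b_t=3, b_prev=3)  # same BI-RADS score
changed = bcr_loss(feats_t, feats_prev, b_t=4, b_prev=2)    # score moved by 2
print(unchanged > changed)
```

With identical BI-RADS scores the denominator collapses to $\epsilon$, so any feature difference is penalized heavily; when the scores differ the same feature difference costs far less. The $\tanh$ bounds the numerator by one, which caps each layer's term at $1/\epsilon$.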

Traditional and Soft Combination Losses: Most models minimize hybrid objectives combining Dice similarity loss and cross-entropy, sometimes with deep supervision at multiple decoder stages (Caselles-Ballester et al., 8 May 2026, Rokuss et al., 2024).
Self-Supervised Learning (SSL) in Tracking: Deep Lesion Tracker includes SSL by pairing original and augmented volumes with consistent lesion centers, blending supervised focal center loss and SSL with a tunable switch, improving robustness and generalizability (Cai et al., 2020).
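The hybrid Dice plus cross-entropy objective mentioned in this list can be sketched for the binary case as below. The equal default weighting, the smoothing constants, and the function names are assumptions for illustration; the cited works may weight the terms differently and add deep supervision, which is omitted here.

```python
import numpy as np

def soft_dice_loss(probs, target, eps=1e-6):
    """1 - soft Dice between predicted probabilities and a binary target."""
    inter = np.sum(probs * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(probs) + np.sum(target) + eps)

def binary_cross_entropy(probs, target, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    p = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))

def hybrid_loss(probs, target, dice_weight=1.0, ce_weight=1.0):
    """Weighted sum of the two terms (weights are hypothetical parameters)."""
    return (dice_weight * soft_dice_loss(probs, target)
            + ce_weight * binary_cross_entropy(probs, target))

target = np.array([[0.0, 1.0], [1.0, 0.0]])
uncertain = np.full_like(target, 0.5)
print(hybrid_loss(uncertain, target) > hybrid_loss(target, target))
```

The Dice term is driven by region overlap and is therefore less sensitive to the foreground/background imbalance typical of small lesions, while the cross-entropy term supplies a per-voxel gradient.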

4. Datasets, Evaluation, and Quantitative Performance

LesiOnTime benchmarks leverage curated and public datasets that include paired or longitudinal imaging series. Examples include:

Model/System	Primary Modality	Timepoints Used	Temporal Fusion Mechanism	Application Domain
LesiOnTime (Kamran et al., 1 Aug 2025)	Breast DCE-MRI	Prior & current	TPA + BCR loss	Small lesion segmentation
TLT (Tang et al., 2022)	CT (DeepLesion)	Template & follow-up	Sparse cross-attention, registration priors	Lesion center tracking
TimeLesSeg (Caselles-Ballester et al., 8 May 2026)	MRI (MS datasets)	Optional (simulated prior)	Stochastically morphed prior mask as input	Unified cross-sectional/longitudinal MS segmentation
ULS23 Baseline (Rocholl et al., 25 Jul 2025)	CT (melanoma)	Single	nnU-Net, single-timepoint input	Universal lesion segmentation

Quantitative evaluation is multifaceted, involving:

Segmentation Overlap (Dice, Precision, Recall): LesiOnTime achieves Dice = 0.35 (HD95 = 106.5 voxels) versus 0.29–0.30 for single-timepoint baselines and prior longitudinal MS models (LongiSeg). The gain is more pronounced for small lesions and is accompanied by improved boundary conformity (Kamran et al., 1 Aug 2025).
Longitudinal Correspondence Accuracy: TLT yields CPM@10 mm = 87.37% and Median Euclidean Distance = 6.0 mm, a 14.3% reduction over earlier methods (Tang et al., 2022).
Robustness to Registration Errors: DLT demonstrates CPM@Radius = 88.4% and is far less sensitive to centroid jitter than either registration or anchor-based baselines (Cai et al., 2020).
Temporal Dynamics/Lesion Load Tracking: TimeLesSeg achieves median DSC = 0.60 for MS lesion load tracking (vs. 0.50 for SAMSEG), with lower HD95 and narrower Bland–Altman limits (Caselles-Ballester et al., 8 May 2026).
Ablation Evidence: TPA and BCR provide complementary performance gains (Dice drops by 0.04 and 0.03, respectively, when each is ablated) (Kamran et al., 1 Aug 2025). The Difference Weighting block outperforms both channel-concatenation and single-timestep nnU-Net by a margin significant at p < 0.05 (Rokuss et al., 2024).
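The tracking metrics quoted above can be illustrated with a short standard-library sketch. It assumes that a center-point match (CPM) counts as correct when the predicted lesion center lies within the stated distance of the reference center, and that centers are given in millimetres; the function names and the toy coordinates are hypothetical.

```python
import math
import statistics

def center_distances(pred_centers, true_centers):
    """Euclidean distance (mm) between each predicted and reference center."""
    return [math.dist(p, t) for p, t in zip(pred_centers, true_centers)]

def cpm_at_radius(pred_centers, true_centers, radius_mm=10.0):
    """Fraction of lesions whose predicted center is within radius_mm."""
    dists = center_distances(pred_centers, true_centers)
    return sum(d <= radius_mm for d in dists) / len(dists)

def median_distance(pred_centers, true_centers):
    """Median Euclidean distance (mm) over all lesions."""
    return statistics.median(center_distances(pred_centers, true_centers))

pred = [(0, 0, 0), (10, 0, 0), (0, 0, 30)]
true = [(3, 4, 0), (10, 0, 6), (0, 0, 0)]
print(center_distances(pred, true))      # [5.0, 6.0, 30.0]
print(cpm_at_radius(pred, true, 10.0))   # 2 of 3 lesions within 10 mm
print(median_distance(pred, true))       # 6.0
```

In this toy case one badly tracked lesion (30 mm off) lowers CPM@10 mm to two thirds but leaves the median distance at 6 mm, which is why the two metrics are usually reported together.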

5. Failure Modes, Limitations, and Methodological Comparisons

Cascaded, single-purpose segmenters (e.g., ULS23, U-Net) exhibit two characteristic failure modes that motivate LesiOnTime designs:

Segmentation Degradation due to Input Miscentering: Performance collapses beyond 20 mm centroid shifts (DSC drops from >0.8 to ~0), and correct lesion assignment falls below 50% at 10 mm shift (Rocholl et al., 25 Jul 2025).
Compound Error Cascade: Chaining registration, segmentation, and centroid-proximity matching compounds small errors into clinically unacceptable tracking failures (incorrect assignments plus false negatives ≥ 30%) (Rocholl et al., 25 Jul 2025).
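The way a modest registration error turns into wrong assignments and missed lesions can be reproduced with a toy centroid-proximity matcher. This is an illustrative sketch only: the greedy nearest-centroid rule, the 10 mm acceptance threshold, the lesion coordinates, and the 7 mm shift are all hypothetical and are not taken from the cited study.

```python
import math

def match_by_centroid(baseline, followup, max_dist_mm=10.0):
    """Greedy nearest-centroid assignment.

    Maps each baseline lesion id to the closest unused follow-up lesion id
    within max_dist_mm, or to None if no candidate qualifies.
    """
    matches, used = {}, set()
    for b_id, b_c in baseline.items():
        best, best_d = None, max_dist_mm
        for f_id, f_c in followup.items():
            d = math.dist(b_c, f_c)
            if f_id not in used and d <= best_d:
                best, best_d = f_id, d
        matches[b_id] = best
        if best is not None:
            used.add(best)
    return matches

def shift(centroids, offset):
    """Apply a rigid offset (mm) to every centroid, mimicking misregistration."""
    return {k: tuple(c + o for c, o in zip(v, offset)) for k, v in centroids.items()}

baseline = {"A": (0.0, 0.0, 0.0), "B": (12.0, 0.0, 0.0)}
followup = {"A'": (0.0, 0.0, 0.0), "B'": (12.0, 0.0, 0.0)}  # lesions did not move

print(match_by_centroid(baseline, followup))
# {'A': "A'", 'B': "B'"}
print(match_by_centroid(baseline, shift(followup, (-7.0, 0.0, 0.0))))
# {'A': "B'", 'B': None}
```

With perfect alignment both lesions are matched correctly. A 7 mm shift, smaller than the acceptance threshold, is enough to assign lesion A to the wrong follow-up lesion and leave lesion B unmatched, so a single upstream error produces both an incorrect assignment and a false negative.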

Integrated LesiOnTime designs mitigate these by:

Dropping hard input-centering assumptions,
Jointly learning registration, segmentation, and matching,
Incorporating explicit temporal priors so that inter-scan misalignments do not catastrophically impair performance (Rocholl et al., 25 Jul 2025, Kamran et al., 1 Aug 2025, Tang et al., 2022, Rokuss et al., 2024).

Notably, SEENet offers real-time RECIST measurement and segmentation from a single user click, but is limited to 2D slices and is not fully longitudinal in design (Tang et al., 2020). Pure anchor-free designs are advantageous for detection speed and generalization but do not solve the temporal consistency problem in longitudinal tracking (Zhang et al., 2019).

6. Clinical and Computational Implications

LesiOnTime approaches are directly motivated by clinical diagnostic workflows, where radiologists routinely compare current and prior scans and incorporate ordinal or qualitative scores such as BI-RADS (Kamran et al., 1 Aug 2025). Embedding such context yields substantial recall gains for subtle, early-stage, or regressing lesions, closer boundary conformity, and improved temporal coherence in lesion quantification. End-to-end tumor monitoring is achievable with LesiOnTime trackers integrated into RECIST measurement pipelines, with lesion growth and response accuracies within 0.5% of human experts (Cai et al., 2020).

A plausible implication is that explicit encoding of clinical heuristics and temporal continuity may reduce both model drift and susceptibility to spurious temporal artifacts, particularly when combined with data-driven or generative augmentation strategies to cover the full spectrum of plausible lesion trajectories.

7. Open Challenges and Future Directions

LesiOnTime research highlights several open issues:

Generalizability: Most studies are single-center and modality-specific; external, multi-vendor/center validation remains nascent (Kamran et al., 1 Aug 2025).
Temporal Context Extension: Most models use two timepoints; handling variable-length series and explicit modeling of lesion birth/death across series is underexplored.
Uncertainty and Ambiguity: Formal incorporation of ordinal clinical labels, soft labeling, and multi-reader consensus awaits more systematic treatment.
3D and Multimodal Integration: While several approaches are fully volumetric, many segmentation and RECIST pipelines still operate on 2D slices, losing critical context for complex or infiltrative lesions (Tang et al., 2020).

Future work will likely extend LesiOnTime frameworks using transformer-style temporal models, unified architectures handling both cross-sectional and longitudinal tasks (e.g., TimeLesSeg (Caselles-Ballester et al., 8 May 2026)), and explicit uncertainty quantification. Clinical adoption will be contingent upon rigorous external validation, harmonization of pretrained models for new scanners/protocols, and further optimization for low-latency, interactive deployment.