
SurLonFormer: Dynamic Survival Prediction

Updated 17 August 2025
  • SurLonFormer is a Transformer-based architecture that integrates longitudinal medical imaging and clinical covariates for dynamic survival prediction.
  • It employs a vision encoder, autoregressive sequence encoder, and a survival encoder inspired by Cox proportional hazards to address censoring, temporal correlations, and interpretability issues.
  • Empirical evaluations in simulations and Alzheimer’s disease applications demonstrate its superior predictive performance and effective spatial biomarker identification.

SurLonFormer is a Transformer-based neural architecture developed for dynamic survival prediction using longitudinal medical imaging alongside structured clinical covariates. The architecture addresses key limitations in existing survival models, notably the suboptimal exploitation of censored data, neglect of temporal correlations among serial images, and limited interpretability. SurLonFormer integrates a vision transformer for patch-wise image feature extraction, an autoregressive transformer for longitudinal sequence modeling, and a Cox proportional hazards-inspired neural network for risk estimation, thereby enabling flexible and interpretable dynamic predictions in high-dimensional and temporally evolving medical datasets.

1. Architectural Components

SurLonFormer is composed of three principal modules designed for hierarchical representation and risk modeling (a consolidated implementation sketch follows the list):

  • Vision Encoder: Processes individual medical images (e.g., MRI scans) by partitioning each into $P$ equal-sized patches, flattening, and projecting each patch into a $d$-dimensional embedding space. A learnable CLS token (CLS_v) is appended, and positional encodings are incorporated, yielding an input of shape $(P+1)\times d$ per image. This sequence is propagated through $N_v$ self-attention transformer encoder layers with multi-head attention, residual connections, and normalization. The post-transformer embedding of the CLS_v token serves as the image-level representation $v(t_{ij})$ for visit $j$ of patient $i$:

$$v(t_{ij}) = f_v\big(t_{ij}\big)$$

  • Sequence Encoder: Receives the temporal sequence of image embeddings for each patient, covering all visits conducted up to landmark time $t^*$. With the sequence $[v(t_{i0}), \dots, v(t_{iJ^*-1})]$ plus a learnable CLS token (CLS_l), the model forms a sequence of length $J^*+1$ that is input to an autoregressive transformer encoder ($N_l$ layers) with causal masking. Causal masking restricts attention to historical and current timepoints, preserving temporal order and preventing information leakage. The CLS_l embedding output is denoted $l_i(t)$, summarizing the patient's longitudinal progression:

$$l_i(t) = f_l\left([v(t_{i0}), \dots, v(t_{iJ^*-1}), \mathrm{CLS}_l]\right)$$

  • Survival Encoder: Fuses the learned longitudinal sequence embedding $l_i(t)$ with scalar covariates $x_i$ (e.g., demographic or clinical features) via a one-hidden-layer feed-forward neural network (FFNN) with GELU activation. The output is a patient-specific risk score $r_i$, parameterized as:

$$r_i = f_s(l_i(t), x_i) = \text{GELU}\left( [l_i(t)^T, x_i^T] W_{s,1} + b_{s,1} \right) W_{s,2} + b_{s,2}$$
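The following is a minimal PyTorch-style sketch of how the three modules could be composed. It is an illustration, not the authors' implementation: the layer counts, embedding dimensions, single-channel 2D input format, and the use of standard `nn.TransformerEncoder` blocks are assumptions made here for brevity.

```python
# Illustrative sketch of the three SurLonFormer modules (not the authors' code).
import torch
import torch.nn as nn


class VisionEncoder(nn.Module):
    """Patchify an image, prepend CLS_v, run N_v transformer layers, return v(t_ij)."""

    def __init__(self, img_size=64, patch_size=8, d_model=128, n_heads=4, n_layers=4):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2                    # P
        # Convolution with stride = kernel = patch size performs "flatten + project".
        self.patchify = nn.Conv2d(1, d_model, kernel_size=patch_size, stride=patch_size)
        self.cls_v = nn.Parameter(torch.zeros(1, 1, d_model))
        self.pos = nn.Parameter(torch.zeros(1, num_patches + 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=4 * d_model,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, images):                                          # (B, 1, H, W)
        x = self.patchify(images).flatten(2).transpose(1, 2)            # (B, P, d)
        cls = self.cls_v.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos                       # (B, P+1, d)
        return self.encoder(x)[:, 0]                                    # CLS_v embedding


class SequenceEncoder(nn.Module):
    """Causally masked transformer over per-visit embeddings; returns the CLS_l output."""

    def __init__(self, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.cls_l = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=4 * d_model,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, visit_embeddings):                                # (B, J*, d)
        cls = self.cls_l.expand(visit_embeddings.size(0), -1, -1)
        x = torch.cat([visit_embeddings, cls], dim=1)                   # (B, J*+1, d)
        # Causal mask: position k may attend only to positions <= k.
        L = x.size(1)
        mask = torch.triu(torch.full((L, L), float("-inf"), device=x.device), diagonal=1)
        return self.encoder(x, mask=mask)[:, -1]                        # CLS_l embedding


class SurvivalEncoder(nn.Module):
    """One-hidden-layer FFNN with GELU fusing l_i(t) and covariates x_i into a risk score."""

    def __init__(self, d_model=128, n_covariates=4, d_hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model + n_covariates, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, 1),
        )

    def forward(self, seq_embedding, covariates):
        return self.net(torch.cat([seq_embedding, covariates], dim=-1)).squeeze(-1)
```

In this sketch the vision encoder is applied to each visit's image, the per-visit embeddings are stacked along the time axis and summarized by the causally masked sequence encoder, and the survival encoder fuses that summary with the covariate vector to produce the risk score $r_i$.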

2. Cox Proportional Hazards Integration

SurLonFormer implements dynamic survival prediction by embedding the Cox proportional hazards model within a neural framework:

  • Hazard and Survival Functions: For subject $i$, the hazard rate at time $t$ is:

$$h_i(t) = h_0(t) \exp\{ r_i \}$$

Here, $h_0(t)$ is the nonparametric baseline hazard, while $r_i$ is the learned risk score replacing the linear predictor in classical Cox models. The survival function is:

$$S(t) = \exp\{ -H_0(t) \exp\{ r_i \} \}$$

with $H_0(t)$ as the cumulative baseline hazard. Model optimization proceeds via maximization of the Cox partial likelihood:

$$\log L = \sum_{i=1}^{I} \delta_i \left[ r_i - \log \left( \sum_{k \in R_i} \exp\{ r_k \} \right) \right]$$

where $\delta_i$ denotes event (1) or censoring (0) and $R_i$ is the risk set at subject $i$'s event time. Elastic Net regularization is included to counteract overfitting, especially under limited data regimes.
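As a concrete illustration, a minimal PyTorch-style sketch of the resulting training loss is given below. The function name and arguments are placeholders, ties in event times are handled only approximately (a Breslow-style prefix log-sum-exp), and the Elastic Net penalty is applied to whatever parameter collection is passed in; the authors' exact implementation may differ.

```python
import torch

def neg_cox_partial_log_likelihood(risk, time, event, l1=0.0, l2=0.0, params=None):
    """Negative Cox partial log-likelihood with optional Elastic Net penalty.

    risk  : (N,) predicted risk scores r_i
    time  : (N,) observed event or censoring times
    event : (N,) indicators delta_i (1 = event, 0 = censored)
    """
    order = torch.argsort(time, descending=True)      # largest time first
    risk = risk[order]
    event = event[order].float()

    # After descending sort, the risk set R_i is the prefix {0, ..., i}, so
    # log sum_{k in R_i} exp(r_k) is a cumulative log-sum-exp over that prefix.
    log_cum_risk = torch.logcumsumexp(risk, dim=0)

    # Only uncensored subjects (delta_i = 1) contribute terms; average over events.
    pll = ((risk - log_cum_risk) * event).sum() / event.sum().clamp(min=1.0)

    loss = -pll
    if params is not None:                            # optional Elastic Net term
        flat = torch.cat([p.reshape(-1) for p in params])
        loss = loss + l1 * flat.abs().sum() + l2 * flat.pow(2).sum()
    return loss
```

Minimizing this loss is equivalent to maximizing the partial likelihood above; censored subjects enter only through the risk-set denominators, which is how censoring is accommodated without modeling $h_0(t)$.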

3. Handling of Censoring, Scalability, and Interpretability

  • Censoring: SurLonFormer directly accommodates censored data using the Cox partial likelihood, leveraging risk sets determined at each failure time to ensure both uncensored and censored observations contribute appropriately to the likelihood term without explicit modeling of $h_0(t)$.
  • Scalability: Computational complexity for the vision and sequence encoders is dominated by self-attention operations:

$$O\big(H \times (P+1)^2 \times d\big) \text{ per vision layer, and } O\big(H \times (J^*+1)^2 \times d\big) \text{ per sequence layer}$$

where $H$ is the number of attention heads, $N_v, N_l$ are the depths of the transformer blocks, and $d$ is the embedding dimension. Model expressivity and computational budget are balanced via these hyperparameters.

  • Interpretability — Occlusion Sensitivity: To elucidate which imaging regions most strongly drive risk estimates, SurLonFormer employs occlusion sensitivity analysis. Each image is divided into non-overlapping regions. Sequentially masking each region (with a baseline value, e.g., a black patch) and measuring the absolute change in the predicted risk score quantifies that region's influence. Sensitivity maps overlaid on the original images reveal spatial areas critical for the model's predictive decisions, enabling insight into disease biomarkers.
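A minimal sketch of this occlusion procedure is shown below. It assumes a `predict_risk` callable that maps a single image (with the patient's remaining history and covariates held fixed) to a scalar risk, a square region grid, and a zero baseline; these are illustrative choices rather than details taken from the paper.

```python
import torch

@torch.no_grad()
def occlusion_sensitivity(predict_risk, image, region=8, baseline=0.0):
    """Absolute change in predicted risk when each region is masked.

    predict_risk : callable mapping a (1, 1, H, W) image to a scalar risk
    image        : (1, 1, H, W) tensor with H and W divisible by `region`
    Returns a (H // region, W // region) sensitivity map.
    """
    base_risk = float(predict_risk(image))
    _, _, H, W = image.shape
    sens = torch.zeros(H // region, W // region)
    for i in range(0, H, region):
        for j in range(0, W, region):
            occluded = image.clone()
            occluded[..., i:i + region, j:j + region] = baseline       # mask one region
            sens[i // region, j // region] = abs(float(predict_risk(occluded)) - base_risk)
    return sens  # upsample and overlay on the original image for visualization
```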

4. Empirical Evaluation and Results

  • Simulation Studies: Experiments on synthetic longitudinal imaging data characterized by non-smooth, spatially global features employed Frobenius inner product-based ground-truth risk scores and Cox-derived survival times. Benchmarked against FPCA-Cox, LoFPCA-Cox, and CNN-LSTM baselines, SurLonFormer demonstrated higher time-dependent AUC and C-index as well as lower Brier Scores, evidencing enhanced discrimination and calibration.
  • Alzheimer’s Disease Application (ADNI): Evaluation on the ADNI dataset, involving MRI-based longitudinal tracking towards Alzheimer’s onset, further substantiated SurLonFormer’s performance. It surpassed FPCA-Cox, LoFPCA-Cox, and CNN-LSTM in AUC, C-index, and Brier Score. Occlusion analysis corroborated the model’s emphasis on brain regions (frontal and temporal lobes) associated with Alzheimer’s pathology. Dynamic survival prediction allowed recalibration of event probabilities over multiple landmark times (e.g., 12, 24, 48 months), demonstrating model utility in updating individualized prognosis as patients progress.
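As a sketch of how such landmark-updated probabilities can be obtained from learned risk scores, the snippet below combines a Breslow-type estimate of the cumulative baseline hazard on training data with the proportional hazards survival function; the estimator choice and function names are assumptions of this illustration, not details reported in the paper.

```python
import numpy as np

def breslow_cumulative_baseline(train_risk, train_time, train_event, eval_times):
    """Breslow-type estimate of the cumulative baseline hazard H_0(t) (illustrative)."""
    order = np.argsort(train_time)
    t, d, r = train_time[order], train_event[order], train_risk[order]
    exp_r = np.exp(r)
    denom = np.cumsum(exp_r[::-1])[::-1]          # sum of exp(r_k) over the risk set at t_i
    event_t = t[d == 1]
    jumps = 1.0 / denom[d == 1]                   # hazard increment at each event time
    return np.array([jumps[event_t <= tau].sum() for tau in eval_times])

def conditional_survival(risk, H0_landmark, H0_horizon):
    """P(T > t_horizon | T > t_landmark) = exp{-(H_0(t_horizon) - H_0(t_landmark)) exp(r)}."""
    return np.exp(-(H0_horizon - H0_landmark) * np.exp(risk))
```

Recomputing $r_i$ from the images observed up to each successive landmark (e.g., 12, 24, or 48 months) and evaluating this conditional survival at later horizons yields the kind of updated individualized prognosis described above.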

5. Model Significance and Clinical Implications

SurLonFormer uniquely combines spatial feature extraction, temporal sequence modeling, and flexible risk estimation for survival analysis of high-dimensional medical data. By enabling joint learning with both imaging and structured data, it improves representation learning for complex disease trajectories. The incorporation of occlusion sensitivity facilitates model interpretability, critical for clinical trust and for the identification of disease-relevant biomarkers. Scalability features and principled handling of censored data suggest broad applicability in large-cohort, high-dimensional studies.

A plausible implication is extension to other modalities (e.g., 3D imaging, multimodal fusion) and broader disease contexts, given SurLonFormer’s architecture generalizes across longitudinal, high-dimensional prognostic tasks.

6. Conclusion

SurLonFormer constitutes a significant methodological advance for survival analysis using longitudinal imaging. Through its integrated transformer-based architecture, adherence to the Cox model’s statistical foundations, and robust interpretability mechanisms, it achieves state-of-the-art predictive performance and spatial biomarker identification in both simulation and real-world clinical settings. Its design and results underscore the growing capacity of deep learning models to address the complexities of censored, multi-visit, high-dimensional medical data in dynamic clinical risk modeling (Liu et al., 12 Aug 2025).
