Temporal Embedding for Wound Healing
- The paper presents self-supervised methods that learn the temporal dynamics of wound healing, enabling label-free staging and forecasting of healing trajectories.
- It employs diverse frameworks such as Siamese networks, masked autoencoders, and neural ODEs (NODEs) to extract embeddings that align with biologically defined healing stages.
- Key evaluations show high pretext accuracy and robust clustering, demonstrating improvements over conventional supervised approaches in clinical image analysis.
Self-supervised temporal embedding for wound healing progression refers to a class of methods and frameworks that learn to represent the dynamic state of wounds over time from image sequences, without requiring explicit temporal supervision or human-annotated labels. These methods leverage intrinsic temporal structure—such as the biological progression of tissue repair—to derive embeddings that predict wound state, forecast healing trajectories, and enable automated, label-free staging, monitoring, and prognosis.
1. Principles of Self-Supervised Temporal Embedding in Wound Healing
Self-supervised temporal embedding approaches encode the temporal dynamics of wound healing by designing pretext tasks that require the model to reason about the order, progression, and transformation present in longitudinal wound images. These tasks typically exploit the predictable sequence of wound healing stages (hemostasis, inflammation, proliferation, maturation) and the known physical progression towards closure. The underlying objective is to learn representations that reflect not just visual appearance, but also the image's position within a healing sequence and the likely biological state.
Key design principles include:
- Use of Siamese or tuple networks to model temporal relationships between image pairs or sequences.
- Construction of pretext tasks based on temporal directionality, such as binary classification of forward versus backward ordering.
- Clustering of learned embeddings to discover or validate canonical healing stages.
- Fine-tuning or probing the temporal embeddings for downstream classification or forecasting tasks.
These approaches have addressed critical challenges such as label scarcity, heterogeneity of healing rates, and irregularly sampled time points (Carrión et al., 2022, Shen et al., 2024).
2. Core Methodologies
Multiple methodological paradigms have been explored for self-supervised temporal embedding of wound healing:
HealNet: Temporal Coherence via Siamese Networks
The "HealNet" framework (Carrión et al., 2022) exemplifies a biologically inspired, three-stage process:
- Temporal Coherency Pretext: A Siamese DenseNet-121 encoder (initialized with ImageNet weights) maps each wound image to a 16D embedding z ∈ ℝ¹⁶. Paired images from the same wound are labeled as "forward" (y = 1) or "backward" (y = 0) based on their day-index, forming a binary classification task over whether the temporal order is correct. The loss is binary cross-entropy: L = −[y log ŷ + (1 − y) log(1 − ŷ)].
- Clustering: K-means (k = 4; hemostasis, inflammation, proliferation, maturation) groups the single-image embeddings to define pseudo-labels for each canonical healing stage.
- Fine-Tuning: The encoder is fine-tuned (or a shallow head added) for 4-way classification using the pseudo-labels, with cross-entropy loss.
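The three-stage pretext above can be sketched in a few lines. The encoder below is a placeholder MLP standing in for the DenseNet-121 backbone, and the pair-classification head is a minimal assumed design, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class TemporalOrderHead(nn.Module):
    """Siamese temporal-coherence pretext (sketch of the HealNet idea).

    `encoder` is any image backbone producing a low-dimensional embedding;
    the 16D size follows the text, but the concatenation-plus-linear head
    is an illustrative assumption.
    """
    def __init__(self, encoder: nn.Module, embed_dim: int = 16):
        super().__init__()
        self.encoder = encoder
        # Classify concatenated pair embeddings as forward (1) or backward (0).
        self.classifier = nn.Linear(2 * embed_dim, 1)

    def forward(self, img_a, img_b):
        za, zb = self.encoder(img_a), self.encoder(img_b)
        return self.classifier(torch.cat([za, zb], dim=-1)).squeeze(-1)

torch.manual_seed(0)
# Stand-in encoder: a flattening linear layer instead of DenseNet-121.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 16))
model = TemporalOrderHead(encoder, embed_dim=16)

early, late = torch.randn(4, 3, 32, 32), torch.randn(4, 3, 32, 32)
logits = model(early, late)          # label 1: correct (forward) temporal order
labels = torch.ones(4)
loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
```

Swapping the two inputs and flipping the label to 0 yields the "backward" half of the training pairs.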
Masked Temporal Autoencoding and Stochastic Modeling
The STAMP ("Stochastic Temporal Autoencoder with Masked Pretraining") approach (Emre et al., 29 Dec 2025) introduces temporal embedding as a conditional variational inference problem:
- Siamese Masked Autoencoder (SiamMAE): Input pairs consist of "past" (x_past) and "future" (x_future) images. The future image undergoes patch masking, and both images are encoded with a Transformer backbone.
- Temporal Encoding: The relative timeframe Δt is represented via a sin-cos embedding passed through a 2-layer MLP and added to the CLS token, making both the prior and posterior networks Δt-aware.
- Stochastic Latent: A categorical latent variable (e.g., 32D one-hot, straight-through Gumbel-softmax) captures the uncertainty in future progression.
- Loss Function: The objective is the negative conditional variational autoencoder (CVAE) ELBO, with terms for masked patch reconstruction and KL divergence between the stochastic posterior q(z | x_past, x_future, Δt) and the prior p(z | x_past, Δt): L = L_rec + D_KL(q ∥ p).
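The straight-through categorical latent described above can be sketched with PyTorch's built-in Gumbel-softmax; the 32-way code mirrors the dimension mentioned in the text, while the toy logits and KL computation are illustrative, not the paper's exact pipeline:

```python
import torch
import torch.nn.functional as F

def sample_categorical_latent(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Straight-through Gumbel-softmax sample: one-hot forward pass,
    soft-relaxation gradients on the backward pass."""
    return F.gumbel_softmax(logits, tau=tau, hard=True)

torch.manual_seed(0)
posterior_logits = torch.randn(8, 32, requires_grad=True)  # q's logits (toy)
z = sample_categorical_latent(posterior_logits)            # exactly one-hot

# KL(q || p) between posterior and a prior over the same 32 categories.
# F.kl_div expects log-probabilities as input and probabilities as target.
prior_logits = torch.randn(8, 32)
kl = F.kl_div(F.log_softmax(prior_logits, dim=-1),
              F.softmax(posterior_logits, dim=-1),
              reduction="batchmean")
```

Because the forward sample is hard one-hot, downstream decoders see a discrete code, while gradients still flow to the posterior logits.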
Temporal Transformers and Irregular Time Embeddings
Temporal Transformer-based methods (Shen et al., 2024) integrate explicit time offsets into sequence models:
- Time Embedding: For each image acquired at time tᵢ, construct a time-embedding vector via sinusoidal encoding + MLP, yielding time-aware frame tokens.
- Clip-level and Frame-level Pretext: Contrastive learning over temporally-consistent augmented clips (NT-Xent loss) and masked frame recovery (cosine similarity loss) are combined with a weighting λ.
- Loss: The total temporally-variant representation loss is L_total = L_contrast + λ·L_mask.
- Handling Irregular Visits: The encoding accommodates non-uniform imaging intervals, common in real-world wound monitoring.
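The sin-cos time embedding for irregular visit offsets can be sketched as follows; the dimension, base frequency (10000), and the random-weight MLP are illustrative defaults, not values taken from the cited papers:

```python
import numpy as np

def time_embedding(delta_t: np.ndarray, dim: int = 16) -> np.ndarray:
    """Transformer-style sin-cos encoding of visit offsets (in days).

    Frequencies follow the usual geometric positional-encoding schedule;
    the result would normally be refined by a learned MLP.
    """
    half = dim // 2
    freqs = 1.0 / (10000.0 ** (np.arange(half) / half))
    angles = np.outer(delta_t, freqs)          # (n_visits, dim/2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

# Irregular visit schedule: days since the baseline image.
dt = np.array([0.0, 1.0, 3.0, 7.0, 16.0])
emb = time_embedding(dt, dim=16)               # (5, 16), one row per visit

# A small MLP (random weights here, for illustration) would then map the
# encoding into frame-token space before it is added to the image tokens.
rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=(16, 32)), rng.normal(size=(32, 16))
tokens = np.maximum(emb @ w1, 0.0) @ w2        # ReLU MLP
```

Because the encoding takes the raw offset as input, uneven gaps between visits need no resampling or interpolation.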
Neural ODE-based Latent Dynamics
The LSSL + NODE framework (Zeghlache et al., 2023) extends Siamese-like self-supervised learning by modeling embedding dynamics as continuous flows:
- Siamese or Autoencoder Baseline: Embeddings zᵢ for images at timepoints tᵢ; enforce directionality or direct prediction between timepoints.
- Neural ODE Dynamics: Embed time evolution as dz/dt = g_φ(z(t), t), with z(tᵢ₊₁) obtained by integrating the learned dynamics forward from z(tᵢ).
- NODE Loss: Minimize the discrepancy (e.g., squared error) between the ODE-predicted embedding ẑ(tᵢ₊₁) and the directly encoded embedding zᵢ₊₁.
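The continuous-time prediction step can be illustrated with a fixed-step Euler integrator; the linear-decay dynamics below are a toy stand-in for the learned network g_φ, and a real implementation would use a differentiable solver (e.g., torchdiffeq):

```python
import numpy as np

def odeint_euler(f, z0, t0, t1, steps=50):
    """Fixed-step Euler integration of dz/dt = f(z, t) from t0 to t1."""
    z, t = z0.copy(), t0
    h = (t1 - t0) / steps
    for _ in range(steps):
        z = z + h * f(z, t)
        t += h
    return z

rng = np.random.default_rng(0)

# Toy "learned" dynamics: exponential relaxation toward a healed latent state.
healed = np.zeros(16)
f = lambda z, t: -(z - healed)                  # dz/dt = -(z - z_healed)

z_day3 = rng.normal(size=16)                    # embedding encoded at day 3
z_day7_pred = odeint_euler(f, z_day3, t0=3.0, t1=7.0)

# NODE loss term: squared error against the embedding observed at day 7.
z_day7_obs = rng.normal(size=16) * 0.1
node_loss = np.sum((z_day7_pred - z_day7_obs) ** 2)
```

Because the solver consumes the actual timestamps (t0, t1), irregular inter-visit gaps are handled naturally, which is the property emphasized in the text.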
3. Data Handling and Preprocessing
Self-supervised temporal embedding frameworks are distinguished by their robustness to small dataset sizes, missing labels, and practical image artifacts.
- Data Input: Studies report 256 wound images (8 mice, 2 wounds each, 16 days) as the basis for embedding training (Carrión et al., 2022).
- Spatial Preprocessing: Images are circularly cropped around the wound centroid, the background is masked, and the result is resized to the encoder's input size (224×224 or 256×256).
- Augmentation: Standard procedures include random horizontal/vertical flip, color jitter, and Gaussian blur to simulate acquisition and environmental variability.
- Temporal Sampling: Pairwise (all possible ordered pairs within a wound's series) or clip-based (fixed or variable-length subsequences) sampling is performed to construct positive and negative training samples or sequence inputs. For irregularly timed images, explicit encoding of continuous or log-compressed Δt is applied (Shen et al., 2024).
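The pairwise sampling step can be sketched directly: from one wound's (sorted) day indices, every ordered pair is emitted once in correct order and once reversed, giving balanced forward/backward labels. The helper name is illustrative:

```python
from itertools import combinations

def make_order_pairs(day_indices):
    """All ordered (earlier, later) index pairs from one wound's image series,
    assumed sorted by acquisition day: each pair appears once in correct
    temporal order (label 1) and once reversed (label 0)."""
    pairs = []
    for i, j in combinations(range(len(day_indices)), 2):
        pairs.append(((i, j), 1))   # forward: day_indices[i] < day_indices[j]
        pairs.append(((j, i), 0))   # backward
    return pairs

# One wound imaged on an irregular schedule (days since injury).
days = [0, 1, 3, 7, 16]
pairs = make_order_pairs(days)      # C(5, 2) = 10 pairs -> 20 labeled samples
```

With 16 imaging days per wound, as in the mouse dataset described above, this yields 120 pairs per wound before reversal, which is why pairwise pretexts remain data-efficient on small cohorts.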
4. Evaluation Metrics and Results
Evaluation encompasses both pretext and downstream tasks:
| Metric | HealNet Result | Baseline (Direct Supervised) |
|---|---|---|
| Pretext temporal coherence accuracy | 97.7% (test) | — |
| Heal-stage classification accuracy | 90.6% (test) | 78.1% (test) |
- Pretext Task: Binary classification of order direction achieves near-perfect accuracy (97.7%) on held-out test data, validating that learned embeddings encode monotonic healing progress (Carrión et al., 2022).
- Clustering Validation: K-means on 16D embeddings yields four clusters with PCA visualizations supporting their biological plausibility; pseudo-labels show ≈ 80.5% agreement with naive human raters.
- Downstream Classification: With HealNet, 4-way stage classification (using pseudo-labels) surpasses supervised DenseNet-121 (no pretext), affirming the benefit of temporal supervision.
- Transferability and Prognosis: In frameworks with stochastic temporal encoding (STAMP), AUROC on progression forecasting (e.g., iAMD→wet-AMD conversion) improves over deterministic and non-temporal methods (Emre et al., 29 Dec 2025).
- Cross-validation and Data Regimes: Self-supervised temporal embeddings demonstrate resilience to overfitting and achieve state-of-the-art performance on small, label-scarce datasets.
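The clustering-validation step can be illustrated with a minimal k-means pass over toy embeddings; this is a stand-in for a library implementation (e.g., scikit-learn's KMeans), and the four-group structure of the synthetic data is assumed, not derived from real wound images:

```python
import numpy as np

def kmeans(X, k=4, iters=100, seed=0):
    """Plain k-means: assigns each embedding one of k pseudo-labels."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Distance of every point to every center: (n, k).
        d = np.linalg.norm(X[:, None] - centers[None], axis=-1)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                        else centers[c] for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

# Toy stand-in for 256 wound images embedded in 16D, loosely grouped
# into four stages (offsets are synthetic separation, for illustration).
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(size=(64, 16)) + 6 * s for s in range(4)])
labels, centers = kmeans(X, k=4)
```

The resulting `labels` play the role of the stage pseudo-labels used for fine-tuning, and PCA of `X` colored by `labels` corresponds to the visual plausibility check described above.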
5. Practical Considerations and Adaptation Guidelines
Multiple frameworks extend naturally to wound healing progression with tailored adjustments:
- Temporal Encoding:
- Sin-cos plus MLP temporal encoding should match the timescale of the data (daily or weekly bins for Δt) and be injected into both the encoder and, where applicable, the decoder (Emre et al., 29 Dec 2025, Shen et al., 2024).
- Masking Strategy: Mask ratios for autoencoding are tuned to balance the size and spatial resolution of input images (e.g., 50–75%). Over-masking forces greater use of temporal context (Emre et al., 29 Dec 2025).
- Stochasticity and Uncertainty: Categorical latent codes capture heterogeneity in healing rates; straight-through one-hot or Gumbel-softmax sampling preserves differentiable learning (Emre et al., 29 Dec 2025).
- Irregular Sampling: NODE-based and Transformer-based methods accommodate irregular intervals, critical for clinical wound cohorts (Zeghlache et al., 2023, Shen et al., 2024).
- Loss Weighting and Sampling: Hyperparameters such as loss blending (λ), learning rates, batch size, and augmentation intensities are dataset- and domain-specific. Batch sizes of 32–64 (HealNet), large batch for Transformers, and Adam/AdamW optimizers are recommended (Carrión et al., 2022, Shen et al., 2024).
- Downstream Probes: After embedding pretraining, regression/classification heads targeting wound closure, healing time, or direct biological measurements can be appended to the frozen latent representations.
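A downstream probe on frozen embeddings can be as simple as a closed-form ridge regression head; the helper names and the synthetic wound-closure targets below are illustrative assumptions, not part of any cited pipeline:

```python
import numpy as np

def ridge_probe(Z, y, alpha=1.0):
    """Closed-form ridge regression head on frozen embeddings.

    Z: (n, d) frozen latent representations; y: (n,) target such as
    wound-closure fraction. Returns weights (with a bias term)."""
    Zb = np.hstack([Z, np.ones((len(Z), 1))])        # append bias column
    d = Zb.shape[1]
    return np.linalg.solve(Zb.T @ Zb + alpha * np.eye(d), Zb.T @ y)

def predict(Z, w):
    return np.hstack([Z, np.ones((len(Z), 1))]) @ w

# Toy frozen embeddings and closure fractions (0 = open, 1 = closed),
# generated from a hidden linear signal for illustration only.
rng = np.random.default_rng(0)
Z = rng.normal(size=(256, 16))
true_w = rng.normal(size=16)
y = np.clip(0.5 + 0.1 * Z @ true_w, 0.0, 1.0)

w = ridge_probe(Z, y)
pred = predict(Z, w)
```

Because the encoder stays frozen, the probe's accuracy directly measures how much wound-state information the self-supervised embedding already carries.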
6. Broader Implications and Limitations
Self-supervised temporal embedding for wound healing progression has demonstrated measurable improvements in label-free staging, automatic phase discovery, and performance on clinically relevant tasks. The learned representations are biologically interpretable, facilitate rapid adaptation to new datasets, and scale to scarce or noisy data.
Limitations include:
- Interpretability of Latents: Especially in stochastic and transformer-based models, understanding or visualizing high-dimensional latent codes or cluster assignments poses ongoing challenges.
- Restriction to Pairwise or Short Clips: Some methods (e.g., STAMP as originally described) use only two visits per sample for pretraining, which does not exploit longer longitudinal trajectories as fully as NODE-based or recurrent frameworks (Emre et al., 29 Dec 2025).
- Dependence on Accurate Δt: Modeling continuous or discretized time intervals is critical; errors in time annotations or inconsistent visit intervals can degrade performance unless explicitly accommodated (Shen et al., 2024, Zeghlache et al., 2023).
- Scalability: While computational overhead is generally modest for 2D images, autoencoding and ODE-integration may scale less efficiently for large 3D or high-frequency data without further optimization.
A plausible implication is that hybrid approaches—leveraging continuous-time ODE dynamics, stochastic temporal encoding, and biologically grounded clustering—may further enhance both the biological fidelity and clinical utility of temporal wound embeddings.
7. Current Directions and Comparative Summary
Contemporary research has converged on several axes of temporal embedding for wound healing progression, summarized as follows:
| Framework | Temporal Encoding | Pretext Objective | Stage Discovery | Dynamics Modeling |
|---|---|---|---|---|
| HealNet (Carrión et al., 2022) | Implicit (ordering) | Forward/backward binary | K-means | No |
| STAMP (Emre et al., 29 Dec 2025) | Sin-cos+MLP, Δt | CVAE with masked rec+KL | — | Stochastic |
| TVRL (Shen et al., 2024) | Sin-cos+MLP, Δt | Clip contrast+masked rec | — | Transformer |
| LSSL+NODE (Zeghlache et al., 2023) | Continuous time, Δt | Cosine align+ODE rec | — | NODE (continuous) |
Continued progress is marked by systematic ablation studies (e.g., impact of embedding dimension d, number of clusters k, loss type) and incorporation of external knowledge (e.g., staging consensus, biological priors) to refine pseudo-label generation and longitudinal embedding quality.
In summary, self-supervised temporal embedding frameworks have provided a robust, biologically-informed computational foundation for automated wound healing progression analysis, with demonstrated superiority over conventional image classifiers and promising flexibility for broader clinical deployment.