Temporal Prompt Alignment Score (P-score)
- Temporal Prompt Alignment Score (P-score) is defined as the cosine similarity between a temporally aggregated video embedding and class-specific text prompt embeddings.
- It integrates temporal modeling, contrastive learning, and uncertainty calibration to enhance fetal CHD classification from ultrasound videos.
- Empirical results show that optimizing hyperparameters and incorporating CVAESM improves both discrimination and calibration, boosting metrics like F1 and AUC.
The Temporal Prompt Alignment Score ("P-score", Editor's term) is a metric that quantifies the alignment between a temporally aggregated video embedding and each class-specific text prompt embedding within the Temporal Prompt Alignment (TPA) framework for fetal congenital heart defect (CHD) classification from ultrasound videos. The P-score is formally defined as the cosine similarity between a video-level embedding produced by a temporal extractor and the projected embedding of a clinically motivated text prompt representing each candidate class. The P-score serves as the core building block for both classification and contrastive learning within this system, integrating temporal modeling, image-text alignment, and uncertainty calibration for robust video-based medical diagnosis (Taratynova et al., 21 Aug 2025).
1. Formal Definition
Let $f_t \in \mathbb{R}^d$ denote the feature vector for frame $t$, extracted via a frozen image encoder such as EchoCLIP or FetalCLIP. For a subclip of length $T$, these are stacked into $F = [f_1; \dots; f_T] \in \mathbb{R}^{T \times d}$. A lightweight temporal extractor $g_\phi$ (e.g., GNN, xLSTM, TCN) is used to aggregate these frame-level features, producing a video-level embedding $v = g_\phi(F) \in \mathbb{R}^d$.
For each class $c \in \{1, \dots, C\}$, let $e_c$ be the embedding of a class-specific text prompt encoded by a frozen text encoder. A learned projection yields $p_c = W e_c + b$.
The P-score for class $c$ is defined as

$$s_c = \frac{\langle v, p_c \rangle}{\|v\| \, \|p_c\|},$$

where $\langle \cdot, \cdot \rangle$ is the standard Euclidean inner product and $\|\cdot\|$ the Euclidean norm. This cosine similarity measures the alignment between the temporally aggregated video features and each class prompt embedding (Taratynova et al., 21 Aug 2025).
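As a minimal numerical sketch of this definition (the embedding dimension and prompt names here are illustrative, not taken from the paper):

```python
import numpy as np

def p_score(v, p):
    """Cosine similarity between a video-level embedding v and a projected
    class-prompt embedding p (1-D arrays of equal dimension)."""
    return float(np.dot(v, p) / (np.linalg.norm(v) * np.linalg.norm(p)))

# Toy 4-dimensional embedding space with two hypothetical class prompts.
v = np.array([1.0, 0.0, 1.0, 0.0])          # aggregated video embedding
p_normal = np.array([1.0, 0.0, 1.0, 0.0])   # prompt aligned with v
p_chd = np.array([0.0, 1.0, 0.0, 1.0])      # prompt orthogonal to v
```

Here `p_score(v, p_normal)` is 1.0 and `p_score(v, p_chd)` is 0.0, the extremes of the alignment range.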
2. Stepwise Computation Procedure
The computation of the P-score proceeds as follows:
- Frame-level Feature Extraction:
- $T$ consecutive video frames $x_1, \dots, x_T$ are sampled from the ultrasound sequence.
- For each frame, a frozen image encoder maps $x_t$ to $f_t \in \mathbb{R}^d$.
- Temporal Aggregation:
- The sequence of frame features is passed through a temporal extractor: $v = g_\phi(f_1, \dots, f_T)$.
- Prompt Embedding Construction:
- For each class $c$, a concise, task-specific text prompt (e.g., “Is the fetal heart normal in this 4CH ultrasound view?”) is encoded to $e_c$ via a frozen text encoder.
- A learned linear layer projects $e_c$ into the video-embedding space: $p_c = W e_c + b$.
- Alignment Computation:
- Calculate the P-score for each class as the cosine similarity between $v$ and $p_c$: $s_c = \langle v, p_c \rangle / (\|v\| \, \|p_c\|)$.
- Softmax Normalization (optional):
- The vector of P-scores $(s_1, \dots, s_C)$ can be temperature-softmaxed: $\hat{y}_c = \exp(s_c / \tau) \big/ \sum_{c'} \exp(s_{c'} / \tau)$.
The highest $s_c$ determines the predicted class; the $\hat{y}_c$ values thus underlie both class assignment and estimated confidence (Taratynova et al., 21 Aug 2025).
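The steps above can be sketched end to end as follows; the mean-pooling extractor and random projection are stand-ins for the paper's learned components, and all dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def temporal_extractor(frame_feats):
    # Stand-in for the paper's GNN/xLSTM/TCN extractors: mean pooling over frames.
    return frame_feats.mean(axis=0)

def softmax(scores, tau=0.07):
    z = scores / tau
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

T, d, C = 8, 16, 2                        # frames, feature dim, classes (illustrative)
frame_feats = rng.normal(size=(T, d))     # f_1..f_T from a frozen image encoder
prompt_embs = rng.normal(size=(C, d))     # frozen text-encoder outputs e_c
W = rng.normal(size=(d, d))               # learned projection (random stand-in)

v = temporal_extractor(frame_feats)                                  # video embedding
p = prompt_embs @ W.T                                                # projected prompts p_c
scores = (p @ v) / (np.linalg.norm(p, axis=1) * np.linalg.norm(v))   # P-scores s_c
probs = softmax(scores)                                              # temperature softmax
pred = int(np.argmax(probs))                                         # predicted class
```

Because each score is a cosine similarity, every entry of `scores` lies in $[-1, 1]$ regardless of the extractor used.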
3. Hyperparameters Impacting P-score Dynamics
Several key hyperparameters control the behavior and discriminative power of P-scores:
- Margin $m$: Used in the margin-hinge contrastive loss, enforcing a required gap between the true-class P-score and the hardest negative P-score.
- Contrastive weight $\lambda$: Balances the contribution of the margin-hinge contrastive loss relative to the conventional cross-entropy classification loss.
- Temperature $\tau$: Controls the sharpness or flatness of the softmax over P-scores, affecting output confidence calibration.
- CVAESM KL-weight $\beta$: Sets the strength of the KL-divergence regularizer in the Conditional Variational Autoencoder Style Modulation (CVAESM) module for uncertainty estimation.
Optimizing these hyperparameters is essential for achieving favorable trade-offs among discrimination, alignment, and calibration (Taratynova et al., 21 Aug 2025).
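To illustrate the temperature's effect on calibration, a lower $\tau$ sharpens the softmax over a fixed pair of P-scores (the values below are chosen for demonstration only, not drawn from the paper):

```python
import numpy as np

def softmax(s, tau):
    z = (s - s.max()) / tau      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

scores = np.array([0.6, 0.4])        # two P-scores, 0.2 apart
sharp = softmax(scores, tau=0.05)    # low temperature: near one-hot
flat = softmax(scores, tau=1.0)      # high temperature: closer to uniform
```

With these values, `sharp[0]` is roughly 0.98 while `flat[0]` is roughly 0.55, showing how $\tau$ alone moves the reported confidence without changing the argmax.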
4. Training, Inference, and Calibration Integration
The P-score’s role in training, inference, and calibration is multifaceted:
- Classification Loss:
- Standard cross-entropy is computed on the temperature-softmaxed P-scores: $\mathcal{L}_{\mathrm{CE}} = -\sum_c y_c \log \hat{y}_c$, where $y$ is the one-hot true label.
- Margin-hinge Contrastive Loss:
- With the true-class P-score $s_y$ and the hardest negative $s_{\mathrm{neg}} = \max_{c \neq y} s_c$, the loss is $\mathcal{L}_{\mathrm{hinge}} = \max(0,\, m - s_y + s_{\mathrm{neg}})$.
- The total loss (omitting uncertainty) is $\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \lambda \, \mathcal{L}_{\mathrm{hinge}}$.
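The two loss terms can be sketched together as follows; the `margin`, `lam`, and `tau` defaults are illustrative, not the paper's tuned values:

```python
import numpy as np

def losses(scores, y, margin=0.2, lam=0.5, tau=0.07):
    """Cross-entropy on temperature-softmaxed P-scores plus the margin-hinge
    term; margin/lam/tau are illustrative defaults, not the paper's values."""
    z = scores / tau
    z = z - z.max()
    probs = np.exp(z) / np.exp(z).sum()
    ce = -np.log(probs[y])                        # L_CE for one-hot label y
    s_neg = np.max(np.delete(scores, y))          # hardest negative P-score
    hinge = max(0.0, margin - scores[y] + s_neg)  # L_hinge
    return ce + lam * hinge, ce, hinge            # total, CE, hinge

# Well-separated case: the true class leads by more than the margin, so hinge = 0.
total, ce, hinge = losses(np.array([0.8, 0.3, 0.1]), y=0)
```

When the true-class lead shrinks below the margin (e.g., scores of 0.4 vs. 0.35), the hinge term becomes positive and pushes the embeddings apart.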
- Uncertainty Quantification (CVAESM):
- A latent style vector $z$, learned via a CVAE conditioned on $v$, modulates $v$ through an elementwise affine transformation: $\tilde{v} = \gamma(z) \odot v + \delta(z)$.
- During training, $z$ is sampled from the variational posterior $q(z \mid v)$; at inference, its mean under the prior is used.
- The KL-divergence term for style regularization enters the final objective weighted by $\beta$.
- $\tilde{v}$ replaces $v$ when computing P-scores at both training and test time; the P-score thus remains central to stochastic as well as deterministic inference.
A plausible implication is that the modularity of P-score computation enables seamless integration with both loss functions and uncertainty-aware calibration procedures (Taratynova et al., 21 Aug 2025).
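Under the assumptions above, the modulation mechanics can be sketched as follows; the linear scale/shift heads, their near-identity initialisation, and all shapes are illustrative stand-ins rather than the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(1)
d, dz = 16, 4                               # embedding / latent dims (illustrative)

# Hypothetical stand-ins for the CVAE heads mapping z to scale and shift.
W_gamma = 0.1 * rng.normal(size=(d, dz))
W_delta = 0.1 * rng.normal(size=(d, dz))

def modulate(v, z):
    gamma = 1.0 + W_gamma @ z               # elementwise scale, identity at z = 0
    delta = W_delta @ z                     # elementwise shift
    return gamma * v + delta                # v~ = gamma(z) * v + delta(z)

v = rng.normal(size=d)                      # temporally aggregated video embedding
mu, logvar = np.zeros(dz), np.zeros(dz)     # variational posterior parameters

z_train = mu + np.exp(0.5 * logvar) * rng.normal(size=dz)   # sampled during training
z_infer = mu                                                # mean used at inference

v_train = modulate(v, z_train)              # stochastic embedding for training
v_infer = modulate(v, z_infer)              # deterministic embedding for inference

# KL term against a standard-normal prior, weighted by beta in the full objective.
kl = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))
```

With a standard-normal posterior the KL term vanishes and the inference-time embedding reduces to $v$ itself; training samples perturb the embedding, and P-scores are then computed on the modulated vector exactly as before.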
5. Empirical Behavior and Calibration Performance
Empirical evaluation of P-score dynamics reveals its influence on model confidence and calibration:
- Calibration Shifts:
- Reliability diagrams show that following application of the CVAESM module, the distribution of maximum P-score (post-softmax) confidences shifts from overconfident 90–100% bins into better-calibrated 60–90% ranges.
- Expected Calibration Error (ECE) is reduced from approximately 16% to 10%; Adaptive ECE decreases from about 40% to 33% after uncertainty modulation.
- Decision Thresholding:
- No explicit fixed P-score threshold is used; class prediction is always determined via the argmax over softmax-normalized P-scores.
- Performance Impact:
- Across temporal aggregation modules (GNN, xLSTM, TCN), contrastive regularization based on the P-score consistently improves macro F1 and AUC.
- Introducing CVAESM style-modulation slightly decreases mean F1 (by about 1%) but yields substantial calibration gains.
This suggests that the P-score not only acts as a discriminative signal for class assignment but also underpins the model’s confidence estimation and reliability under uncertainty-aware inference (Taratynova et al., 21 Aug 2025).
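The Expected Calibration Error reported above can be computed with the standard binning scheme; this is the common textbook definition, not code from the paper:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: bin-weight-averaged |accuracy - mean confidence| per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# Toy overconfident model: 95% stated confidence but only 70% actual accuracy.
conf = np.full(100, 0.95)
hits = np.array([1.0] * 70 + [0.0] * 30)
ece = expected_calibration_error(conf, hits)
```

For this toy model all predictions fall in one bin, so `ece` comes out to 0.25 (|0.70 − 0.95|); calibration modules such as CVAESM aim to shrink exactly this gap.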
6. Relation to Contrastive Learning and Discriminative Alignment
The P-score is the foundational measure of semantic alignment between video features and text prompt prototypes in the TPA architecture:
- Contrastive Regularization:
- The margin-hinge loss, operating on P-scores, enforces separation between true-class and non-true-class alignments. Specifically, $s_y$ must exceed $s_{\mathrm{neg}} = \max_{c \neq y} s_c$ by at least the margin $m$, promoting clustering of intra-class embeddings around their respective prompts.
- Optimal Hyperparameter Regimes:
- Empirically, intermediate values of the margin $m$ and contrastive weight $\lambda$ provide an optimal balance, yielding macro F1 ≈ 84.7% and AUC ≈ 87.6% on binary CHD detection. Extremes in $m$ or $\lambda$ can degrade discrimination, either by over-separating embeddings or by underutilizing the prompt structure.
- General Applicability:
- The P-score driven contrastive term consistently improved macro F1 and AUC across task variants and extractor architectures, indicating robust effectiveness in class separation mediated by prompt alignment.
In summary, the Temporal Prompt Alignment Score (P-score) is the cosine similarity between the temporally aggregated video embedding and each class’s projected text prompt embedding. It is the central decision statistic for both softmax-based classification and for enforcing class-prototype clustering in the embedding space, with its behavior and downstream performance governed by careful choice of margin, contrastive loss weight, and temperature (Taratynova et al., 21 Aug 2025).