Source of benefit from ensembling sequence-only and sequence+structure prompts for clinical prediction

Determine whether the small AUROC improvement on the ProteinGym clinical substitutions benchmark, observed when ensembling PoET-2 prompts that use a sequence-only context with no query (Strategy A) with prompts that use a sequence+structure context with no query (Strategy B), arises primarily from the larger number of prompts in the ensemble or from the heterogeneity of the prompt strategies (sequence-only versus sequence+structure contexts).

Background

In the clinical zero-shot setting, the authors evaluate several prompting strategies for PoET-2, including a sequence-only context (Strategy A), a sequence+structure context (Strategy B), and the ensemble of the two (Strategy E). They find that adding structure via a query harms performance, so they focus on context-only strategies.

The ensemble of Strategies A and B yields a small positive effect over not ensembling. However, because the improvement is minor, it is uncertain whether the gain comes from ensembling more prompts (i.e., increased averaging) or from combining different prompt types (sequence-only versus sequence+structure contexts). Clarifying this would guide best practices for constructing ensembles in clinical variant effect prediction.
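One way to separate the two explanations is to hold the ensemble size fixed and vary only its composition. The sketch below is a minimal illustration under stated assumptions, not the authors' protocol: it assumes per-prompt variant scores are already available as arrays of shape (n_prompts, n_variants), that an ensemble is formed by averaging scores across prompts, and that higher scores correspond to the positive label. The names `auroc`, `matched_size_comparison`, `scores_a`, and `scores_b` are hypothetical and are not part of any PoET-2 interface.

```python
import numpy as np


def auroc(labels, scores):
    """Rank-based AUROC (Mann-Whitney U statistic), using average ranks for ties."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    # Assign ranks 1..n in ascending score order.
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    # Replace the ranks of tied scores with their average rank.
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    ranks = (rank_sums / counts)[inverse]
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2
    return float(u / (n_pos * n_neg))


def matched_size_comparison(scores_a, scores_b, labels, k, n_draws=200, seed=0):
    """Compare ensembles of equal size 2k that differ only in composition.

    scores_a, scores_b: (n_prompts, n_variants) arrays of per-prompt scores
    for Strategy A and Strategy B (assumed inputs; each needs >= 2k prompts).
    Returns the mean AUROC over random prompt draws for:
      A_only: 2k Strategy A prompts
      B_only: 2k Strategy B prompts
      mixed:  k Strategy A prompts + k Strategy B prompts
    """
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    if len(scores_a) < 2 * k or len(scores_b) < 2 * k:
        raise ValueError("each strategy needs at least 2k prompts")
    rng = np.random.default_rng(seed)
    results = {"A_only": [], "B_only": [], "mixed": []}
    for _ in range(n_draws):
        ia = rng.choice(len(scores_a), size=2 * k, replace=False)
        ib = rng.choice(len(scores_b), size=2 * k, replace=False)
        # Homogeneous ensembles: average 2k prompts of a single strategy.
        results["A_only"].append(auroc(labels, scores_a[ia].mean(axis=0)))
        results["B_only"].append(auroc(labels, scores_b[ib].mean(axis=0)))
        # Heterogeneous ensemble of the same total size: k of each strategy.
        mixed = np.concatenate([scores_a[ia[:k]], scores_b[ib[:k]]]).mean(axis=0)
        results["mixed"].append(auroc(labels, mixed))
    return {name: float(np.mean(values)) for name, values in results.items()}
```

Under these assumptions, a mixed ensemble that beats both homogeneous ensembles of the same size would point toward heterogeneity, while comparable AUROCs across all three would point toward the number of prompts. Because the reported effect is small, the spread across draws (not only the mean) would need to be examined before drawing either conclusion.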

References

Ensembling Strategies A and B (Strategy E) has a small positive effect over not ensembling, although due to the small effect size, it is unclear if the effect is simply due to ensembling more prompts, or due to ensembling different prompt strategies.

— Understanding protein function with a multimodal retrieval-augmented foundation model (2508.04724 - Jr et al., 5 Aug 2025) in Appendix, Section "Zero-shot variant effect prediction" -> "Clinical datasets" -> "Incorporating structure in the prompt"
