Necessary conditions and representations for a well‑posed genotype‑to‑phenotype mapping

Determine the necessary conditions and representation mappings—specifically the genotype and phenotype encodings and any model constraints—that render the bacterial genotype‑to‑phenotype mapping task well‑posed under Hadamard’s criteria of existence, uniqueness, and stability.
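For reference, Hadamard's three criteria can be written compactly. The notation below is an assumption made for illustration and is not taken from the paper: F denotes a forward map from a parameter space Θ to a space of predictive models (or observable data) D.

```latex
\begin{align*}
&\text{Existence:}  && \forall\, d \in \mathcal{D}\ \ \exists\, \theta \in \Theta \ \text{such that}\ F(\theta) = d, \\
&\text{Uniqueness:} && F(\theta_1) = F(\theta_2) \ \Longrightarrow\ \theta_1 = \theta_2, \\
&\text{Stability:}  && F^{-1} : \mathcal{D} \to \Theta \ \text{is continuous, so small changes in } d \text{ produce small changes in } \theta.
\end{align*}
```

In this notation, uniqueness is the same as injectivity of F, and stability only makes sense once existence and uniqueness guarantee that the inverse is defined.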

Background

The authors formalize genotype‑to‑phenotype prediction as an optimization problem and explain that, in bacterial genomics, the mapping from parameters to predictive models is typically non‑injective: distinct parameter settings can yield the same predictive model. Contributing factors include genome‑wide linkage disequilibrium, limited and biased sampling, information loss in feature representations, and observational noise. This non‑uniqueness makes the task ill‑posed and undermines the use of feature attribution to identify causal variants.

To address this, they pose explicit questions on how to make the task well‑posed, centering on constraints and representations for genotypes and phenotypes, and incorporating domain knowledge and regularization. Establishing these necessary conditions is presented as an unresolved problem, acknowledged to be complicated by the fact that the true genotype‑to‑phenotype mapping is unknown.
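The non-uniqueness and the role of regularization described above can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' method: the two-variant genotype matrix, the linear model, the helper names `predict` and `ridge_2d`, and the choice of a ridge (L2) penalty are all hypothetical and chosen only for illustration. Two variants in perfect linkage appear as identical columns, so several weight vectors reproduce the phenotype exactly, while an added penalty selects a single solution.

```python
def predict(X, w):
    """Linear prediction: dot product of each genotype row with the weights."""
    return [sum(x_j * w_j for x_j, w_j in zip(row, w)) for row in X]


def ridge_2d(X, y, lam):
    """Closed-form ridge solution for exactly two features.

    Solves (X^T X + lam * I) w = X^T y for a 2x2 system.
    Raises ValueError when the system is singular (no unique solution).
    """
    a = sum(r[0] * r[0] for r in X) + lam
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X) + lam
    c0 = sum(r[0] * t for r, t in zip(X, y))
    c1 = sum(r[1] * t for r, t in zip(X, y))
    det = a * d - b * b
    if det == 0:
        raise ValueError("singular system: the solution is not unique")
    return [(d * c0 - b * c1) / det, (a * c1 - b * c0) / det]


# Toy data (assumed): two variants in perfect linkage, i.e. identical columns.
X = [[1, 1], [0, 0], [1, 1], [0, 0]]
y = [1.0, 0.0, 1.0, 0.0]

# Three different attributions that all fit the phenotype exactly.
candidates = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
all_same = all(predict(X, w) == y for w in candidates)

# Without a penalty the normal equations are singular; with one, a unique
# solution exists and splits the weight equally between the linked variants.
w_ridge = ridge_2d(X, y, lam=0.1)

print("all candidates fit exactly:", all_same)
print("ridge weights:", w_ridge)
```

The penalty restores uniqueness here, but the resulting equal split reflects the regularizer rather than biology: it does not say which of the two linked variants is causal, which is why the choice of constraints and representations remains an open question.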

References

Given that there exists a "ground truth" formulation of the GP mapping that is unknown to us, we further ask:

OP1.B - What are the necessary conditions and representations required to achieve a well-posed task?

Whole-Genome Phenotype Prediction with Machine Learning: Open Problems in Bacterial Genomics (2502.07749 - James et al., 11 Feb 2025) in Section “Open problems”, Subsection “Open problem 1”