Cause of the gene-age bias in functional characterization

Determine the causal factors underlying the observed negative association between gene age and experimental functional characterization in plants, specifically explaining why genes present across all land plants are more frequently characterized than genes restricted to the Arabidopsis genus.

Background

The paper reports that older genes—those conserved across land plants—are more frequently experimentally characterized than younger, genus-specific genes in Arabidopsis thaliana. The authors note a negative association between gene age and functional characterization and suggest potential explanations, including historical biases toward mutants with strong phenotypes and the essential, basal functions of older genes.

This bias has downstream implications for computational prediction methods that rely on network connectivity (e.g., protein–protein interactions and co-expression), as older genes tend to be more connected to already characterized genes, making their functions easier to predict. Understanding the root causes of this bias is important for designing strategies to better characterize younger, lineage-specific genes and to reduce systemic biases in prediction workflows.

References

We observed a negative association between gene age and functional characterization. Genes found in all land plants tend to be more characterized than genes found only in the Arabidopsis genus (Figure 1f-g) [22]. While it is unclear why this is so, we speculate that early plant research often began by identifying mutants with detectable phenotypes [23], and older genes tend to have more basal, essential functions [22], resulting in stronger phenotypes.

The gene function prediction challenge: large language models and knowledge graphs to the rescue (2408.07222 - Sunil et al., 13 Aug 2024) in Section “Age-specific Biases in Gene Characterization and Function Prediction” (following Figure 1f–g)