Ascertain bias in EHR-derived phenotyping algorithms

Ascertain whether disease phenotypes constructed algorithmically from electronic health records (EHRs) in EHR-linked biobanks are biased, and identify the specific sources of bias that influence case-control assignment and downstream genetic association analyses.

Background

Phenotypes used in genome- and phenome-wide association studies within EHR-linked biobanks are often algorithmically constructed from clinical notes and other EHR fields. The authors note that racial bias and other systemic biases can propagate through these constructions, affecting case/control labels and leading to spurious genetic associations.

They highlight that, prior to analysis, researchers usually do not know whether these algorithmic phenotypes are biased or where such biases originate. They suggest initial diagnostic steps (e.g., comparing socioeconomic and comorbidity profiles) to detect systematic differences that may indicate bias. This underscores the need for methods that determine whether an EHR-derived phenotype is biased and trace the bias to its source.
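One way to picture the profile comparison mentioned above is a standardized mean difference (SMD) between algorithm-assigned cases and controls for each covariate. The following is a minimal sketch under stated assumptions, not the authors' procedure: the field names (`is_case`, `deprivation_index`, `comorbidity_count`), the toy records, and the 0.1 flagging threshold (a common convention in balance diagnostics) are all hypothetical choices made for illustration.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance


def standardized_mean_difference(a, b):
    """SMD of two samples, scaled by the root of the averaged sample variances."""
    diff = mean(a) - mean(b)
    pooled_sd = sqrt((variance(a) + variance(b)) / 2)
    if pooled_sd == 0:
        # No spread in either sample: identical means give 0, otherwise infinite.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / pooled_sd


def profile_differences(records, covariates, label_key="is_case", threshold=0.1):
    """Compare cases and controls on each covariate and flag large SMDs.

    `threshold` is an assumed rule of thumb, not a value taken from the paper.
    """
    cases = [r for r in records if r[label_key]]
    controls = [r for r in records if not r[label_key]]
    report = {}
    for cov in covariates:
        smd = standardized_mean_difference(
            [r[cov] for r in cases], [r[cov] for r in controls]
        )
        report[cov] = {"smd": smd, "flagged": abs(smd) > threshold}
    return report


# Hypothetical toy data: one socioeconomic and one comorbidity covariate.
records = [
    {"is_case": True, "deprivation_index": 0.8, "comorbidity_count": 3},
    {"is_case": True, "deprivation_index": 0.7, "comorbidity_count": 2},
    {"is_case": True, "deprivation_index": 0.9, "comorbidity_count": 4},
    {"is_case": False, "deprivation_index": 0.3, "comorbidity_count": 3},
    {"is_case": False, "deprivation_index": 0.4, "comorbidity_count": 2},
    {"is_case": False, "deprivation_index": 0.2, "comorbidity_count": 4},
]

report = profile_differences(records, ["deprivation_index", "comorbidity_count"])
```

A flagged covariate is only a prompt for closer inspection: a large difference may reflect genuine disease epidemiology rather than a biased phenotype, so this check cannot by itself establish bias or locate its source.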

References

However, it is typically not known a priori whether an algorithmically constructed phenotype is biased nor the exact source of the bias.

Implications of self-identified race, ethnicity, and genetic ancestry on genetic association studies in biobanks within health systems (2402.15696 - Johnson et al., 24 Feb 2024) in Racial bias in EHR-derived phenotyping