Identify confident regions for clinical samples

Determine, for an arbitrary clinical human genome sample not included in existing benchmarks, the set of sample-specific confident genomic regions in which ground-truth small variant calls from short-read sequencing can be trusted for accurate evaluation and filtering.

Background

The paper contrasts the low error rates reported on Genome-In-A-Bottle (GIAB) benchmarks, which are measured only within sample-specific confident regions, with the substantially higher discrepancies observed when callers are compared genome-wide. The disparity arises because high-confidence truth data exist only within predefined confident regions for certain samples (e.g., GIAB); outside these regions, error rates are much higher.

For an arbitrary clinical sample, such sample-specific confident regions are not known a priori. The paper proposes sample-agnostic "easy regions" derived from pangenome assemblies to mitigate errors, but this does not directly provide the per-sample confident regions that underpin rigorous benchmarking and high-trust variant assessment.
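To make concrete what restricting evaluation or filtering to a region set involves, here is a minimal Python sketch. It is not the paper's method. It assumes regions are given as BED-style 0-based half-open intervals and that each call is a tuple whose first two fields are the chromosome and a 1-based position, as in VCF. The function names `build_index`, `in_regions`, and `restrict` are hypothetical, and only the variant's start position is tested, which is a simplification for indels.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(regions):
    """Group (chrom, start, end) intervals by chromosome, then sort and merge them."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = []
        for start, end in intervals:
            if merged and start <= merged[-1][1]:
                # Overlapping or adjacent: extend the previous interval.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_regions(index, chrom, pos):
    """Return True if the 1-based position lies inside a 0-based half-open interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    p0 = pos - 1  # convert the VCF-style position to 0-based
    i = bisect_right(starts, p0) - 1
    return i >= 0 and p0 < ends[i]


def restrict(calls, index):
    """Keep only the calls whose position falls inside the region set."""
    return [call for call in calls if in_regions(index, call[0], call[1])]


# Toy data, invented purely for illustration.
regions = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
calls = [
    ("chr1", 101, "A", "G"),
    ("chr1", 301, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr3", 10, "T", "C"),
]
kept = restrict(calls, build_index(regions))
```

The same filter works with either a sample-specific confident-region set or a sample-agnostic easy-region set. The open question is which region set can be trusted for a sample that has no benchmark.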

References

The problem is that we do not know confident regions for a clinical sample.

Finding easy regions for short-read variant calling from pangenome data (2507.03718 - Li, 4 Jul 2025) in Section 1 (Introduction)