Determine the source of unexpected sequences in COLO829 PacBio HiFi reads
Determine the biological origin of approximately 95 kilobases of sequence spread across 43 PacBio HiFi reads from the COLO829 tumor sample. These sequences have no supermaximal exact matches (SMEMs) of at least 51 base pairs against a human pangenome index built from 100 haplotype-resolved human assemblies, including T2T-CHM13. The sequences could not be assembled, and NCBI BLAST reported only multiple weak matches to Bos taurus genomes.
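The selection criterion above can be illustrated with a small sketch. A sequence has an exact match of length at least k against a reference if and only if it shares at least one k-mer with that reference on either strand, so a plain k-mer set is enough to reproduce the test on toy data. This is not the BWT-based index used in the cited paper; the function names (`build_index`, `unmatched_reads`) and the in-memory set are assumptions made for illustration, and the sketch treats each read as a whole rather than locating unmatched regions inside it.

```python
def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def kmers(seq, k):
    """Return the set of all length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k=51):
    """Collect the k-mers of every reference on both strands.

    Hypothetical stand-in for a pangenome index; only practical for toy inputs.
    """
    index = set()
    for ref in references:
        index |= kmers(ref, k)
        index |= kmers(revcomp(ref), k)
    return index


def has_long_exact_match(read, index, k=51):
    """True if the read shares at least one k-mer with the index,
    which is equivalent to having an exact match of length >= k."""
    read = read.upper()
    return any(read[i:i + k] in index for i in range(len(read) - k + 1))


def unmatched_reads(reads, references, k=51):
    """Return the names of reads with no exact match of length >= k."""
    index = build_index(references, k)
    return [name for name, seq in reads.items()
            if not has_long_exact_match(seq, index, k)]


if __name__ == "__main__":
    # Toy data with a small k so the behaviour is easy to follow.
    refs = ["ACGTACGTTTGACCA"]
    reads = {
        "r1": "TTGACC",    # forward-strand match
        "r2": "GGGGGGGG",  # no match on either strand
        "r3": "TCAAACG",   # reverse-complement match
    }
    print(unmatched_reads(reads, refs, k=5))
```

At real scale, a k-mer set over 100 assemblies would not fit in memory, which is why a compressed full-text index is the natural tool; the sketch only makes the matching criterion concrete.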
References
NCBI BLAST suggested multiple weak hits to cow genomes. We could not identify the source of these sequences but there were few of them anyway.
— BWT construction and search at the terabase scale
(2409.00613 - Li, 1 Sep 2024) in Results, Identifying novel sequences