Impact of GenomeScope assumptions on estimate accuracy
Determine how strongly deviations from the assumptions of the GenomeScope and GenomeScope 2.0 k-mer spectrum models affect the accuracy of the parameter estimates these models produce, and how this effect varies among species. The deviations of interest are non-uniform distribution of heterozygosity across the genome, variant types other than single-nucleotide polymorphisms, and genomic regions present in more than two copies.
References
The GenomeScope model is still somewhat "unrealistic" for several reasons: different regions within a genome have different probabilities of being heterozygous (i.e. heterozygosity is not uniformly distributed in a genome); many variants are not just SNPs; and a large proportion of the genome might be covered by repeats with more than two copies. How much of a problem this poses for the estimates is still an open question, and the answer most likely depends on the studied species.
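
To make the first of these deviations concrete, the toy simulation below places the same number of SNPs on a genome either uniformly or clustered into a small "hot" region, then counts the fraction of k-mer windows that cover at least one SNP. It is a minimal sketch and not GenomeScope's fitting procedure: it assumes that, under uniform heterozygosity at per-base rate r, a k-mer is affected by a variant with probability 1 - (1 - r)^k, and inverts that relation to get the rate a uniform model would infer. The function names, genome length, rate, and hot-region fraction are all hypothetical choices made for illustration.

```python
import random


def affected_kmer_fraction(snp_positions, genome_len, k):
    """Fraction of k-mer start positions whose window covers at least one SNP."""
    n_kmers = genome_len - k + 1
    covered = [False] * n_kmers
    for p in snp_positions:
        # k-mers starting anywhere in [p - k + 1, p] contain position p.
        lo = max(0, p - k + 1)
        hi = min(n_kmers - 1, p)
        for start in range(lo, hi + 1):
            covered[start] = True
    return sum(covered) / n_kmers


def implied_rate(fraction, k):
    """Invert f = 1 - (1 - r)^k: the rate a uniform model would infer from f."""
    return 1.0 - (1.0 - fraction) ** (1.0 / k)


def simulate(genome_len=200_000, k=21, rate=0.01, hot_fraction=0.1, seed=0):
    """Compare implied heterozygosity for uniform vs. clustered SNP placement.

    All parameter values are illustrative assumptions, not empirical estimates.
    """
    rng = random.Random(seed)
    n_snps = int(genome_len * rate)

    # Case 1: SNPs spread uniformly over the whole genome (the model's assumption).
    uniform_snps = rng.sample(range(genome_len), n_snps)

    # Case 2: the same number of SNPs packed into the first hot_fraction of the genome.
    hot_len = int(genome_len * hot_fraction)
    clustered_snps = rng.sample(range(hot_len), n_snps)

    return {
        "true_rate": n_snps / genome_len,
        "uniform": implied_rate(
            affected_kmer_fraction(uniform_snps, genome_len, k), k
        ),
        "clustered": implied_rate(
            affected_kmer_fraction(clustered_snps, genome_len, k), k
        ),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.5f}")
```

In this toy setting the uniform placement recovers a rate close to the true one, while the clustered placement yields a noticeably lower implied rate, because nearby SNPs fall within the same k-mers. How large this bias is for real genomes, and how it interacts with non-SNP variants and high-copy repeats, is exactly what remains open.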