False positives among SMRT-derived 6mA positives cannot be directly identified

Determine which positive N6-methyladenine (6mA) samples derived from single-molecule real-time (SMRT) sequencing are false positives, and estimate what fraction of the positives are false in the constructed datasets for Arabidopsis thaliana, Drosophila melanogaster, and Xanthomonas oryzae pv. oryzicola BLS256.

Background

The paper builds its 6mA datasets from SMRT sequencing and applies motif-based rules to clean the negative samples, which improves model performance. However, the authors acknowledge a limitation: they cannot directly identify false positives among the SMRT-derived positive samples.

Because of this uncertainty, the evaluation focuses on removing false negatives from the negative set, leaving unresolved which putative positives are erroneous.
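The negative-set cleaning described above can be illustrated with a minimal Python sketch. It rests on stated assumptions: the patterns in `HYPOTHETICAL_MOTIFS` and the helpers `contains_motif` and `clean_negatives` are hypothetical and are not the rules used in the paper. A negative that contains a methylation-associated motif is treated as a likely false negative and dropped. Positives are left untouched.

```python
import re
from typing import Iterable, List

# HYPOTHETICAL motif patterns (regular expressions), for illustration only.
# The paper's actual motif-based rules are not reproduced here.
HYPOTHETICAL_MOTIFS = [r"GATC", r"GA.TC"]


def contains_motif(seq: str, motifs: Iterable[str]) -> bool:
    """Return True if any motif pattern occurs in the upper-cased sequence."""
    s = seq.upper()
    return any(re.search(m, s) for m in motifs)


def clean_negatives(negatives: List[str], motifs: Iterable[str]) -> List[str]:
    """Drop negatives that match a motif, treating them as likely false negatives.

    Only the negative set is filtered; positives are not passed through this rule.
    """
    motif_list = list(motifs)  # materialize so the patterns can be reused per sequence
    return [seq for seq in negatives if not contains_motif(seq, motif_list)]


if __name__ == "__main__":
    negatives = ["ACGTACGTAC", "TTGATCAAGG", "CCGAATCGGA"]
    print(clean_negatives(negatives, HYPOTHETICAL_MOTIFS))
```

With these sample negatives only `ACGTACGTAC` survives: the second sequence contains `GATC` and the third matches `GA.TC`. This shows the asymmetry. A motif match in a negative is a reason to doubt its label. In a SMRT-derived positive, however, the presence or absence of a motif does not by itself establish whether the methylation call is genuine, so a rule of this kind cannot flag false positives.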

References

Since the positive samples were measured by SMRT, we were unable to determine which ones might be false positives.

DNA and Human Language: Epigenetic Memory and Redundancy in Linear Sequence (2503.23494 - Yang et al., 30 Mar 2025) in Section “Cleaning the Dataset to Verify 6mA Methylation Information”