Evaluate the utility of advanced DNA sequence-derived features for predicting 3D chromatin interactions
Determine the predictive value of advanced DNA sequence feature sets, including the 147 sequence feature encodings provided by the iLearnPlus platform, for predicting three-dimensional chromatin interaction patterns such as topologically associating domain (TAD) boundaries and chromatin loops, in comparison to conventional sequence encodings (one-hot and k-mer) and other feature extraction approaches.
References
The utility of DNA sequence features for predicting chromatin interactions remains to be explored [186]. Besides one-hot encoding and k-mer representations of DNA sequence, recent tools offer additional approaches to feature extraction. The web-based tool iLearnPlus [187] provides 147 unique feature sets capturing various properties of DNA, RNA, and protein sequences, along with 21 machine-learning algorithms (including 7 deep-learning approaches), supporting feature extraction, clustering, normalization, and predictor construction. It remains to be seen how informative such features are for predicting TAD/loop boundaries or other chromatin interaction patterns.
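
For concreteness, the two conventional baseline encodings named above (one-hot and k-mer) can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used by iLearnPlus or by any cited method: the function names `one_hot` and `kmer_frequencies`, the choice to map ambiguous bases such as N to all-zero rows, and the normalization of k-mer counts to frequencies are all assumptions made for this example.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumed alphabet; ambiguous bases (e.g. N) are handled below


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows in A, C, G, T order.

    Assumption: any base outside ACGT (e.g. N) becomes an all-zero row.
    """
    return [[1 if base == letter else 0 for letter in ALPHABET]
            for base in seq.upper()]


def kmer_frequencies(seq, k=2):
    """Return normalized frequencies of all 4**k k-mers, in lexicographic order.

    Assumption: windows containing non-ACGT characters are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[kmer] for kmer in kmers)
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


# Hypothetical toy sequence, e.g. a window around a candidate TAD boundary
example = "ACGTAC"
encoded = one_hot(example)               # 6 rows x 4 columns
freqs = kmer_frequencies(example, k=2)   # 16 dinucleotide frequencies
```

One-hot encoding preserves base order and yields a matrix whose size grows with sequence length, whereas the k-mer vector has a fixed length of 4^k regardless of sequence length but discards positional information. Richer feature sets would be evaluated against baselines of this kind.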