Evaluate the utility of advanced DNA sequence-derived features for predicting 3D chromatin interactions

Determine the predictive value of advanced DNA sequence feature sets, including the 147 sequence feature encodings provided by the iLearnPlus platform, for predicting three-dimensional chromatin interaction patterns such as topologically associating domain (TAD) boundaries and chromatin loops, in comparison to conventional sequence encodings (one-hot and k-mer) and other feature extraction approaches.

Background

The review highlights that most current predictors of 3D genome organization rely on conventional sequence encodings (e.g., one-hot and k-mer) or epigenomic annotations, while newer sequence-based feature engineering frameworks are emerging. iLearnPlus provides 147 feature sets and multiple machine learning and deep learning pipelines for sequence feature extraction and modeling, but their relevance to 3D genome prediction tasks has not been systematically assessed.
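To make the contrast between conventional and additional sequence encodings concrete, the sketch below implements one-hot encoding, k-mer frequency vectors, and one example of a further descriptor family. The function names (`one_hot`, `kmer_frequencies`, `k_spaced_pair_composition`) are hypothetical. The third function is our own reimplementation in the spirit of the k-spaced nucleotide-pair composition descriptors found in feature-extraction toolkits such as iLearnPlus. It is not iLearnPlus code and may differ from that tool's exact definitions. Mapping non-ACGT bases to all-zero rows, or skipping windows that contain them, is also an assumption.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def one_hot(seq: str) -> list[list[int]]:
    """One-hot encode a DNA sequence as an L x 4 matrix (columns A, C, G, T).
    Assumption: ambiguous bases such as N become an all-zero row."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


def kmer_frequencies(seq: str, k: int = 3) -> list[float]:
    """Normalized counts of all 4**k k-mers, in lexicographic order.
    Windows containing non-ACGT characters are skipped."""
    vocab = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(vocab, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[v] / total if total else 0.0 for v in vocab]


def k_spaced_pair_composition(seq: str, gap: int = 1) -> list[float]:
    """Frequencies of the 16 nucleotide pairs at positions i and i + gap + 1.
    Hypothetical reimplementation of a CKSNAP-style descriptor (not iLearnPlus code)."""
    pairs = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - gap - 1):
        pair = seq[i] + seq[i + gap + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in pairs]
```

One-hot output grows with sequence length and preserves position, which suits convolutional models. The k-mer and pair-composition vectors have a fixed length (4^k and 16) and discard position, which suits conventional classifiers. For example, `kmer_frequencies("ACGT", k=2)` places 1/3 at the entries for AC, CG, and GT and 0 elsewhere.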

Establishing the informativeness of these advanced sequence-derived features for predicting TAD boundaries, loops, and other interaction signatures would clarify whether sequence-only models can match or complement epigenomic-driven approaches, potentially improving cross-cell-type generalization and interpretability.
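One way to frame this comparison is to score each feature set by how well it discriminates held-out boundary windows from non-boundary windows. The sketch below is a minimal, dependency-free harness built on several assumptions. Windows are already labeled (1 for a boundary or loop anchor, 0 otherwise), a nearest-centroid classifier stands in for a real model, and ROC AUC is computed with the rank-based Mann-Whitney formulation. `base_composition` is only a placeholder encoder. Any function that maps a sequence to a fixed-length vector, such as k-mer frequencies, a flattened one-hot matrix, or features exported from a tool like iLearnPlus, can be passed in its place. All names here are hypothetical.

```python
import math
import random
from typing import Callable, Sequence

# Hypothetical type: any function mapping a DNA string to a fixed-length vector.
Encoder = Callable[[str], list[float]]


def base_composition(seq: str) -> list[float]:
    """Placeholder encoder: fractions of A, C, G and T in the window."""
    seq = seq.upper()
    n = max(len(seq), 1)
    return [seq.count(base) / n for base in "ACGT"]


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based ROC AUC (Mann-Whitney U); tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def nearest_centroid_scores(train_x, train_y, test_x) -> list[float]:
    """Higher score means closer to the positive centroid than to the negative one."""
    c_pos = centroid([x for x, y in zip(train_x, train_y) if y == 1])
    c_neg = centroid([x for x, y in zip(train_x, train_y) if y == 0])
    return [math.dist(x, c_neg) - math.dist(x, c_pos) for x in test_x]


def cross_validated_auc(seqs: Sequence[str], labels: Sequence[int],
                        encoder: Encoder, folds: int = 5, seed: int = 0) -> float:
    """Mean held-out AUC over stratified folds (each class needs >= `folds` examples)."""
    features = [encoder(s) for s in seqs]
    rng = random.Random(seed)
    fold_of = {}
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            fold_of[i] = j % folds  # round-robin keeps both classes in every fold
    aucs = []
    for f in range(folds):
        train = [i for i in range(len(seqs)) if fold_of[i] != f]
        test = [i for i in range(len(seqs)) if fold_of[i] == f]
        scores = nearest_centroid_scores([features[i] for i in train],
                                         [labels[i] for i in train],
                                         [features[i] for i in test])
        aucs.append(roc_auc(scores, [labels[i] for i in test]))
    return sum(aucs) / len(aucs)
```

Calling `cross_validated_auc` once per encoder, on the same windows and with the same seed, yields directly comparable scores. A real benchmark would need a stronger classifier and, likely, chromosome-level held-out splits to limit leakage between neighboring windows. This harness only illustrates the comparison logic.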

References

The utility of DNA sequence features for chromatin interaction prediction remains to be explored [186]. Beyond one-hot encoding and k-mer representations of DNA sequence, recent tools offer additional feature-extraction methods. The web-based iLearnPlus [187] platform includes 147 unique feature sets capturing various properties of DNA, RNA, and protein sequences, as well as 21 machine-learning algorithms (including 7 deep-learning approaches), with pipelines for feature extraction, clustering, normalization, and predictor construction. It remains to be seen how informative such features actually are for predicting TAD or loop boundaries and other chromatin interaction patterns.

Machine and deep learning methods for predicting 3D genome organization (2403.03231 - Wall et al., 4 Mar 2024) in Discussion, paragraph on DNA sequence feature extraction and iLearnPlus