Deep learning in bioinformatics: introduction, application, and perspective in big data era (1903.00342v1)

Published 28 Feb 2019 in q-bio.QM, cs.LG, and cs.NE

Abstract: Deep learning, which is especially formidable in handling big data, has achieved great success in various fields, including bioinformatics. With the advances of the big data era in biology, it is foreseeable that deep learning will become increasingly important in the field and will be incorporated in vast majorities of analysis pipelines. In this review, we provide both the exoteric introduction of deep learning, and concrete examples and implementations of its representative applications in bioinformatics. We start from the recent achievements of deep learning in the bioinformatics field, pointing out the problems which are suitable to use deep learning. After that, we introduce deep learning in an easy-to-understand fashion, from shallow neural networks to legendary convolutional neural networks, legendary recurrent neural networks, graph neural networks, generative adversarial networks, variational autoencoder, and the most recent state-of-the-art architectures. After that, we provide eight examples, covering five bioinformatics research directions and all the four kinds of data type, with the implementation written in Tensorflow and Keras. Finally, we discuss the common issues, such as overfitting and interpretability, that users will encounter when adopting deep learning methods and provide corresponding suggestions. The implementations are freely available at https://github.com/lykaust15/Deep_learning_examples.

Deep Learning in Bioinformatics: Utilization and Implications

The paper, "Deep learning in bioinformatics: introduction, application, and perspective in big data era," authored by Yu Li, Chao Huang, Lizhong Ding, Zhongxiao Li, Yijie Pan, and Xin Gao, offers an in-depth examination of how deep learning has been applied to the field of bioinformatics. This work provides a comprehensive introduction to deep learning techniques, alongside an analysis of their applications in bioinformatics, highlighting both specific methodologies and the broader potential impacts.

Biology now produces data at a scale that suits deep learning models, which improve with large datasets and can pick up patterns that are hard to discern manually. The paper reviews successes of deep learning in several bioinformatics domains, notably sequence analysis, structure prediction, and biomolecular function prediction.

Methodological Insights

The paper explores several deep learning architectures, such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Graph Neural Networks (GNNs), Generative Adversarial Networks (GANs), and Variational Autoencoders (VAEs), offering insights into their applicability to bioinformatics problems. These models address different types of data challenges:

Sequence Data: CNNs and RNNs are applied to extract features from linear biological sequences such as DNA, RNA, and proteins. CNNs detect local sequence patterns (motifs), which suits tasks such as transcription factor binding prediction, while RNNs model dependencies between positions along a sequence or between time points, as in gene expression time series analysis.
Structured Data: Fully connected deep neural networks (DNNs) are used for structured (tabular or vector) biological data, exemplified by enzyme function prediction from high-dimensional domain encodings.
Graph Data: Graph data such as protein-protein interaction networks are handled with GNNs. These models embed each node using information from its neighbors, capturing the topology of the network and enabling downstream tasks like interaction prediction.
Image Data: Biomedical images are processed with deep architectures such as ResNet, which are frequently fine-tuned via transfer learning. This compensates for the scarcity of labeled biomedical images by reusing features from networks pre-trained on large image datasets such as ImageNet.
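
To make the sequence case concrete, the following is a minimal sketch of what a single convolutional filter does to a one-hot encoded DNA sequence. It is not the paper's implementation (which uses Tensorflow and Keras): the filter here is written by hand rather than learned, and the motif "TATA", the example sequence, and all function names are assumptions chosen for illustration.

```python
# Illustrative only: a hand-written motif filter, not learned weights.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_scan(encoded, kernel):
    """Slide a (width x 4) kernel along the sequence (valid padding, stride 1)."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        window = encoded[start:start + width]
        score = sum(
            window[i][j] * kernel[i][j]
            for i in range(width)
            for j in range(4)
        )
        scores.append(score)
    return scores


def relu(values):
    """Keep positive responses and zero out the rest."""
    return [max(0.0, v) for v in values]


# Hypothetical filter that responds to the motif "TATA":
# +1 for the matching base at each position, -1 for any other base.
motif = "TATA"
kernel = [[1.0 if b == m else -1.0 for b in BASES] for m in motif]

sequence = "GGCTATAAGC"
activations = relu(conv1d_scan(one_hot(sequence), kernel))
best_position = max(range(len(activations)), key=activations.__getitem__)

print(activations)
print("strongest response at offset", best_position)
```

In a trained CNN the kernel values are learned from labeled sequences, many filters run in parallel, and pooling layers summarize where each filter fired; the sliding dot product itself is the same operation.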

Application Examples and Outcomes

The authors provide eight examples across different bioinformatics application areas, each built on a different model. Four of them are summarized here:

Enzyme Function Prediction: Implemented with DNNs, reaching a reported accuracy of 94.5%, which suggests the approach copes well with high-dimensional input.
Gene Expression Regression: Deep models are shown to outperform traditional linear approaches because they can capture non-linear relationships.
RNA-Protein Binding Sites Prediction: CNNs are used to detect the sequence motifs that characterize binding sites.
Biomedical Image Classification: Transfer learning improves model performance, allowing reliable predictions even when little task-specific training data is available.

Taken together, the results in the paper indicate that deep learning is well matched to the high-dimensional, complex data typical of bioinformatics.

Theoretical and Practical Considerations

The authors also address challenges that arise when applying deep learning in bioinformatics, such as overfitting, lack of interpretability, and data imbalance. Proposed remedies include regularization techniques, interpretability techniques such as backpropagation-based methods, and choosing evaluation metrics that do not give misleading results on imbalanced datasets.
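
The point about evaluation metrics can be illustrated with a small sketch. The data below is synthetic (95 negatives and 5 positives, an assumption made purely for illustration), and the choice of F1 score and Matthews correlation coefficient (MCC) as alternative metrics is ours, not necessarily the paper's. It shows how a classifier that never predicts the positive class still scores 95% accuracy.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, tn, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Synthetic, imbalanced labels: 95 negatives and 5 positives.
y_true = [0] * 95 + [1] * 5
# A degenerate "classifier" that always predicts the majority class.
always_negative = [0] * 100

counts = confusion_counts(y_true, always_negative)
print("accuracy:", accuracy(*counts))
print("F1:", f1_score(*counts))
print("MCC:", mcc(*counts))
```

Accuracy rewards the degenerate predictor, while F1 and MCC both fall to zero because no positive example is ever recovered. Reporting at least one such metric alongside accuracy makes this failure mode visible.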

Future Directions

Despite these successes, deep learning methods in bioinformatics still need refinement. Unsupervised generative models such as GANs and VAEs could be exploited further for drug design and molecular generation, and transfer learning could be extended to a broader range of biomedical problems.

This paper marks a significant step toward integrating deep learning into bioinformatics and emphasizes the need for interdisciplinary collaboration. As computational capacity in the biological and biomedical sciences continues to grow, deep learning is poised to play a pivotal role in understanding complex biological systems.
