Deep Learning in Mining Biological Data

Published 28 Feb 2020 in q-bio.QM, cs.LG, and stat.ML | (2003.00108v1)

Abstract: Recent technological advancements in data acquisition tools have allowed life scientists to acquire multimodal data from different biological application domains. Broadly categorized into three types (i.e., sequences, images, and signals), these data are huge in amount and complex in nature. Mining such an enormous amount of data for pattern recognition is a big challenge and requires sophisticated data-intensive machine learning techniques. Artificial neural network-based learning systems are well known for their pattern recognition capabilities, and lately their deep architectures, known as deep learning (DL), have been successfully applied to solve many complex pattern recognition problems. Highlighting the role of DL in recognizing patterns in biological data, this article provides: applications of DL to biological sequence, image, and signal data; an overview of open access sources of these data; a description of open source DL tools applicable to these data; and a comparison of these tools from qualitative and quantitative perspectives. At the end, it outlines some open research challenges in mining biological data and puts forward a number of possible future perspectives.


Citations (275)


Summary

The paper surveys how deep learning techniques have been applied to pattern recognition in complex biological datasets, including sequences, images, and signals.
It compares DL frameworks such as TensorFlow, Theano, Caffe, and PyTorch on computational efficiency and community support.
The study highlights key challenges such as limited annotated data and network interpretability, and identifies future prospects such as deep reinforcement learning.

Deep Learning in Mining Biological Data: An Overview

This comprehensive survey tackles the challenge of applying deep learning (DL) techniques to mine the vast and intricate datasets emerging from various biological domains. These datasets, essential for understanding complex biological phenomena, fall into three primary categories: sequences, images, and signals. Each of these categories is characterized by significant complexity and volume, necessitating advanced approaches for effective pattern recognition. The paper's primary aim is to explore the applications of DL in these realms, evaluate available tools, and outline future research challenges.

Key Contributions and Findings

Applications in Biological Data Mining:

The paper highlights the growing prominence of DL architectures such as Convolutional Neural Networks (CNNs), Deep Belief Networks (DBNs), Recurrent Neural Networks (RNNs), and Autoencoders in dealing with biological data. The survey delineates applications across several biological data types, including:

- Sequences: Identification of gene expression patterns and prediction of RNA-protein interactions using CNNs and RNNs.
- Images: Application of CNNs to bioimaging tasks such as tumor and mitosis detection in histological data.
- Signals: Use of autoencoders and RNNs to interpret complex signal data from EEG and other biological signal domains.
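To make the sequence bullet concrete, here is a minimal NumPy sketch of a common way DNA is fed to a CNN: one-hot encoding, a 1D convolution with ReLU, and global max pooling. The helper names (`one_hot`, `conv1d`, `global_max_pool`), the ACGT channel order, and the hand-set "TATA" filter are illustrative assumptions, not code from the paper; a real model would learn its filter weights with a framework such as TensorFlow or PyTorch.

```python
import numpy as np

# Assumed channel order for the one-hot encoding; any fixed order works.
ALPHABET = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (length, 4) one-hot matrix; unknown bases such as N stay all-zero."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    x = np.zeros((len(seq), len(ALPHABET)), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in index:
            x[pos, index[base]] = 1.0
    return x


def conv1d(x: np.ndarray, filters: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """'Valid' 1D convolution (cross-correlation, as in DL frameworks) followed by ReLU.

    x: (length, channels); filters: (n_filters, width, channels); bias: (n_filters,)
    Returns a feature map of shape (length - width + 1, n_filters).
    """
    n_filters, width, _ = filters.shape
    out_len = x.shape[0] - width + 1
    out = np.empty((out_len, n_filters), dtype=np.float32)
    for i in range(out_len):
        window = x[i:i + width]  # (width, channels) slice of the sequence
        out[i] = np.tensordot(filters, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)


def global_max_pool(feature_map: np.ndarray) -> np.ndarray:
    """Keep each filter's strongest response anywhere along the sequence."""
    return feature_map.max(axis=0)


# Hypothetical hand-set filter for the motif "TATA"; a trained CNN would learn such weights.
motif_filter = one_hot("TATA")[np.newaxis]  # shape (1, 4, 4)
bias = np.array([-3.0], dtype=np.float32)   # only a full 4/4 match gives a positive output
x = one_hot("GCGTATAAGC")
features = global_max_pool(conv1d(x, motif_filter, bias))
print(features)  # → [1.]
```

Max pooling makes the output independent of where the motif occurs, which is one reason this convolution-then-pool pattern is popular for sequence tasks.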

Open Access Data Sources: The survey provides an extensive list of open access sources encompassing Omics, Bioimaging, and Brain/Body-Machine Interfaces (BMI) datasets. These datasets form the backbone for training DL models and advancing biological research.
Assessment of DL Tools: A detailed comparison of existing DL frameworks, such as TensorFlow, Theano, Caffe, and PyTorch, is presented. This comparison considers factors like community support, computational efficiency across different hardware platforms, and the breadth of supported DL architectures.
Performance Benchmarking: The authors provide performance benchmarks illustrating the computational efficiency of DL tools in training various architectures on both CPU and GPU platforms. This evaluation helps identify which frameworks are best suited to specific DL tasks.
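The benchmarking idea (wall-clock time per training iteration for a given architecture) can be sketched with a small harness. This is not the authors' protocol: the workload is a hypothetical NumPy MLP step, and `benchmark`, `mlp_train_step`, the batch shape, and the warm-up/iteration counts are assumptions chosen for illustration.

```python
import time

import numpy as np


def mlp_train_step(params, x, y, lr=0.01):
    """One SGD step for a one-hidden-layer ReLU MLP with mean-squared-error loss (stand-in workload)."""
    w1, w2 = params
    h = np.maximum(x @ w1, 0.0)            # hidden activations
    pred = h @ w2
    grad_pred = 2.0 * (pred - y) / len(x)  # d(MSE)/d(pred) for a single output column
    grad_w2 = h.T @ grad_pred
    grad_h = (grad_pred @ w2.T) * (h > 0)  # backprop through ReLU
    grad_w1 = x.T @ grad_h
    return (w1 - lr * grad_w1, w2 - lr * grad_w2)


def benchmark(step, state, n_warmup=3, n_iters=20):
    """Return the median wall-clock seconds per call of step(state) -> new state."""
    for _ in range(n_warmup):              # warm-up runs are not timed
        state = step(state)
    times = []
    for _ in range(n_iters):
        start = time.perf_counter()
        state = step(state)
        times.append(time.perf_counter() - start)
    return float(np.median(times))


rng = np.random.default_rng(0)
x = rng.standard_normal((256, 100))        # hypothetical batch: 256 samples, 100 features
y = rng.standard_normal((256, 1))
params = (rng.standard_normal((100, 64)) * 0.1, rng.standard_normal((64, 1)) * 0.1)
sec = benchmark(lambda p: mlp_train_step(p, x, y), params)
print(f"median time per training step: {sec * 1e3:.3f} ms")
```

Taking the median after a few warm-up iterations reduces the influence of one-off costs such as memory allocation. When timing GPU code, frameworks typically launch kernels asynchronously, so the device has to be synchronized (for example with `torch.cuda.synchronize()` in PyTorch) before the timer is stopped.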

Implications and Future Perspectives

The implications of this paper are twofold. Practically, it guides researchers in selecting appropriate DL tools and methodologies for their specific data types and research goals. Theoretically, it underscores essential gaps in current DL approaches: the need for large annotated datasets, the interpretability of neural networks, and optimization strategies are pivotal areas requiring further exploration.
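Interpretability is often probed with perturbation methods. One widely used, model-agnostic example for sequence models is in silico mutagenesis: substitute each base in turn and record how the model's score changes. The sketch below illustrates that general technique and is not a method from the paper; `toy_score` is a hypothetical stand-in for a trained network that scores the best match to a "TATA" motif, and the other helper names are likewise illustrative.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed one-hot channel order


def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot matrix."""
    x = np.zeros((len(seq), len(ALPHABET)))
    for pos, base in enumerate(seq.upper()):
        if base in ALPHABET:
            x[pos, ALPHABET.index(base)] = 1.0
    return x


MOTIF = one_hot("TATA")


def toy_score(x):
    """Hypothetical stand-in for a trained model: best match to MOTIF at any offset."""
    width = MOTIF.shape[0]
    return max(float((x[i:i + width] * MOTIF).sum()) for i in range(x.shape[0] - width + 1))


def in_silico_mutagenesis(seq, score_fn):
    """Score change for every single-base substitution, as a (len(seq), 4) matrix.

    Entry [i, j] is score(mutant) - score(reference) with position i set to ALPHABET[j];
    entries for the reference base itself are left at 0.
    """
    ref_score = score_fn(one_hot(seq))
    effects = np.zeros((len(seq), len(ALPHABET)))
    for i in range(len(seq)):
        for j, base in enumerate(ALPHABET):
            if seq[i].upper() == base:
                continue
            mutant = seq[:i] + base + seq[i + 1:]
            effects[i, j] = score_fn(one_hot(mutant)) - ref_score
    return effects


effects = in_silico_mutagenesis("GCGTATAAGC", toy_score)
importance = effects.min(axis=1)  # most damaging substitution at each position
print(importance)
```

Here only the four positions of the TATA match (indices 3 to 6) get a negative importance of -1; every other substitution leaves the score unchanged. With a real network, the same loop yields a position-by-base map that can be compared against known motifs.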

The paper also identifies deep reinforcement learning (deep RL) as a burgeoning area with untapped potential for biological applications. This could entail developing RL approaches tailored to dynamic and hierarchical biological data. Infrastructural advancements, particularly in computing platforms and data curation, are critical to unlocking this potential.

Conclusion

This work serves as a foundational reference for researchers looking to leverage DL in the life sciences. By mapping the landscape of existing tools and their applications, it underscores the strengths of DL in biological data mining and calls attention to the pressing need for further scientific and technical advancements. As biological datasets continue to grow in size and complexity, DL approaches, with continued innovation, are poised to become instrumental in understanding biological systems. The paper sets the stage for future work that can deepen our understanding and improve the effectiveness of DL in managing biological big data.

Authors (3)
