Papers
Topics
Authors
Recent
2000 character limit reached

MirLibSpark: A Scalable NGS Plant MicroRNA Prediction Pipeline for Multi-Library Functional Annotation (2501.17998v1)

Published 29 Jan 2025 in cs.DC and q-bio.GN

Abstract: The emergence of the Next Generation Sequencing increases drastically the volume of transcriptomic data. Although many standalone algorithms and workflows for novel microRNA (miRNA) prediction have been proposed, few are designed for processing large volume of sequence data from large genomes, and even fewer further annotate functional miRNAs by analyzing multiple libraries. We propose an improved pipeline for a high volume data facility by implementing mirLibSpark based on the Apache Spark framework. This pipeline is the fastest actual method, and provides an accuracy improvement compared to the standard. In this paper, we deliver the first distributed functional miRNA predictor as a standalone and fully automated package. It is an efficient and accurate miRNA predictor with functional insight. Furthermore, it compiles with the gold-standard requirement on plant miRNA predictions.

Summary

We haven't generated a summary for this paper yet.

Whiteboard

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 0 likes about this paper.