GenoCraft: A Comprehensive, User-Friendly Web-Based Platform for High-Throughput Omics Data Analysis and Visualization (2312.14249v3)
Abstract: The surge in high-throughput omics data has reshaped biological research and underlined the need for powerful, user-friendly tools for data analysis and interpretation. This paper presents GenoCraft, a comprehensive web-based software solution designed to handle the entire omics data processing pipeline. GenoCraft provides a unified platform of bioinformatics tools covering normalization, quality control, differential analysis, network analysis, pathway analysis, and a range of visualization techniques. By placing these tools behind a user-friendly interface, it makes state-of-the-art omics data analysis accessible to a wider range of researchers and data scientists, and serves as a resource for managing and analyzing large-scale omics data. The API, together with an interactive web interface, is publicly available at https://genocraft.stanford.edu/. All code is released at https://github.com/futianfan/GenoCraft.