
SARS-Cov-2 RNA Sequence Classification Based on Territory Information

Published 9 Jan 2021 in q-bio.QM, cs.LG, and stat.CO | (2101.03323v1)

Abstract: COVID-19 genetic analysis is critical for determining virus type and variant and for evaluating vaccines. In this paper, SARS-CoV-2 RNA sequence analysis with respect to region or territory is investigated. A uniform sequence SVM framework is developed that handles genetic sequences of various lengths, from short to long, as well as mixed bases. Each SARS-CoV-2 RNA sequence is projected into spaces of different dimensions and then scored according to the output probabilities of pre-trained SVM models, in order to explore the territory or origin information of SARS-CoV-2. Different ratios of training-set size to test-set size are also discussed in the data analysis. Two SARS-CoV-2 RNA classification tasks are constructed from the GISAID database: one covers mainland China, Hong Kong and Taiwan, and the other is a 6-class classification task over the 7 continents (Africa, Asia, Europe, North America, South and Central America, and Oceania). For the 3-class classification of China, the Top-1 accuracy reaches 82.45% (train 60%, test 40%); for the 2-class classification of China, the Top-1 accuracy reaches 97.35% (train 80%, test 20%); for the 6-class world classification task, with a training-to-test ratio of 20% : 80%, the Top-1 accuracy reaches 30.30%. Some Top-N results are also given.
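
The abstract does not say how a sequence is "projected to different dimensional space" or how Top-N accuracy is computed from the SVM output probabilities. The sketch below is only an illustration under stated assumptions: it uses k-mer frequency vectors (a common choice, assumed here, not confirmed by the abstract), where each k gives a 4^k-dimensional space, and it scores a matrix of per-class probabilities with a Top-N rule. The helper names kmer_vector and top_n_accuracy are hypothetical, and the SVM itself is not shown.

```python
from itertools import product


def kmer_vector(seq: str, k: int) -> list[float]:
    """Project an RNA/DNA sequence onto a 4**k-dimensional k-mer frequency vector.

    Assumptions (illustrative, not from the paper): U is treated as T, and any
    k-mer containing a non-ACGT (ambiguous or mixed) base is skipped.
    """
    # Fixed ordering of all 4**k possible k-mers: AA..A, AA..C, ...
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0.0] * len(index)
    seq = seq.upper().replace("U", "T")
    total = 0
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is not None:  # k-mers with ambiguous bases are not in the index
            counts[i] += 1
            total += 1
    # Normalise so that short and long sequences are comparable.
    return [c / total for c in counts] if total else counts


def top_n_accuracy(probs: list[list[float]], labels: list[int], n: int) -> float:
    """Fraction of samples whose true class is among the n highest-probability classes."""
    hits = 0
    for row, label in zip(probs, labels):
        # Class indices sorted from most to least probable.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:n]:
            hits += 1
    return hits / len(labels)


if __name__ == "__main__":
    # "ACGUN" -> k-mers AC, CG, GT are counted; TN is skipped.
    print(kmer_vector("ACGUN", 2))
    # Hypothetical probabilities for 2 samples over 3 classes.
    probs = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
    print(top_n_accuracy(probs, [2, 1], 1), top_n_accuracy(probs, [2, 1], 2))
```

In a real pipeline the probability rows would come from a probabilistic classifier, for example scikit-learn's `SVC(probability=True)` and its `predict_proba` method, trained on vectors built with one or several values of k. Top-1 accuracy is the special case n = 1.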

Authors (1)