
Dive into Machine Learning Algorithms for Influenza Virus Host Prediction with Hemagglutinin Sequences (2207.13842v4)

Published 28 Jul 2022 in cs.LG

Abstract: Influenza viruses mutate rapidly and can pose a threat to public health, especially to those in vulnerable groups. Throughout history, influenza A viruses have caused pandemics that crossed between species. It is important to identify the origin of a virus in order to prevent the spread of an outbreak. Recently, there has been increasing interest in using machine learning algorithms to provide fast and accurate predictions for viral sequences. In this study, real testing data sets and a variety of evaluation metrics were used to evaluate machine learning algorithms at different taxonomic levels. As hemagglutinin is the major protein in the immune response, only hemagglutinin sequences were used, represented by position-specific scoring matrices and word embeddings. The results suggest that the 5-grams-transformer neural network is the most effective algorithm for predicting viral sequence origins, with approximately 99.54% AUCPR, 98.01% F1 score and 96.60% MCC at a higher classification level, and approximately 94.74% AUCPR, 87.41% F1 score and 80.79% MCC at a lower classification level.
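To make two of the terms in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It shows one common way to split a protein sequence into 5-grams before embedding, and how the F1 score and Matthews correlation coefficient (MCC) are computed from predictions. The overlapping stride-1 windowing, the binary-label simplification, the function names `kmers` and `f1_and_mcc`, and the toy sequence are all assumptions made for illustration; the abstract does not specify these details.

```python
from math import sqrt


def kmers(sequence, k=5):
    """Split a sequence into overlapping k-grams (stride 1, an assumption)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def f1_and_mcc(y_true, y_pred):
    """Return (F1, MCC) for binary 0/1 labels (a simplification of host prediction)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # F1 is the harmonic mean of precision and recall.
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0

    # MCC uses all four confusion-matrix cells, so it stays informative
    # when classes are imbalanced.
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return f1, mcc


if __name__ == "__main__":
    toy_sequence = "MKTIIALSY"  # hypothetical toy fragment, not real data
    print(kmers(toy_sequence))
    print(f1_and_mcc([1, 1, 0, 0], [1, 0, 0, 0]))
```

Each 5-gram would then be treated as a "word" and mapped to a vector by a word-embedding model; that step is omitted here because the abstract gives no details about it.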
