Seq-HyGAN: Sequence Classification via Hypergraph Attention Network (2303.02393v3)

Published 4 Mar 2023 in cs.LG and cs.AI

Abstract: Sequence classification has a wide range of real-world applications across domains, such as genome classification in healthcare and anomaly detection in business. However, the lack of explicit features in sequence data makes it difficult for machine learning models to learn effective representations. While Neural Network (NN) models address this by learning features automatically, they capture only adjacent structural connections and ignore global, higher-order relationships between sequences. To address these challenges in sequence classification, we propose a novel Hypergraph Attention Network model, Seq-HyGAN. To capture the complex structural similarity between sequences, we first create a hypergraph in which sequences are represented as hyperedges and the subsequences extracted from them as nodes. We then introduce an attention-based Hypergraph Neural Network with a two-level attention mechanism, which learns a representation for each sequence (hyperedge) while simultaneously identifying the subsequences most important to it. We conduct extensive experiments on four datasets to compare our model with several state-of-the-art methods. The results demonstrate that Seq-HyGAN effectively classifies sequence data and significantly outperforms the baselines. We also conduct case studies to investigate the contribution of each module in Seq-HyGAN.
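The sketch below illustrates the hypergraph construction described in the abstract (sequences as hyperedges, extracted subsequences as nodes) and a simple attention-style pooling of node features into a per-sequence representation. It is a minimal illustration, not the authors' implementation: the use of fixed-length k-mers as subsequences, the scoring vector `a`, and all function names are assumptions introduced here.

```python
# Minimal sketch (not the authors' code): build a Seq-HyGAN-style hypergraph
# from raw sequences, with sequences as hyperedges and extracted subsequences
# (here assumed to be fixed-length k-mers) as nodes, then pool node features
# into a hyperedge representation with a simple softmax attention.
import numpy as np

def extract_kmers(seq, k=3):
    """Slide a window of length k over the sequence to get its subsequences."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_incidence(sequences, k=3):
    """Return (nodes, H) where H[i, j] = 1 iff subsequence i occurs in sequence j."""
    kmer_sets = [set(extract_kmers(s, k)) for s in sequences]
    nodes = sorted(set().union(*kmer_sets))        # unique subsequences = nodes
    index = {kmer: i for i, kmer in enumerate(nodes)}
    H = np.zeros((len(nodes), len(sequences)), dtype=np.float32)
    for j, kmers in enumerate(kmer_sets):          # each sequence is a hyperedge
        for kmer in kmers:
            H[index[kmer], j] = 1.0
    return nodes, H

def hyperedge_representation(H, X, a):
    """Attention-style aggregation of node features X into hyperedge features.

    For each hyperedge (sequence), score its member nodes with vector `a`,
    softmax over the members only, and take the weighted sum of their features.
    This mirrors the node-to-hyperedge attention level in spirit, not detail.
    """
    scores = X @ a                                  # one scalar score per node
    E = np.zeros((H.shape[1], X.shape[1]), dtype=np.float32)
    for j in range(H.shape[1]):
        members = np.where(H[:, j] > 0)[0]
        w = np.exp(scores[members] - scores[members].max())
        w /= w.sum()
        E[j] = w @ X[members]
    return E

sequences = ["ATGCGT", "GCGTAA", "TTATGC"]          # toy DNA-like sequences
nodes, H = build_incidence(sequences, k=3)
X = np.random.default_rng(0).normal(size=(len(nodes), 8)).astype(np.float32)
a = np.ones(8, dtype=np.float32)
E = hyperedge_representation(H, X, a)               # one vector per sequence
print(H.shape, E.shape)
```

In the full model, the attention weights would be learned end-to-end and a second attention level would update node (subsequence) representations from the hyperedges, which is how the crucial subsequences for each sequence are identified.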
