Seq-HyGAN: Sequence Classification via Hypergraph Attention Network (2303.02393v3)
Abstract: Sequence classification has a wide range of real-world applications across domains, such as genome classification in health and anomaly detection in business. However, sequence data lack explicit features, which makes them difficult for machine learning models to handle. While Neural Network (NN) models address this by learning features automatically, they are limited to capturing adjacent structural connections and ignore global, higher-order information between the sequences. To address these challenges in sequence classification, we propose a novel Hypergraph Attention Network model, namely Seq-HyGAN. To capture the complex structural similarity between sequences, we first create a hypergraph in which the sequences are represented as hyperedges and the subsequences extracted from them are represented as nodes. We then introduce an attention-based Hypergraph Neural Network model that uses a two-level attention mechanism. This model generates a representation of each sequence as a hyperedge while simultaneously learning which subsequences are crucial for that sequence. We conduct extensive experiments on four data sets to assess and compare our model with several state-of-the-art methods. Experimental results demonstrate that the proposed Seq-HyGAN model classifies sequence data effectively and significantly outperforms the baselines. We also conduct case studies to investigate the contribution of each module in Seq-HyGAN.
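The hypergraph construction described in the abstract can be illustrated with a minimal sketch. The abstract does not say how subsequences are extracted, so this example assumes overlapping k-mers; the function names `kmers`, `build_hypergraph`, and `incidence_matrix`, the value `k=3`, and the toy DNA strings are all hypothetical and chosen only for illustration. Each sequence becomes one hyperedge, each distinct subsequence becomes one node, and the result is the node-by-hyperedge incidence matrix.

```python
def kmers(seq, k):
    """Return the set of distinct length-k substrings of seq (assumed extraction rule)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_hypergraph(sequences, k=3):
    """Treat each sequence as a hyperedge over the nodes (k-mers) it contains."""
    node_index = {}   # k-mer string -> integer node id
    hyperedges = []   # hyperedges[e] = set of node ids contained in sequence e
    for seq in sequences:
        members = set()
        for sub in sorted(kmers(seq, k)):  # sorted for deterministic node ids
            node_id = node_index.setdefault(sub, len(node_index))
            members.add(node_id)
        hyperedges.append(members)
    return node_index, hyperedges


def incidence_matrix(node_index, hyperedges):
    """Build the |nodes| x |hyperedges| binary incidence matrix as nested lists."""
    matrix = [[0] * len(hyperedges) for _ in range(len(node_index))]
    for edge_id, members in enumerate(hyperedges):
        for node_id in members:
            matrix[node_id][edge_id] = 1
    return matrix


# Toy input: three short DNA-like strings (hypothetical data).
sequences = ["ACGTAC", "CGTACG", "TTTACG"]
node_index, hyperedges = build_hypergraph(sequences, k=3)
H = incidence_matrix(node_index, hyperedges)

for sub, node_id in sorted(node_index.items(), key=lambda item: item[1]):
    print(node_id, sub, H[node_id])
```

In this toy input the first two sequences share all of their 3-mers, so their hyperedges coincide, while the third overlaps them on only two nodes. A hypergraph exposes this kind of shared-subsequence structure between sequences directly, and an attention model can then weight it.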