
Convolutional Motif Kernel Networks (2111.02272v3)

Published 3 Nov 2021 in cs.LG and cs.AI

Abstract: Artificial neural networks show promising performance in detecting correlations within data that are associated with specific outcomes. However, the black-box nature of such models can hinder the advancement of knowledge in research fields by obscuring the decision process and preventing scientists from fully conceptualizing predicted outcomes. Furthermore, domain experts such as healthcare providers need explainable predictions to assess whether a predicted outcome can be trusted in high-stakes scenarios and to help them integrate a model into their own routine. Therefore, interpretable models play a crucial role in incorporating machine learning into high-stakes scenarios such as healthcare. In this paper, we introduce Convolutional Motif Kernel Networks, a neural network architecture that learns a feature representation within a subspace of the reproducing kernel Hilbert space of the position-aware motif kernel function. The resulting model makes it possible to directly interpret and evaluate prediction outcomes by providing a biologically and medically meaningful explanation, without the need for additional post-hoc analysis. We show that our model is able to learn robustly on small datasets and reaches state-of-the-art performance on relevant healthcare prediction tasks. Our proposed method can be applied to DNA and protein sequences. Furthermore, we show that the proposed method learns biologically meaningful concepts directly from data using an end-to-end learning scheme.
