Hamming Encoder: Mining Discriminative k-mers for Discrete Sequence Classification (2310.10321v2)

Published 16 Oct 2023 in cs.LG

Abstract: Sequence classification has numerous applications in various fields. Despite extensive study over the last decades, many challenges remain, particularly for pattern-based methods. Existing pattern-based methods measure the discriminative power of each feature individually during the mining process, so they can miss combinations of features that are discriminative only when taken together. Furthermore, it is difficult to ensure the overall discriminative performance after converting sequences into feature vectors. To address these challenges, we propose a novel approach called Hamming Encoder, which uses a binarized 1D-convolutional neural network (1DCNN) architecture to mine discriminative k-mer sets. In particular, we adopt a Hamming distance-based similarity measure to keep the feature mining and classification procedures consistent. Our method involves training an interpretable CNN encoder for sequential data and performing a gradient-based search for discriminative k-mer combinations. Experiments show that the proposed Hamming Encoder outperforms existing state-of-the-art methods in terms of classification accuracy.
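
To make the Hamming distance-based similarity concrete, here is a minimal sketch of how a set of k-mers could turn a discrete sequence into a feature vector. It is not the paper's implementation: the function names (`hamming_distance`, `kmer_feature`, `encode`) and the choice to score each k-mer by its best-matching window are assumptions made for illustration. The sketch relies only on the fact that the number of matching positions between two length-k strings equals k minus their Hamming distance, which is also what a one-hot filter computes when slid over a one-hot encoded sequence.

```python
from typing import List


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def kmer_feature(sequence: str, kmer: str) -> int:
    """Score a k-mer against a sequence (hypothetical helper).

    The score is the largest number of matching positions over all
    length-k windows, i.e. k minus the minimum Hamming distance.
    Sequences shorter than k score 0.
    """
    k = len(kmer)
    if len(sequence) < k:
        return 0
    return max(
        k - hamming_distance(sequence[i:i + k], kmer)
        for i in range(len(sequence) - k + 1)
    )


def encode(sequence: str, kmers: List[str]) -> List[int]:
    """Map a sequence to one feature per k-mer (assumed encoding)."""
    return [kmer_feature(sequence, m) for m in kmers]


if __name__ == "__main__":
    # "CGT" occurs exactly; "TTT" matches at most one position per window.
    print(encode("ACGTAC", ["CGT", "TTT"]))
```

Because the same similarity scores both the candidate k-mers and the final feature vector, a classifier trained on `encode` output sees the quantity the k-mers were selected under, which is the kind of consistency the abstract refers to. How the paper binarizes filters and searches for k-mer combinations is not shown here.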
