Hierarchical Query Classification in E-commerce Search (2403.06021v1)

Published 9 Mar 2024 in cs.IR and cs.LG

Abstract: E-commerce platforms typically store and structure product information and search data in a hierarchy. Efficiently categorizing user search queries into a similar hierarchical structure is essential to a good user experience on e-commerce platforms, and the same task matters for news curation and academic research. Its importance grows for sensitive query categorization and critical information dissemination, where misclassification can cause considerable harm. Hierarchical query classification is made harder by two primary challenges: (1) pronounced class imbalance skewed toward dominant categories, and (2) the brevity and ambiguity of search queries, which hinder accurate classification. To address these challenges, we introduce a novel framework that leverages hierarchical information through (i) enhanced representation learning that uses a contrastive loss to discern fine-grained instance relationships within the hierarchy, called "instance hierarchy", and (ii) a nuanced hierarchical classification loss that attends to the intrinsic label taxonomy, called "label hierarchy". Additionally, based on our observation that certain unlabeled queries share typographical similarities with labeled queries, we propose a neighborhood-aware sampling technique that selects these unlabeled queries to boost classification performance. Extensive experiments show that our method outperforms the state of the art (SOTA) on the proprietary Amazon dataset and is comparable to SOTA on the public Web of Science and RCV1-V2 datasets. These results underscore the efficacy of the proposed solution and pave the way toward the next generation of hierarchy-aware query classification systems.
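The abstract does not say how "typographical similarity" is measured or how the selected unlabeled queries are used. The sketch below is one plausible reading, not the authors' method: it assumes normalized Levenshtein (edit) distance as the similarity measure, lets each unlabeled query borrow the label of its closest labeled query as a pseudo-label, and keeps only pairs above a similarity threshold. The function names `levenshtein`, `similarity`, and `select_neighbors`, the `min_sim=0.8` threshold, and the example queries and category paths are all hypothetical.

```python
# Sketch of neighborhood-aware selection of unlabeled queries.
# ASSUMPTIONS (not from the paper): typographical similarity = normalized
# Levenshtein distance; a selected query borrows the label of its nearest
# labeled query; all names and the 0.8 threshold are hypothetical.
from typing import Dict, List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via row-by-row DP."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def select_neighbors(
    labeled: Dict[str, str],
    unlabeled: List[str],
    min_sim: float = 0.8,
) -> List[Tuple[str, str, float]]:
    """Return (query, borrowed_label, score) for every unlabeled query whose
    most similar labeled query scores at least `min_sim`."""
    selected = []
    for query in unlabeled:
        best: Optional[str] = None
        best_sim = -1.0
        for lq in labeled:  # brute force: O(|unlabeled| * |labeled|) comparisons
            s = similarity(query.lower(), lq.lower())
            if s > best_sim:
                best, best_sim = lq, s
        if best is not None and best_sim >= min_sim:
            selected.append((query, labeled[best], best_sim))
    return selected


if __name__ == "__main__":
    labeled = {
        "running shoes": "Shoes > Athletic",
        "wireless earbuds": "Electronics > Audio",
    }
    for row in select_neighbors(labeled, ["runing shoes", "wirless earbuds", "garden hose"]):
        print(row)
```

With the two labeled queries above, the misspelled "runing shoes" (similarity 12/13) and "wirless earbuds" (15/16) are selected with their neighbors' labels, while "garden hose" is left out. The brute-force loop compares every unlabeled query with every labeled one, which is quadratic in practice; at the scale of a real query log, an approximate nearest-neighbor index would be needed instead.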

Authors (8)
  1. Bing He (82 papers)
  2. Sreyashi Nag (16 papers)
  3. Limeng Cui (19 papers)
  4. Suhang Wang (118 papers)
  5. Zheng Li (326 papers)
  6. Rahul Goutam (6 papers)
  7. Zhen Li (334 papers)
  8. Haiyang Zhang (56 papers)