
Text-Based Product Matching -- Semi-Supervised Clustering Approach (2402.10091v1)

Published 1 Feb 2024 in cs.DB, cs.AI, and cs.LG

Abstract: Matching identical products present in multiple product feeds constitutes a crucial element of many e-commerce tasks, such as comparing product offerings, dynamic price optimization, and selecting an assortment personalized for the client. It corresponds to the well-known machine learning task of entity matching, with its own specificity, such as omnipresent unstructured data or inaccurate and inconsistent product descriptions. This paper presents a new approach to product matching based on semi-supervised clustering. We study the properties of this method by experimenting with the IDEC algorithm on a real-world dataset using predominantly textual features and fuzzy string matching, with more standard approaches as a point of reference. Encouraging results show that unsupervised matching, enriched with a small annotated sample of product links, could be a possible alternative to the dominant supervised strategy, which requires extensive manual data labeling.
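
The general idea of combining fuzzy string matching with a small annotated sample of product links can be sketched as below. This is only an illustrative toy, not the paper's method: it swaps the IDEC algorithm for a simple union-find grouping, and the function names (`normalize`, `similarity`, `cluster`), the example titles, the single annotated link, and the `threshold=0.8` value are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(title: str) -> str:
    """Lowercase and sort tokens so word order does not affect the score."""
    return " ".join(sorted(title.lower().split()))


def similarity(a: str, b: str) -> float:
    """Fuzzy similarity in [0, 1] between two product titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster(titles, must_link=(), threshold=0.8):
    """Group titles: annotated links first, then fuzzy matches above threshold.

    must_link is a hypothetical list of index pairs known to be the same product.
    Returns one cluster label (a representative index) per title.
    """
    parent = list(range(len(titles)))

    def find(i):
        # Walk to the root, compressing the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # Supervision: merge the annotated product links unconditionally.
    for i, j in must_link:
        union(i, j)
    # Unsupervised part: merge any pair whose titles are similar enough.
    for i, j in combinations(range(len(titles)), 2):
        if similarity(titles[i], titles[j]) >= threshold:
            union(i, j)
    return [find(i) for i in range(len(titles))]


# Made-up product titles for illustration only.
titles = [
    "Apple iPhone 13 128GB Blue",
    "iPhone 13 Apple blue 128GB",
    "Samsung Galaxy S21 5G 128GB",
    "Galaxy S21 (Samsung), 5G",
]
labels = cluster(titles, must_link=[(2, 3)])
```

Here the first two titles are grouped by string similarity alone, while the last two are grouped only because of the annotated link, which is the role the small labeled sample plays in a semi-supervised setting.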

Authors (3)
  1. Alicja Martinek
  2. Szymon Łukasik
  3. Amir H. Gandomi
