Text-Based Product Matching -- Semi-Supervised Clustering Approach
Abstract: Matching identical products across multiple product feeds is a crucial component of many e-commerce tasks, such as comparing product offerings, optimizing prices dynamically, and selecting an assortment personalized for the client. It corresponds to the well-known machine learning task of entity matching, but with its own specific challenges, such as omnipresent unstructured data and inaccurate, inconsistent product descriptions. This paper presents a new approach to product matching based on semi-supervised clustering. We study the properties of this method by experimenting with the IDEC algorithm on a real-world dataset, using predominantly textual features and fuzzy string matching, with more standard approaches as a point of reference. Encouraging results show that unsupervised matching, enriched with a small annotated sample of product links, could be a viable alternative to the dominant supervised strategy, which requires extensive manual data labeling.
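To make the idea of "unsupervised matching enriched with a small annotated sample of product links" concrete, here is a minimal sketch under stated assumptions. It is not the paper's method: instead of IDEC (a deep embedded clustering model), it swaps in a much simpler constrained single-linkage agglomerative clustering, so that the role of the annotated links is easy to see. Known matches enter as must-link pairs, known non-matches as cannot-link pairs, and everything else is grouped by a fuzzy title similarity. The similarity function (the average of a normalized Levenshtein similarity and a token-set Jaccard score), the 0.6 threshold, and the example titles are all hypothetical choices made for illustration.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def title_similarity(a: str, b: str) -> float:
    """Hypothetical fuzzy score in [0, 1]: mean of normalized Levenshtein and token Jaccard."""
    a, b = a.lower().strip(), b.lower().strip()
    if not a and not b:
        return 1.0
    lev = 1.0 - levenshtein(a, b) / max(len(a), len(b))
    ta, tb = set(a.split()), set(b.split())
    jac = len(ta & tb) / len(ta | tb) if ta | tb else 1.0
    return (lev + jac) / 2


def constrained_clustering(titles, must_link, cannot_link, threshold=0.6):
    """Single-linkage agglomerative clustering under pairwise constraints.

    must_link / cannot_link are lists of (i, j) index pairs, e.g. taken from a
    small annotated sample of product links. Returns one cluster label per title.
    """
    parent = list(range(len(titles)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def violates(ri, rj):
        # Merging clusters ri and rj is forbidden if it would join a cannot-link pair.
        return any({find(p), find(q)} == {ri, rj} for p, q in cannot_link)

    # 1. Annotated matches are merged unconditionally.
    for i, j in must_link:
        parent[find(i)] = find(j)

    # 2. Merge remaining pairs from most to least similar, respecting cannot-links.
    pairs = sorted(((title_similarity(titles[i], titles[j]), i, j)
                    for i, j in combinations(range(len(titles)), 2)),
                   reverse=True)
    for sim, i, j in pairs:
        if sim < threshold:
            break
        ri, rj = find(i), find(j)
        if ri != rj and not violates(ri, rj):
            parent[ri] = rj

    # Relabel cluster roots as consecutive ids in order of first appearance.
    roots, labels = {}, []
    for i in range(len(titles)):
        labels.append(roots.setdefault(find(i), len(roots)))
    return labels


# Hypothetical product titles from two feeds.
titles = [
    "Apple iPhone 13 128GB Blue",
    "iPhone 13 128GB blue - Apple",
    "Samsung Galaxy S21 128GB Black",
    "Samsung Galaxy S21 Black 128 GB",
    "Apple iPhone 13 256GB Blue",
]
labels = constrained_clustering(titles, must_link=[(2, 3)], cannot_link=[(0, 4)])
print(labels)  # → [0, 0, 1, 1, 2]
```

The two annotated links do the work that pure string similarity cannot: the must-link pair joins two Samsung titles whose word order and tokenization differ, and the cannot-link pair keeps the 128GB and 256GB iPhone variants apart even though their titles differ by only three characters. In a deep approach such as IDEC, the same kind of pairwise supervision would instead shape the learned embedding and cluster assignments rather than a threshold on a hand-written similarity.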