Text-Based Product Matching -- Semi-Supervised Clustering Approach (2402.10091v1)

Published 1 Feb 2024 in cs.DB, cs.AI, and cs.LG

Abstract: Matching identical products across multiple product feeds is a crucial element of many e-commerce tasks, such as comparing product offerings, dynamic price optimization, and selecting an assortment personalized for the client. It corresponds to the well-known machine learning task of entity matching, with its own specifics, such as omnipresent unstructured data and inaccurate or inconsistent product descriptions. This paper presents a new approach to product matching based on semi-supervised clustering. We study the properties of this method by experimenting with the IDEC algorithm on a real-world dataset, using predominantly textual features and fuzzy string matching, with more standard approaches as a point of reference. Encouraging results show that unsupervised matching, enriched with a small annotated sample of product links, could be a possible alternative to the dominant supervised strategy, which requires extensive manual data labeling.
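As a rough illustration of the fuzzy string matching mentioned in the abstract, the sketch below scores pairs of product titles with Python's standard-library `difflib.SequenceMatcher`. This is not the authors' implementation: the abstract does not say which similarity measure or library they used, and the helper names (`normalize`, `title_similarity`, `similarity_matrix`) and the sample titles are assumptions made up for this example.

```python
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(title.lower().split())


def title_similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] between two product titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def similarity_matrix(titles: list[str]) -> list[list[float]]:
    """Pairwise title similarities that a clustering step could use as features."""
    return [[title_similarity(a, b) for b in titles] for a in titles]


# Hypothetical product titles from two feeds (invented for illustration).
titles = [
    "Acme Espresso Machine X200 Black",
    "ACME  espresso machine x200 (black)",
    "Garden Hose 25 m Green",
]
matrix = similarity_matrix(titles)
```

In a pipeline like the one the abstract describes, scores of this kind would be only one input alongside other textual features; the clustering itself (IDEC in the paper) and the use of annotated product links are not shown here.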

Authors (3)

Alicja Martinek
Szymon Łukasik
Amir H. Gandomi
