A Framework for Leveraging Partially-Labeled Data for Product Attribute-Value Identification
Abstract: In the e-commerce domain, the accurate extraction of attribute-value pairs (e.g., Brand: Apple) from product titles and user search queries is crucial for enhancing search and recommendation systems. A major challenge with neural models for this task is the lack of high-quality training data, as the annotations for attribute-value pairs in the available datasets are often incomplete. To address this, we introduce GenToC, a model designed for training directly with partially-labeled data, eliminating the necessity for a fully annotated dataset. GenToC employs a marker-augmented generative model to identify potential attributes, followed by a token classification model that determines the associated values for each attribute. GenToC outperforms existing state-of-the-art models, exhibiting up to a 56.3% increase in the number of accurate extractions. Furthermore, we utilize GenToC to regenerate the training dataset to expand attribute-value annotations. This bootstrapping substantially improves the data quality for training other standard NER models, which are typically faster but less capable of handling partially-labeled data, enabling them to achieve performance comparable to GenToC. Our results demonstrate GenToC's unique ability to learn from a limited set of partially-labeled data and improve the training of more efficient models, advancing the automated extraction of attribute-value pairs. Finally, our model has been successfully integrated into IndiaMART, India's largest B2B e-commerce platform, achieving a significant increase of 20.2% in the number of correctly identified attribute-value pairs over the existing deployed system while achieving a high precision of 89.5%.
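The two-stage flow described in the abstract (first propose attributes, then tag the value tokens for each attribute) and the bootstrapping step can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy lexicon, and the rule that existing annotations take precedence when merging are all assumptions made for illustration, and the marker augmentation is omitted because the abstract does not specify it. Both learned models are replaced by simple lookup stand-ins.

```python
from typing import Dict, List

# Toy lexicon standing in for what the trained models would have learned (hypothetical data).
TOY_LEXICON: Dict[str, List[str]] = {
    "Brand": ["apple", "samsung"],
    "Color": ["black", "red"],
}


def generate_attributes(title: str) -> List[str]:
    """Stage 1 stand-in: return attribute names that appear to be present in the title.

    In the paper this role is played by a generative model; here it is a lookup.
    """
    tokens = title.lower().split()
    return [attr for attr, values in TOY_LEXICON.items()
            if any(tok in values for tok in tokens)]


def classify_tokens(title: str, attribute: str) -> List[int]:
    """Stage 2 stand-in: 1 for tokens belonging to the attribute's value, else 0.

    In the paper this role is played by a token classification model.
    """
    values = TOY_LEXICON.get(attribute, [])
    return [1 if tok.lower() in values else 0 for tok in title.split()]


def extract_pairs(title: str) -> Dict[str, str]:
    """Run both stages and collect attribute-value pairs."""
    tokens = title.split()
    pairs: Dict[str, str] = {}
    for attr in generate_attributes(title):
        mask = classify_tokens(title, attr)
        value = " ".join(tok for tok, keep in zip(tokens, mask) if keep)
        if value:
            pairs[attr] = value
    return pairs


def bootstrap_labels(title: str, partial: Dict[str, str]) -> Dict[str, str]:
    """Expand a partial annotation with predicted pairs.

    Assumption: existing annotations override predictions on conflict.
    """
    merged = dict(extract_pairs(title))
    merged.update(partial)
    return merged


if __name__ == "__main__":
    title = "Apple iPhone 13 Black"
    print(extract_pairs(title))
    print(bootstrap_labels(title, {"Brand": "Apple"}))
```

The expanded labels returned by `bootstrap_labels` correspond to the regenerated training data that, per the abstract, is then used to train faster standard NER models.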