
Product Information Extraction using ChatGPT (2306.14921v1)

Published 23 Jun 2023 in cs.CL and cs.IR

Abstract: Structured product data in the form of attribute/value pairs is the foundation of many e-commerce applications such as faceted product search, product comparison, and product recommendation. Product offers often contain only textual descriptions of the product attributes, in the form of titles or free text. Hence, extracting attribute/value pairs from textual product descriptions is an essential enabler for e-commerce applications. To excel, state-of-the-art product information extraction methods require large quantities of task-specific training data. These methods also struggle to generalize to out-of-distribution attributes and attribute values that were not part of the training data. Because they are pre-trained on huge amounts of text and exhibit emergent abilities that come with model scale, LLMs like ChatGPT have the potential to address both of these shortcomings. This paper explores the potential of ChatGPT for extracting attribute/value pairs from product descriptions. We experiment with different zero-shot and few-shot prompt designs. Our results show that ChatGPT achieves performance similar to a pre-trained LLM while requiring much less training data and computation for fine-tuning.
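The abstract does not reproduce the paper's prompts. The Python sketch below illustrates the general idea of zero-shot versus few-shot prompting for attribute/value extraction, assuming a chat-style API (here the OpenAI Python SDK, v1). The target attribute set, the demonstration titles, and the JSON output convention are hypothetical choices for illustration, not details taken from the paper.

```python
# Illustrative sketch (not the paper's exact prompts): zero-shot and few-shot
# attribute/value extraction from a product title via a chat-completion API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical attribute set; the paper targets attributes from its benchmark data.
TARGET_ATTRIBUTES = ["Brand", "Color", "Capacity"]


def build_messages(title: str, examples: list[tuple[str, str]] | None = None) -> list[dict]:
    """Build a chat prompt; pass `examples` for few-shot, omit them for zero-shot."""
    messages = [{
        "role": "system",
        "content": (
            "Extract attribute/value pairs from the product title. "
            f"Target attributes: {', '.join(TARGET_ATTRIBUTES)}. "
            "Answer as JSON; use null for attributes that are not mentioned."
        ),
    }]
    # Few-shot: each demonstration becomes a user/assistant turn pair.
    for demo_title, demo_answer in examples or []:
        messages.append({"role": "user", "content": demo_title})
        messages.append({"role": "assistant", "content": demo_answer})
    messages.append({"role": "user", "content": title})
    return messages


# One hypothetical demonstration for the few-shot setting.
demos = [(
    "SanDisk Ultra 64GB microSDXC Memory Card",
    '{"Brand": "SanDisk", "Color": null, "Capacity": "64GB"}',
)]

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    temperature=0,  # deterministic output simplifies evaluation
    messages=build_messages("Kingston DataTraveler 32GB USB Flash Drive Blue", demos),
)
print(response.choices[0].message.content)
# Expected shape: {"Brand": "Kingston", "Color": "Blue", "Capacity": "32GB"}
```

Dropping the `demos` argument yields the zero-shot variant; the comparison of such prompt designs is the core of the paper's experiments.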

Authors (4)
  1. Alexander Brinkmann
  2. Roee Shraga
  3. Reng Chiz Der
  4. Christian Bizer
Citations (6)