
PAE: LLM-based Product Attribute Extraction for E-Commerce Fashion Trends (2405.17533v1)

Published 27 May 2024 in cs.AI, cs.CV, and cs.LG

Abstract: Product attribute extraction is a growing field in e-commerce, with applications including product ranking, product recommendation, future assortment planning, and improving the online shopping experience. Understanding customer needs is a critical part of online business, particularly for fashion products. Retailers use assortment planning to determine the mix of products to offer in each store and channel, to stay responsive to market dynamics, and to manage inventory and catalogs. The goal is to offer the right styles, in the right sizes and colors, through the right channels. When shoppers find products that meet their needs and desires, they are more likely to return for future purchases, fostering customer loyalty. Product attributes are a key factor in assortment planning. In this paper we present PAE, a product attribute extraction algorithm for future trend reports consisting of text and images in PDF format. Most existing methods focus on attribute extraction from titles or product descriptions, or use visual information from existing product images. In contrast to prior work, our work focuses on attribute extraction from PDF files in which upcoming fashion trends are explained. This work proposes a more comprehensive framework that fully uses the different modalities for attribute extraction and helps retailers plan their assortment in advance. Our contributions are three-fold: (a) we develop PAE, an efficient framework to extract attributes from unstructured data (text and images); (b) we provide a catalog matching methodology based on BERT representations to discover existing attributes using upcoming attribute values; (c) we conduct extensive experiments with several baselines and show that PAE is an effective and flexible framework that is on par with or superior to the existing state of the art on the attribute value extraction task (average F1-score of 92.5%).
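
The catalog matching idea in contribution (b) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract only states that matching is based on BERT representations, so the example below assumes a nearest-neighbor lookup by cosine similarity and, to stay self-contained, swaps the BERT embedding for a toy character-trigram vector. The function names (`embed`, `cosine`, `match_to_catalog`), the threshold value, and the sample attribute values are all hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a BERT representation (assumption): trigram counts."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_to_catalog(new_values, catalog_values, threshold=0.5):
    """Map each upcoming attribute value to its closest catalog value.

    Returns None for a value whose best similarity falls below the
    (hypothetical) threshold, i.e. a value with no counterpart in the catalog.
    Assumes catalog_values is non-empty.
    """
    catalog_vecs = {c: embed(c) for c in catalog_values}
    matches = {}
    for value in new_values:
        vec = embed(value)
        best, score = max(
            ((c, cosine(vec, cv)) for c, cv in catalog_vecs.items()),
            key=lambda pair: pair[1],
        )
        matches[value] = best if score >= threshold else None
    return matches


catalog = ["puff sleeve", "denim", "floral print"]
result = match_to_catalog(["Puff Sleeves", "neon"], catalog)
print(result)
```

In a real pipeline, `embed` would be replaced by a sentence-level vector from a BERT model, and the threshold would need to be tuned on labeled matches; the lookup logic itself would stay the same.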

Authors (2)
  1. Apurva Sinha
  2. Ekta Gujral
