EIVEN: Efficient Implicit Attribute Value Extraction using Multimodal LLM (2404.08886v1)

Published 13 Apr 2024 in cs.CV, cs.AI, cs.CL, cs.IR, and cs.LG

Abstract: In e-commerce, accurately extracting product attribute values from multimodal data is crucial for improving user experience and operational efficiency of retailers. However, previous approaches to multimodal attribute value extraction often struggle with implicit attribute values embedded in images or text, rely heavily on extensive labeled data, and can easily confuse similar attribute values. To address these issues, we introduce EIVEN, a data- and parameter-efficient generative framework that pioneers the use of multimodal LLM for implicit attribute value extraction. EIVEN leverages the rich inherent knowledge of a pre-trained LLM and vision encoder to reduce reliance on labeled data. We also introduce a novel Learning-by-Comparison technique to reduce model confusion by enforcing attribute value comparison and difference identification. Additionally, we construct initial open-source datasets for multimodal implicit attribute value extraction. Our extensive experiments reveal that EIVEN significantly outperforms existing methods in extracting implicit attribute values while requiring less labeled data.
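
The abstract describes the Learning-by-Comparison idea only at a high level: training examples force the model to compare attribute values across products and identify differences, which reduces confusion between similar values. The sketch below is one plausible, illustrative way such comparison-style instruction data could be built; the Product fields, prompt wording, and pairing strategy are assumptions for illustration, not the authors' released implementation.

```python
# Minimal sketch of a "Learning-by-Comparison" style training-instance builder.
# NOTE: illustrative assumption only; the exact templates, field names, and
# reference-pairing strategy used by EIVEN are not specified in the abstract.

import random
from dataclasses import dataclass
from typing import List

@dataclass
class Product:
    title: str        # product text (e.g., title/description)
    image_path: str   # image fed to the vision encoder (hypothetical path)
    attribute: str    # attribute of interest, e.g., "Neckline"
    value: str        # gold implicit attribute value, e.g., "V-neck"

def build_comparison_instance(target: Product, pool: List[Product], rng: random.Random) -> dict:
    """Pair the target product with a reference product sharing the same attribute,
    then ask the model whether the two values match and what the target's value is."""
    candidates = [p for p in pool if p.attribute == target.attribute and p is not target]
    reference = rng.choice(candidates)

    instruction = (
        f"Compare the two products with respect to the attribute '{target.attribute}'. "
        f"Reference product: {reference.title} (value: {reference.value}). "
        f"Target product: {target.title}. "
        "Do the two products share the same attribute value? "
        "Answer yes or no, then state the target product's value."
    )
    same = "yes" if target.value.lower() == reference.value.lower() else "no"
    response = f"{same}; the target product's {target.attribute} is {target.value}."

    # Images would be encoded by the vision encoder and prepended to the LLM input
    # as visual tokens; here we simply carry the paths alongside the text.
    return {
        "images": [reference.image_path, target.image_path],
        "instruction": instruction,
        "response": response,
    }

if __name__ == "__main__":
    rng = random.Random(0)
    pool = [
        Product("Ribbed knit sweater", "img/001.jpg", "Neckline", "Crew neck"),
        Product("Wrap-front blouse", "img/002.jpg", "Neckline", "V-neck"),
        Product("Off-shoulder summer top", "img/003.jpg", "Neckline", "Off-shoulder"),
    ]
    print(build_comparison_instance(pool[1], pool, rng))
```

Such pairs would supplement ordinary extraction instructions, giving the model explicit supervision to distinguish easily confused values rather than memorize them independently.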
