ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction (2404.15592v2)
Abstract: Existing datasets for attribute value extraction (AVE) predominantly focus on explicit attribute values while neglecting implicit ones, lack product images, are often not publicly available, and lack in-depth human inspection across diverse domains. To address these limitations, we present ImplicitAVE, the first publicly available multimodal dataset for implicit attribute value extraction. ImplicitAVE, sourced from the MAVE dataset, is carefully curated and expanded to include implicit AVE and multimodality, resulting in a refined dataset of 68k training and 1.6k test instances across five domains. We also explore the application of multimodal LLMs (MLLMs) to implicit AVE, establishing a comprehensive benchmark for MLLMs on the ImplicitAVE dataset. Six recent MLLMs with eleven variants are evaluated across diverse settings, revealing that implicit value extraction remains a challenging task for MLLMs. The contributions of this work include the development and release of ImplicitAVE, and the exploration and benchmarking of various MLLMs for implicit AVE, providing valuable insights and potential future research directions. Dataset and code are available at https://github.com/HenryPengZou/ImplicitAVE
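To make the explicit/implicit distinction concrete, the following is a minimal Python sketch, not the authors' code. It assumes that a value counts as implicit when it does not appear verbatim in the product text, and that predictions are scored by case-insensitive exact match. The `Example` record, its field names, and the two sample products are hypothetical and do not reflect the schema or contents of the released dataset.

```python
from dataclasses import dataclass


@dataclass
class Example:
    # Hypothetical record layout; NOT the schema of the released ImplicitAVE files.
    text: str       # product title or description
    attribute: str  # attribute being queried, e.g. "Sleeve Style"
    value: str      # gold attribute value


def is_implicit(ex: Example) -> bool:
    """Assumption: a value is implicit if it never appears verbatim in the text."""
    return ex.value.lower() not in ex.text.lower()


def accuracy(predictions: list[str], examples: list[Example]) -> float:
    """Fraction of predictions equal to the gold value (case-insensitive exact match)."""
    if not examples:
        return 0.0
    hits = sum(
        pred.strip().lower() == ex.value.lower()
        for pred, ex in zip(predictions, examples)
    )
    return hits / len(examples)


# Made-up products for illustration only.
examples = [
    Example("Classic cotton tee with short sleeves", "Sleeve Style", "Short Sleeve"),
    Example("Cozy pullover for chilly evenings", "Sleeve Style", "Long Sleeve"),
]

implicit_only = [ex for ex in examples if is_implicit(ex)]
score = accuracy(["long sleeve", "Long Sleeve"], examples)
```

In this toy setup only the second product is implicit: its sleeve style is never stated in the text and would have to be inferred from the image or from prior knowledge, which is the case the dataset targets.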
Authors:
- Henry Peng Zou
- Vinay Samuel
- Yue Zhou
- Weizhi Zhang
- Liancheng Fang
- Zihe Song
- Philip S. Yu
- Cornelia Caragea