A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation (2402.13587v2)
Abstract: In this paper, we propose a new setting for generating product descriptions from images, augmented by marketing keywords. It leverages the combined power of visual and textual information to create descriptions that are more tailored to the unique features of products. For this setting, previous methods utilize visual and textual encoders to encode the image and keywords and employ an LLM-based decoder to generate the product description. However, the generated description is often inaccurate and generic, since products in the same category tend to share similar copywriting, and optimizing the overall framework on large-scale data drives models to concentrate on common words while ignoring product-specific features. To alleviate this issue, we present a simple and effective Multimodal In-Context Tuning approach, named ModICT, which introduces a similar product sample as the reference and utilizes the in-context learning capability of LLMs to produce the description. During training, we keep the visual encoder and LLM frozen, optimizing only the modules responsible for creating multimodal in-context references and dynamic prompts. This approach preserves the language generation capability of LLMs and substantially increases description diversity. To assess the effectiveness of ModICT across various LLM scales and types, we collect data from three distinct product categories within the E-commerce domain. Extensive experiments demonstrate that ModICT significantly improves the accuracy (by up to 3.3% on Rouge-L) and diversity (by up to 9.4% on D-5) of generated results compared to conventional methods. Our findings underscore the potential of ModICT as a valuable tool for enhancing the automatic generation of product descriptions in a wide range of applications. Code is at: https://github.com/HITsz-TMG/Multimodal-In-Context-Tuning
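To make the described setup concrete, below is a minimal PyTorch sketch of the multimodal in-context tuning idea from the abstract: a frozen vision encoder and a frozen LLM, with only a projector and a set of dynamic prompt vectors left trainable, and an input sequence built from a retrieved similar product (image, keywords, description) followed by the target product's image and keywords. The class name, dimensions, and the placeholder encoder/LLM modules are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# Minimal sketch of ModICT-style multimodal in-context tuning.
# The vision encoder and LLM below are small random stand-ins (placeholders),
# NOT the pretrained models used in the paper.
import torch
import torch.nn as nn


class ModICTSketch(nn.Module):
    def __init__(self, vision_dim=512, llm_dim=768, n_prompt_tokens=8, vocab_size=32000):
        super().__init__()
        # Frozen stand-ins for the pretrained vision encoder and LLM.
        self.vision_encoder = nn.Linear(vision_dim, vision_dim)   # placeholder visual encoder
        self.word_embed = nn.Embedding(vocab_size, llm_dim)       # placeholder LLM embeddings
        self.llm = nn.TransformerEncoder(                         # placeholder frozen LLM body
            nn.TransformerEncoderLayer(llm_dim, nhead=8, batch_first=True), num_layers=2
        )
        for module in (self.vision_encoder, self.word_embed, self.llm):
            for p in module.parameters():
                p.requires_grad = False

        # Trainable parts: a projector mapping image features into the LLM embedding
        # space, plus a small bank of dynamic prompt vectors.
        self.projector = nn.Linear(vision_dim, llm_dim)
        self.dynamic_prompt = nn.Parameter(torch.randn(n_prompt_tokens, llm_dim))

    def build_in_context_sequence(self, ref_image, ref_keyword_ids, ref_desc_ids,
                                  tgt_image, tgt_keyword_ids):
        """Concatenate [reference image | reference keywords | reference description |
        target image | target keywords | dynamic prompt] into one embedding sequence."""
        ref_img_emb = self.projector(self.vision_encoder(ref_image))
        tgt_img_emb = self.projector(self.vision_encoder(tgt_image))
        batch = ref_image.size(0)
        parts = [
            ref_img_emb,
            self.word_embed(ref_keyword_ids),
            self.word_embed(ref_desc_ids),
            tgt_img_emb,
            self.word_embed(tgt_keyword_ids),
            self.dynamic_prompt.unsqueeze(0).expand(batch, -1, -1),
        ]
        return torch.cat(parts, dim=1)

    def forward(self, *inputs):
        seq = self.build_in_context_sequence(*inputs)
        # A real model would decode the target description from this context;
        # here we just return the contextualized states.
        return self.llm(seq)


if __name__ == "__main__":
    model = ModICTSketch()
    B = 2
    ref_img, tgt_img = torch.randn(B, 1, 512), torch.randn(B, 1, 512)
    ref_kw = torch.randint(0, 32000, (B, 6))
    ref_desc = torch.randint(0, 32000, (B, 30))
    tgt_kw = torch.randint(0, 32000, (B, 6))
    out = model(ref_img, ref_kw, ref_desc, tgt_img, tgt_kw)
    trainable = [n for n, p in model.named_parameters() if p.requires_grad]
    print(out.shape, trainable)  # only the projector and dynamic prompt receive gradients
```

The point of the sketch is the parameter split: because the vision encoder and LLM stay frozen, only the lightweight projector and dynamic prompt are updated, which is how the approach preserves the LLM's generation ability while grounding it in the retrieved in-context reference.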
Authors: Yunxin Li, Baotian Hu, Wenhan Luo, Lin Ma, Yuxin Ding, Min Zhang