eCeLLM: Generalizing Large Language Models for E-commerce from Large-scale, High-quality Instruction Data (2402.08831v2)

Published 13 Feb 2024 in cs.CL, cs.AI, and cs.IR

Abstract: With tremendous efforts on developing effective e-commerce models, conventional e-commerce models show limited success in generalist e-commerce modeling, and suffer from unsatisfactory performance on new users and new products - a typical out-of-domain generalization challenge. Meanwhile, LLMs demonstrate outstanding performance in generalist modeling and out-of-domain generalizability in many fields. Toward fully unleashing their power for e-commerce, in this paper, we construct ECInstruct, the first open-sourced, large-scale, and high-quality benchmark instruction dataset for e-commerce. Leveraging ECInstruct, we develop eCeLLM, a series of e-commerce LLMs, by instruction-tuning general-purpose LLMs. Our comprehensive experiments and evaluation demonstrate that eCeLLM models substantially outperform baseline models, including the most advanced GPT-4, and the state-of-the-art task-specific models in in-domain evaluation. Moreover, eCeLLM exhibits excellent generalizability to out-of-domain settings, including unseen products and unseen instructions, highlighting its superiority as a generalist e-commerce model. Both the ECInstruct dataset and the eCeLLM models show great potential in empowering versatile and effective LLMs for e-commerce. ECInstruct and eCeLLM models are publicly accessible through https://ninglab.github.io/eCeLLM.

An Exploration into E-commerce Task Instruction Tuning

The paper discusses the development and effectiveness of instruction-tuned models for addressing task-specific challenges in e-commerce environments. The authors introduce ECInstruct, a comprehensive instruction dataset whose tasks fall into four categories: Product Understanding, User Understanding, Query Product Matching, and Product Question Answering. Each category comprises well-defined subtasks designed to enhance the performance of LLMs in specific e-commerce scenarios.

Methodology

The methodology characterizes tasks with structured data and instruction-tunes LLMs, assessing them under both in-domain (IND) and out-of-domain (OOD) evaluations. Tasks such as attribute value extraction, sentiment analysis, and product matching form the benchmarks, with metrics including precision, recall, F1, and NDCG used to assess model performance.
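As a concrete reference for the ranking metric mentioned above, NDCG@k can be computed as in the generic sketch below; this is the standard definition, not the paper's evaluation code.

```python
import numpy as np

def dcg(relevances: np.ndarray, k: int) -> float:
    """Discounted cumulative gain over the top-k positions."""
    rel = relevances[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))  # positions 1..k -> log2(2)..log2(k+1)
    return float(np.sum(rel / discounts))

def ndcg(relevances, k=10):
    """NDCG@k: DCG of the predicted order divided by the ideal (sorted) DCG."""
    rel = np.asarray(relevances, dtype=float)
    ideal = dcg(np.sort(rel)[::-1], k)
    return dcg(rel, k) / ideal if ideal > 0 else 0.0

# Relevance grades of retrieved products, in the order the model ranked them.
print(round(ndcg([3, 2, 3, 0, 1, 2], k=5), 3))  # 0.861
```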

Data Processing

The dataset includes extensive preprocessing for both IND and OOD evaluations, drawing on the Amazon Review dataset, Amazon-Google Product data, and the Shopping Queries dataset. The raw datasets were split with an 8:1:1 ratio into training, validation, and test sets, respectively. A critical aspect of the data processing was downsampling for efficiency, allowing the models to be trained and evaluated within the constraints of available computational resources.
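The 8:1:1 split with downsampling can be reproduced with a simple two-stage procedure; the snippet below is a generic sketch (the downsampling cap and input format are placeholders, not the paper's actual pipeline).

```python
import random

def split_and_downsample(samples, seed=42, max_train=10_000):
    """Shuffle, split 8:1:1 into train/val/test, and cap the training set."""
    rng = random.Random(seed)
    data = list(samples)
    rng.shuffle(data)
    n = len(data)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]
    # Downsample training data for efficiency, as described above.
    if len(train) > max_train:
        train = rng.sample(train, max_train)
    return train, val, test

train, val, test = split_and_downsample(range(100_000))
print(len(train), len(val), len(test))  # 10000 10000 10000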

Instruction Design

A notable element is the use of multiple instruction templates per task, some of which are generated and deliberately held out as unseen during training. This comprehensive design ensures that the LLMs are evaluated not only on instructions encountered during tuning but also on novel phrasings, testing their generalization capabilities.
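A minimal sketch of this design follows: each task carries several instruction templates, with some withheld from training so the model is later tested on phrasings it has never seen. The templates and fields below are hypothetical, not taken from ECInstruct.

```python
import random

# Hypothetical instruction templates for a sentiment-analysis task; ECInstruct
# pairs each task with multiple instructions and holds some out as "unseen".
SENTIMENT_INSTRUCTIONS = [
    "Classify the sentiment of the following product review.",
    "Given a customer review, decide whether it is positive, negative, or neutral.",
    "What sentiment does this review express about the product?",
]
seen, unseen = SENTIMENT_INSTRUCTIONS[:-1], SENTIMENT_INSTRUCTIONS[-1:]

def build_sample(instruction: str, review: str, label: str) -> dict:
    """Format one example in the common (instruction, input, output) layout."""
    return {"instruction": instruction, "input": review, "output": label}

train_sample = build_sample(random.choice(seen), "Great battery life, sturdy build.", "positive")
eval_sample = build_sample(unseen[0], "Great battery life, sturdy build.", "positive")
```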

Results and Analysis

The models exhibited superior performance when tuned on individual and combined task datasets. Notably, Llama-2 13B-chat proved a robust base model for instruction tuning, while Mistral-7B-Instruct-v0.2 and Phi-2 were effective for particular tasks, underscoring the importance of selecting an appropriate base model for domain-specific challenges.
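One common recipe for instruction-tuning such base models is parameter-efficient fine-tuning with LoRA via Hugging Face transformers and peft; the sketch below illustrates that general recipe and is not the paper's actual training code or hyperparameters.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "mistralai/Mistral-7B-Instruct-v0.2"  # one of the base models discussed
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

lora = LoraConfig(
    r=16,                                  # rank of the low-rank update (illustrative)
    lora_alpha=32,                         # scaling factor (illustrative)
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only a small fraction of weights are trained
```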

In-domain Evaluation:

  • Attribute Value Extraction: models achieved an F1 score of up to 0.595.
  • Product Relation Prediction: the macro F1 score reached approximately 0.502.
  • Sentiment Analysis: macro F1 improved as more comprehensive tuning datasets were incorporated (a minimal macro-F1 sketch follows this list).
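As referenced above, macro F1 averages the per-class F1 scores so that minority classes count equally; a minimal sketch with scikit-learn, using hypothetical labels:

```python
from sklearn.metrics import f1_score

# Hypothetical gold and predicted labels for a sentiment-analysis task.
y_true = ["positive", "negative", "neutral", "positive", "negative"]
y_pred = ["positive", "negative", "positive", "positive", "neutral"]

# Macro F1 is the unweighted mean of per-class F1, so rare classes count equally.
print(round(f1_score(y_true, y_pred, average="macro"), 3))  # 0.489
```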

Out-of-domain Evaluation:

Both task-specific fine-tuning and general instruction tuning displayed commendable results, with task-specific fine-tuning performing slightly better on average. The instruction-tuned LLMs exceeded the capabilities of some state-of-the-art task-specific models, notably in generalization to unseen domains.
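One way to construct the unseen-product setting is to split by product rather than by example, so that no test product ever appears in training; the sketch below illustrates that idea under assumed field names (`product_id`, `text` are hypothetical).

```python
import random

def ood_split_by_product(samples, test_frac=0.1, seed=0):
    """Hold out whole products so test items never appear in training."""
    rng = random.Random(seed)
    product_ids = sorted({s["product_id"] for s in samples})
    rng.shuffle(product_ids)
    n_test = max(1, int(test_frac * len(product_ids)))
    test_ids = set(product_ids[:n_test])
    train = [s for s in samples if s["product_id"] not in test_ids]
    test = [s for s in samples if s["product_id"] in test_ids]
    return train, test

samples = [{"product_id": f"P{i % 20}", "text": f"review {i}"} for i in range(100)]
train, test = ood_split_by_product(samples)
print(len(train), len(test))  # 90 10: two of the twenty products are held out
```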

Implications and Future Directions

This work has significant implications for the application of LLMs in practical e-commerce settings. The refined models can enhance user interaction by accurately understanding products and user sentiment and by optimizing query-product matching. The structured approach offers a blueprint for task specialization within AI models.

Further exploration could involve enhancing dataset diversity and refining tuning processes to mitigate model biases. Improving fine-grained control in tasks such as query substitution could also be pursued. Moreover, future work could investigate the interplay between different task categories to further synergize LLM capabilities, enriching the models' understanding of multi-faceted e-commerce scenarios.

In conclusion, this work exemplifies the promise of instruction tuning in specialized domains, showing marked improvements over general-purpose models across a variety of tasks. As more datasets and refined LLM architectures emerge, increasingly sophisticated models can be anticipated that could redefine interaction in e-commerce and beyond.

Authors (5)
  1. Bo Peng (304 papers)
  2. Xinyi Ling (2 papers)
  3. Ziru Chen (20 papers)
  4. Huan Sun (88 papers)
  5. Xia Ning (48 papers)