
EshopInstruct: E-Commerce Instruction Tuning Dataset

Updated 17 September 2025
  • EshopInstruct is a large-scale multi-task instruction corpus of 65K samples that standardizes diverse e-commerce shopping tasks.
  • It combines LLM-driven data generation, public dataset extraction, and custom synthesis to enhance domain-specific instruction tuning.
  • Combined with LoRA fine-tuning, quantization, and advanced prompt engineering, it boosts model performance on multi-lingual and reasoning tasks.

The EshopInstruct Dataset is a large-scale multi-task instruction corpus designed to facilitate the instruction tuning of LLMs for e-commerce shopping assistance applications. It provides 65,000 task-diverse samples across shopping concept understanding, knowledge reasoning, user behavior alignment, and multi-lingual capabilities, underpinning the training of models such as LLaSA—a domain-adapted assistant evaluated in the Amazon KDD Cup 2024 Challenge (Zhang et al., 4 Aug 2024). The dataset synthesizes examples from seed data, public resources, and LLM-driven creation to address high-priority shopping tasks with unified text-to-text formats and real-world relevance.

1. Dataset Structure and Composition

EshopInstruct encompasses approximately 65,000 instruction–response pairs, constructed via three principal methods: (i) data generation using LLMs generalizing over 18 canonical e-commerce task types, (ii) extraction and transformation of related public datasets (e.g., ECInstruct) to a unified format, and (iii) custom synthesis of instructions for under-represented areas such as concept normalization and daily product recommendations—with chain-of-thought reasoning used to bolster sample coherence and complexity.
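A minimal sketch of the LLM-driven generation path, assuming a generic `generate_fn` callable that wraps whichever LLM API is available; the prompt template, task-type names, and JSON output schema below are illustrative assumptions, not the authors' exact setup.

```python
import json

# Hypothetical subset of the 18 canonical e-commerce task types (illustrative only).
TASK_TYPES = ["concept_normalization", "multi_hop_reasoning", "daily_product_recommendation"]

PROMPT_TEMPLATE = """You are building an e-commerce shopping-assistant training corpus.
Task type: {task_type}
Seed example: {seed}

Write one new instruction-response pair for this task type.
Think step by step first, then answer with JSON:
{{"instruction": ..., "input": ..., "reasoning": ..., "output": ...}}"""

def synthesize_sample(task_type, seed, generate_fn):
    """Ask an LLM to produce one instruction-response pair with chain-of-thought."""
    prompt = PROMPT_TEMPLATE.format(task_type=task_type, seed=json.dumps(seed))
    raw = generate_fn(prompt)                  # generate_fn: str -> str, supplied by the caller
    record = json.loads(raw[raw.index("{"):])  # keep only the trailing JSON object
    record["task_type"] = task_type
    return record

if __name__ == "__main__":
    # Stub LLM so the sketch runs without network access.
    stub = lambda p: ('{"instruction": "Normalize the product concept.", '
                      '"input": "iphone13 pro max case", '
                      '"reasoning": "The query names an accessory for a specific phone model.", '
                      '"output": "phone case (iPhone 13 Pro Max)"}')
    print(synthesize_sample("concept_normalization", {"query": "iphone13 pro max case"}, stub))
```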

The dataset supports the following core competency domains:

  • Shopping Concept Understanding: Includes tasks of normalization, extraction, summarization, relation inference, and sentiment analysis.
  • Shopping Knowledge Reasoning: Encompasses numerical, commonsense, and multi-hop reasoning scenarios.
  • User Behavior Alignment: Covers query-based recommendation, purchase history modeling, behavioral prediction, and sentiment label assignment.
  • Multi-Lingual Shopping Abilities: Integrates tasks presented in multiple languages aligned with global e-commerce requirements.

All instructions are standardized into a single text-to-text format. Target outputs span generation, classification, retrieval, multiple-choice, and NER paradigms, drawing on empirically sourced shopping queries, reviews, and activity logs.
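A minimal sketch of how these heterogeneous paradigms can share one text-to-text record schema; the field names and the `to_prompt` rendering are assumptions for illustration, not the released file format.

```python
# Each record maps an instruction plus optional input to a free-text target,
# regardless of whether the underlying task is generation, classification,
# or multiple choice (field names are illustrative).
records = [
    {"task": "review_summarization",          # generation
     "instruction": "Summarize the customer review in one sentence.",
     "input": "The blender is powerful but extremely loud at high speed.",
     "output": "Powerful blender, but very noisy on the highest setting."},
    {"task": "sentiment_classification",      # classification
     "instruction": "Classify the review sentiment as positive, neutral, or negative.",
     "input": "Arrived late and the box was crushed.",
     "output": "negative"},
    {"task": "product_multiple_choice",       # multiple choice
     "instruction": "Which option best matches the query 'running shoes for flat feet'? "
                    "A) trail sandals B) stability running shoes C) dress loafers",
     "input": "",
     "output": "B"},
]

def to_prompt(record):
    """Render a record into the single text-to-text format used for training."""
    parts = [record["instruction"]]
    if record["input"]:
        parts.append(f"Input: {record['input']}")
    return "\n".join(parts), record["output"]

for r in records:
    prompt, target = to_prompt(r)
    print(f"--- {r['task']} ---\n{prompt}\n=> {target}\n")
```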

2. Methodology for Instruction Tuning

Instruction tuning advances the model’s capacity to generalize from generic language understanding to e-commerce–specific subtasks. The process involves fine-tuning several backbone LLMs (Mistral-7B, Llama3-8B, and Qwen2-7B/72B) using the Low-Rank Adaptation (LoRA) protocol. In LoRA, only low-rank updates (rank 8) to the query, key, and value projection layers are trained, while the full model weights remain frozen; this reduces both computational cost and the tendency to overfit.
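A minimal sketch of this LoRA setup with the PEFT library, assuming a Hugging Face checkpoint; only the rank (8) and the query/key/value target modules come from the description above, while `lora_alpha`, dropout, and the 7B checkpoint name are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType

base_id = "Qwen/Qwen2-7B"  # one of the backbones; the 72B variant follows the same recipe

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# Low-rank adapters on the attention projections; all base weights stay frozen.
lora_cfg = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                   # rank stated in the paper
    lora_alpha=16,                         # assumption: common 2x-rank scaling
    lora_dropout=0.05,                     # assumption
    target_modules=["q_proj", "k_proj", "v_proj"],
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # a tiny fraction of the backbone's parameters
```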

Training utilizes an auto-regressive language modeling objective and employs the AdamW optimizer, a cosine learning rate schedule, a peak learning rate of 4×10⁻⁵, and a 10% warmup ratio, with a maximum sequence length of 2048. These hyperparameter choices are tailored to facilitate the absorption of shopping-task instructions in a resource-efficient manner.
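A minimal sketch of these hyperparameters expressed as Hugging Face `TrainingArguments`; batch size, epoch count, and the output path are assumptions not given in the text.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llasa-lora",               # assumption
    learning_rate=4e-5,                    # peak learning rate from the paper
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    per_device_train_batch_size=4,         # assumption
    gradient_accumulation_steps=8,         # assumption
    num_train_epochs=3,                    # assumption
    bf16=True,
    logging_steps=50,
)

# The 2048-token cap is enforced at tokenization time, e.g.:
# tokenizer(text, truncation=True, max_length=2048)
```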

3. Model Architecture and Application

Models trained on EshopInstruct, principally LLaSA, are built on large-scale LLM foundations such as Qwen2-72B. Their architecture combines:

  • Backbone LLM: Pre-trained for general linguistic competence.
  • LoRA-Fine-Tuning: Domain specialization via selected low-rank parameter adaptation.
  • Text-to-Text Interface: Harmonizes task formats to leverage unified generation paradigms.

This design enables a single model to proficiently address a broad set of shopping assistant tasks, mitigating two earlier limitations: the proliferation of task-specific models and poor generalization to newly listed products.

4. Inference and Performance Optimization

Real-world deployment of such large models is constrained by GPU memory limits, commonly 16–64GB across a heterogeneous device fleet. The following optimization strategies are employed:

  • Quantization: Qwen2-72B is post-training quantized with GPTQ, reducing its memory footprint. Weights are stored as int4 and dequantized to fp16 during inference, incurring minimal accuracy loss; a loading sketch follows this list.
  • Prompt Engineering: Advanced strategies include chain-of-thought prompting for complex reasoning, few-shot prompting with three relevant in-distribution examples, and a re-reading mechanism that has the model process the query a second time for improved attention and accuracy. In specific cases, regular expressions extract the relevant answer span; see the prompt-construction sketch at the end of this section.
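A minimal sketch of serving a GPTQ int4 backbone with transformers (the optimum/auto-gptq backends must be installed); the checkpoint name below is a publicly available int4 GPTQ build of Qwen2-72B-Instruct and stands in for the authors' own quantized weights.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pre-quantized int4 GPTQ checkpoint (stand-in for the authors' quantized weights).
model_id = "Qwen/Qwen2-72B-Instruct-GPTQ-Int4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # shard across available GPUs
    torch_dtype=torch.float16,  # int4 weights are dequantized to fp16 for matmuls
)

inputs = tokenizer("Recommend a budget laptop for students.", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0], skip_special_tokens=True))
```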

These optimizations collectively preserve performance while enabling deployment on practical hardware.
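A minimal sketch of the few-shot plus re-reading prompt construction and the regex answer extraction mentioned above; the example pool, prompt wording, and answer pattern are assumptions.

```python
import re

# Hypothetical in-distribution pool; in practice three relevant examples are chosen per query.
FEW_SHOT = [
    ("Which category fits 'wireless noise-cancelling headphones'? A) Audio B) Kitchen C) Garden", "A"),
    ("Which category fits 'cast iron skillet 12 inch'? A) Audio B) Kitchen C) Garden", "B"),
    ("Which category fits 'solar garden lights'? A) Audio B) Kitchen C) Garden", "C"),
]

def build_prompt(query: str) -> str:
    """Few-shot prompt that also repeats the query ('re-reading') before asking for the answer."""
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in FEW_SHOT)
    return (
        f"{shots}\n\n"
        f"Q: {query}\n"
        f"Read the question again: {query}\n"   # re-reading mechanism
        f"Answer with a single option letter.\nA:"
    )

def extract_choice(generation: str):
    """Pull the first standalone option letter out of a possibly verbose generation."""
    match = re.search(r"\b([A-D])\b", generation)
    return match.group(1) if match else None

prompt = build_prompt("Which category fits 'espresso machine with milk frother'? A) Audio B) Kitchen C) Garden")
print(prompt)
print(extract_choice("The correct answer is B) Kitchen."))  # -> 'B'
```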

5. Empirical Evaluation and Competitive Benchmarks

EshopInstruct–tuned models are evaluated on ShopBench across five tracks, reflecting major shopping skill categories. Performance is measured using track- and task-specific metrics:

  • Shopping Concept Understanding: representative track score up to 0.824; sample task scores of 0.860 (multiple-choice) and 0.789 (NER).
  • Multi-Lingual Abilities: highest score among student teams.
  • Overall KDD Cup placement: 3rd overall, with Top-5 finishes in all tracks.

Success factors include dataset breadth, LoRA parameter efficiency, GPTQ quantization effectiveness, and prompt design.

6. Challenges and Solutions

The EshopInstruct approach addresses several domain-specific challenges:

  • Domain Adaptation: Generic LLMs lack e-commerce expertise. Solution: Comprehensive, targeted instruction tuning.
  • Task Diversity and Model Consolidation: Previous assistants required multiple models. Solution: Multi-task, text-to-text formalization permits unified modeling.
  • Resource Constraints: Large models pose inference challenges on limited hardware. Solution: Inference quantization and prompt engineering.

A plausible implication is that continual dataset refinement and advanced quantization will be necessary as model scale and e-commerce complexity increase.

7. Future Research Prospects

The dataset authors posit opportunities for further advancement:

  • Dataset Expansion: Increase real-world coverage, add novel task classes, and improve filtering of LLM-generated samples.
  • Advanced Quantization: Develop more robust quantization algorithms for even larger models or more stringent inference envelopes.
  • Multi-lingual and Cross-Market Generalization: Probe model capabilities in emerging languages and diverse consumer markets.
  • Continual Learning: Implement incremental knowledge update protocols so models remain contemporaneous with catalog changes.

This suggests an ongoing evolution of training and deployment strategies tuned to the dynamic requirements of e-commerce search and recommendation workloads.
