Rethinking E-Commerce Search (2312.03217v1)

Published 6 Dec 2023 in cs.IR and cs.CL

Abstract: E-commerce search and recommendation usually operate on structured data such as product catalogs and taxonomies. However, creating better search and recommendation systems often requires a large variety of unstructured data including customer reviews and articles on the web. Traditionally, the solution has always been converting unstructured data into structured data through information extraction, and conducting search over the structured data. However, this is a costly approach that often has low quality. In this paper, we envision a solution that does entirely the opposite. Instead of converting unstructured data (web pages, customer reviews, etc) to structured data, we instead convert structured data (product inventory, catalogs, taxonomies, etc) into textual data, which can be easily integrated into the text corpus that trains LLMs. Then, search and recommendation can be performed through a Q/A mechanism through an LLM instead of using traditional information retrieval methods over structured data.

Summary

The paper proposes a paradigm shift: convert structured product data into annotated text so that LLMs can perform search and recommendation directly.
It introduces universal IDs and text annotations to improve query understanding and product retrieval by leveraging LLMs' world knowledge.
The work also outlines future research directions such as latency optimization, personalization, and mitigating catastrophic forgetting.

Introduction

The paper "Rethinking E-Commerce Search" presents a novel approach to search and recommendation in the e-commerce domain. The traditional method converts unstructured data into structured formats and then searches over the structured data. The authors propose the reverse: converting structured data into text so that it can be integrated with LLMs for search and recommendation.

Challenges in E-Commerce Search

E-commerce search relies mainly on structured data from product catalogs. The paper highlights two main challenges: query understanding and product understanding. Traditional methods struggle to extract user intent from short search queries and to relate that intent to product attributes. They also struggle to capture the world knowledge needed to understand products beyond their basic catalog features.

Vision for a New Approach

The paper proposes a paradigm shift in which structured and semi-structured data are transformed into text. This allows LLMs, pretrained on extensive text corpora, to handle search and recommendation through a question-and-answer mechanism. Because LLMs already encode broad world knowledge, the authors argue that this approach can bypass the need for complex product knowledge graphs and dedicated query understanding systems.

Technical Implementation

Universal IDs: Each database entity is assigned a universal ID that is embedded in the text, so the LLM can refer to specific entities when answering a query. The authors discuss ID representations that mitigate two problems: a very large ID space and interference from the model's prior knowledge.
Annotated Text Generation: Structured data is converted into annotated text that incorporates entity IDs into textual descriptions. Three sources are used: manually created templates, LLM-generated descriptions, and query-based templates derived from user engagement data.
System Architecture: The annotated texts are ingested during training, and the LLM is fine-tuned so that it absorbs the database knowledge while retaining its world knowledge and linguistic capabilities.
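The template-based route to annotated text can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Product` fields, the `[PID_000123]` ID format, and the template wording are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """One catalog row. All field names are hypothetical."""
    uid: str          # universal ID, e.g. "PID_000123" (assumed format)
    title: str
    brand: str
    category: str
    price_usd: float


# A hand-written template that embeds the universal ID next to the
# product mention, so the ID appears inside ordinary running text.
TEMPLATE = (
    "{title} [{uid}] is a {category} made by {brand}. "
    "It costs ${price:.2f}."
)


def annotate(product: Product) -> str:
    """Render one structured record as an ID-annotated sentence."""
    return TEMPLATE.format(
        title=product.title,
        uid=product.uid,
        category=product.category,
        brand=product.brand,
        price=product.price_usd,
    )


def build_corpus(products: list[Product]) -> list[str]:
    """Convert a whole catalog into training text, one passage per product."""
    return [annotate(p) for p in products]
```

In practice several templates per entity type would likely be needed so the model does not overfit to one phrasing; the LLM-generated and query-based sources mentioned above serve that purpose.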

Inference and Applications

At inference time, the proposed system uses the LLM for retrieval and recommendation in zero-shot or few-shot settings. The paper illustrates use cases such as product search, recommendation, and search suggestions, each driven by a prompt designed to elicit responses that contain linked product IDs.
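A rough sketch of the inference side, under stated assumptions: the prompt wording, the `Q:`/`A:` layout, and the bracketed `PID_` ID format are hypothetical, and the call to the model itself is left out. The sketch only shows how a zero-shot or few-shot prompt might be assembled and how product IDs might be recovered from the generated text.

```python
import re

# Assumed ID format: "PID_" followed by six digits, wrapped in brackets.
ID_PATTERN = re.compile(r"\[(PID_\d{6})\]")

INSTRUCTION = "Answer with matching products and their IDs in square brackets."


def build_prompt(query: str, examples: tuple[tuple[str, str], ...] = ()) -> str:
    """Assemble a prompt; with no examples this is the zero-shot case."""
    lines = [INSTRUCTION]
    for example_query, example_answer in examples:
        lines.append(f"Q: {example_query}")
        lines.append(f"A: {example_answer}")
    lines.append(f"Q: {query}")
    lines.append("A:")
    return "\n".join(lines)


def extract_ids(response: str) -> list[str]:
    """Return product IDs in order of first appearance, without duplicates."""
    seen: list[str] = []
    for uid in ID_PATTERN.findall(response):
        if uid not in seen:
            seen.append(uid)
    return seen
```

The extracted IDs would then be looked up in the product database, which also gives a natural place to discard IDs the model produced that do not exist in the catalog.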

Research Directions

The authors outline several areas for further investigation, including:

Latency Optimization: Strategies to reduce response times, including encoder-based approaches and model compression via distillation, pruning, and quantization.
Personalization: Embedding user history features as input context to enhance personalized search experiences.
Catastrophic Forgetting: Retaining previously learned knowledge as the model is updated to reflect product databases that evolve and expand over time.
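The personalization direction above, user history supplied as input context, might look like the following in its simplest form. The event strings, the section labels, and the `max_events` cutoff are illustrative assumptions; the paper does not prescribe a format.

```python
def personalize(query: str, history: list[str], max_events: int = 3) -> str:
    """Prepend the most recent user events to the query as plain-text context.

    `history` is assumed to be ordered oldest first. Only the last
    `max_events` entries are kept so the context stays short.
    """
    # Guard against max_events <= 0, since history[-0:] would keep everything.
    recent = history[-max_events:] if max_events > 0 else []
    lines = ["User history (oldest first):"]
    lines += [f"- {event}" for event in recent]
    lines.append(f"Query: {query}")
    return "\n".join(lines)
```

Truncating the history also bears on the latency direction, since a longer context raises inference cost.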

Conclusion

The paper argues for a transformative shift in e-commerce search, in which a single LLM manages both structured product information and broad world knowledge. This fusion aims to simplify complex systems and improve retrieval quality, and the authors project a unified model infrastructure as the future of e-commerce information systems. Future research will likely focus on reducing latency and improving scalability and personalization.