Papers
Topics
Authors
Recent
Search
2000 character limit reached

Product Classification in E-Commerce using Distributional Semantics

Published 20 Jun 2016 in cs.AI, cs.CL, and cs.IR | (1606.06083v2)

Abstract: Product classification is the task of automatically predicting a taxonomy path for a product in a predefined taxonomy hierarchy given a textual product description or title. For efficient product classification we require a suitable representation for a document (the textual description of a product) feature vector and efficient and fast algorithms for prediction. To address the above challenges, we propose a new distributional semantics representation for document vector formation. We also develop a new two-level ensemble approach utilizing (with respect to the taxonomy tree) a path-wise, node-wise and depth-wise classifiers for error reduction in the final product classification. Our experiments show the effectiveness of the distributional representation and the ensemble approach on data sets from a leading e-commerce platform and achieve better results on various evaluation metrics compared to earlier approaches.

Citations (38)

Summary

  • The paper introduces a novel two-level ensemble classification framework that integrates path-wise, node-wise, and depth-wise classifiers for hierarchical product categorization.
  • The paper employs a graded weighted Bag of Word Vectors (gwBoWV) method to capture semantic nuances, overcoming sparsity issues of traditional representations.
  • The paper demonstrates significant improvements in top-K path prediction accuracy over traditional models, enhancing efficiency in large-scale e-commerce catalog management.

Product Classification in E-Commerce using Distributional Semantics

Introduction

The research paper "Product Classification in E-Commerce using Distributional Semantics" (1606.06083) addresses the challenge of categorizing products in vast e-commerce catalogs using distributional semantics. The paper introduces a novel approach to represent document vectors and proposes a two-level ensemble classification system tailored to the hierarchical taxonomy employed by e-commerce platforms.

Distributional Semantics Representation

Traditional document representations like tf-idf and BoW fail in capturing semantic nuances due to their high dimensionality and sparsity. The paper adopts a distributional semantics approach, encoding words in low-dimensional continuous vector spaces (R100\mathcal{R}^{100} to R300\mathcal{R}^{300}), capitalizing on the 'Distributional Hypothesis'. This method involves weighted summation and clustering of these word vectors to form graded weighted Bag of Word Vectors (gwBoWV). Figure 1

Figure 1: Neural Network Architecture for CBoW and Skip Gram Model.

Paragraph Vector Limitations and Improvements

The paper critiques the existing paragraph vector models (PV-DM and PV-DBoW) for their inability to capture multiple topic representations within a document and for treating all words in a document uniformly. To overcome these limitations, a new composite technique is proposed, leveraging enriched word vectors and graded weighting inspired by recent advancements in semantic vector composition. Figure 2

Figure 2: Neural Network Architecture for Distributed Memory version of Paragraph Vector (PV-DM) and Distributed BoWs version of paragraph vectors (PV-DBoW).

Hierarchical Product Categorization

Product categorization in e-commerce is a hierarchical problem, fraught with challenges such as data sparsity and label-dependent constraints. Existing methods, like the gates-and-experts approach, typically suffer from complexity and inefficiency. The paper's proposed ensemble approach uses classifiers trained on path-wise, node-wise, and depth-wise labels within the taxonomy, significantly improving classification accuracy while maintaining computational efficiency.

Ensemble Classification Framework

A two-level ensemble framework is employed, where the first level utilizes the outputs from different classifiers (path-wise, node-wise, and depth-wise classifiers) to form a feature vector. This vector is then used by the second-level classifier to refine and enhance prediction accuracy. The ensemble effectively integrates multi-type feature predictions, demonstrating superior performance across various metrics. Figure 3

Figure 3: Comparison of prediction accuracy for path prediction using different methods for document vector generation.

Results and Discussion

The proposed method, when tested on datasets from a prominent e-commerce platform, showcases remarkable improvements in classification accuracy over baseline methods such as PV-DBoW and BoCV. Specifically, the introduction of gwBoWV combined with the ensemble approach yields enhanced top-K path prediction accuracy.

Implications and Future Work

The framework's success signifies a substantial leap in e-commerce product classification, paving the way for more nuanced and accurate catalog management. Future research could explore alternative clustering methods like the Chinese Restaurant Process, integrate supervised embeddings, and enhance label embedding approaches for improved performance.

Conclusion

This paper contributes a robust distributional semantics-based document representation and a novel ensemble classification approach, addressing the multifaceted challenges of e-commerce product classification. The methodologies set a foundation for further innovation in hierarchical classification and semantic representation, influencing future developments in AI-driven cataloging systems. Figure 4

Figure 4: Misclassification Graphs with latent groups in Computer and Computer Accessories, here each edge from C1 →\rightarrow C2 represents misclassification from C1 →\rightarrow C2 with threshold >10%> 10\% of correct prediction, here isolated vertices represent almost correctly predicted classes.

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Collections

Sign up for free to add this paper to one or more collections.