
Towards Open-World Product Attribute Mining: A Lightly-Supervised Approach (2305.18350v1)

Published 26 May 2023 in cs.LG, cs.CL, and cs.IR

Abstract: We present a new task setting for attribute mining on e-commerce products, serving as a practical solution to extract open-world attributes without extensive human intervention. Our supervision comes from a high-quality seed attribute set bootstrapped from existing resources, and we aim both to expand the attribute vocabulary of existing seed types and to discover new attribute types automatically. A new dataset is created to support our setting, and our approach, Amacer, is proposed specifically to tackle the limited supervision. In particular, given that no direct supervision is available for unseen new attributes, our novel formulation exploits a self-supervised heuristic and unsupervised latent attributes, which attain implicit semantic signals as additional supervision by leveraging product context. Experiments suggest that our approach surpasses various baselines by 12 F1, expanding attributes of existing types significantly by up to 12 times, and discovering values from 39% new types.

Authors (5)
  1. Liyan Xu (28 papers)
  2. Chenwei Zhang (60 papers)
  3. Xian Li (116 papers)
  4. Jingbo Shang (141 papers)
  5. Jinho D. Choi (67 papers)
Citations (4)