AttentionXML: Label Tree-based Attention-Aware Deep Model for High-Performance Extreme Multi-Label Text Classification (1811.01727v3)

Published 1 Nov 2018 in cs.CL and cs.LG

Abstract: Extreme multi-label text classification (XMTC) is an important problem in the era of big data, for tagging a given text with the most relevant multiple labels from an extremely large-scale label set. XMTC can be found in many applications, such as item categorization, web page tagging, and news annotation. Traditionally most methods used bag-of-words (BOW) as inputs, ignoring word context as well as deep semantic information. Recent attempts to overcome the problems of BOW by deep learning still suffer from 1) failing to capture the important subtext for each label and 2) lack of scalability against the huge number of labels. We propose a new label tree-based deep learning model for XMTC, called AttentionXML, with two unique features: 1) a multi-label attention mechanism with raw text as input, which allows to capture the most relevant part of text to each label; and 2) a shallow and wide probabilistic label tree (PLT), which allows to handle millions of labels, especially for "tail labels". We empirically compared the performance of AttentionXML with those of eight state-of-the-art methods over six benchmark datasets, including Amazon-3M with around 3 million labels. AttentionXML outperformed all competing methods under all experimental settings. Experimental results also show that AttentionXML achieved the best performance against tail labels among label tree-based methods. The code and datasets are available at http://github.com/yourh/AttentionXML .

Authors (6)
  1. Ronghui You (4 papers)
  2. Zihan Zhang (121 papers)
  3. Ziye Wang (5 papers)
  4. Suyang Dai (3 papers)
  5. Hiroshi Mamitsuka (14 papers)
  6. Shanfeng Zhu (9 papers)
Citations (241)

Summary

  • The paper introduces a novel label-specific attention mechanism that tailors text representations for each label, boosting accuracy for infrequent tags.
  • The methodology employs a shallow probabilistic label tree that efficiently scales to millions of labels while reducing computational cost.
  • Experiments show significant improvements over state-of-the-art methods, including a 17% P@5 boost on the Wiki-500K dataset.

Overview of AttentionXML: Label Tree-based Deep Learning Model for Extreme Multi-Label Text Classification

The paper presents a novel approach to the challenges of extreme multi-label text classification (XMTC): AttentionXML, a label tree-based deep learning model. XMTC involves tagging texts with the most relevant labels from an extremely large-scale label set, with applications such as item categorization and web page tagging. Traditional methods rely on bag-of-words (BOW) representations, which discard word context and deep semantic information, while existing deep learning methods struggle to capture label-specific subtext and to scale to huge label sets.

Key Features of AttentionXML

AttentionXML introduces two main innovations:

  1. Multi-label Attention Mechanism: Operating on raw text rather than BOW features, this mechanism lets the model focus on the parts of a text most relevant to each label. Unlike previous models, AttentionXML builds a distinct text representation for every label, which improves accuracy, particularly for tail labels with few training instances (see the sketch after this list).
  2. Probabilistic Label Tree (PLT): A shallow and wide PLT handles the huge label space efficiently, scaling to millions of labels. This structure also improves performance on tail labels by avoiding the deep trees and inaccurate likelihood approximations that hamper existing tree-based methods (see the tree-building sketch below the attention example).
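To make the multi-label attention concrete, here is a minimal PyTorch sketch. The `LabelAttention` module and its names are illustrative, not the authors' released code; in the paper the token states come from a BiLSTM, and at scale attention is computed only for the candidate labels proposed by the tree.

```python
import torch
import torch.nn as nn

class LabelAttention(nn.Module):
    """Sketch of multi-label attention: every label owns a query
    vector and attends to the token sequence independently."""
    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        # One learned attention query per label.
        self.label_queries = nn.Parameter(torch.randn(num_labels, hidden_size))

    def forward(self, token_states: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden), e.g. BiLSTM outputs.
        scores = torch.einsum("lh,bth->blt", self.label_queries, token_states)
        weights = torch.softmax(scores, dim=-1)  # per-label attention over tokens
        # One pooled text representation per label: (batch, num_labels, hidden).
        return torch.einsum("blt,bth->blh", weights, token_states)

attn = LabelAttention(hidden_size=256, num_labels=500)
label_repr = attn(torch.randn(2, 100, 256))  # -> shape (2, 500, 256)
```

Each pooled vector then feeds a small per-label scoring head, which is what lets a tail label latch onto the few tokens that matter for it.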

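A PLT assigns every label to a leaf and factorizes P(label | text) as the product of conditional probabilities along the root-to-leaf path, so only a few node classifiers fire per prediction. One simple way to obtain a shallow, wide tree is to cluster label feature vectors recursively with a large fan-out; the sketch below assumes this approach (the paper uses a balanced clustering, and `build_shallow_plt`, `branch`, and `max_leaf` are illustrative names):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_shallow_plt(label_features: np.ndarray,
                      branch: int = 64, max_leaf: int = 100):
    """Recursively cluster labels; a large fan-out keeps the tree shallow."""
    def split(indices: np.ndarray):
        if len(indices) <= max_leaf:
            return indices.tolist()  # leaf: a small group of labels
        k = min(branch, len(indices))
        km = KMeans(n_clusters=k, n_init=4).fit(label_features[indices])
        return [split(indices[km.labels_ == c]) for c in range(k)]
    return split(np.arange(len(label_features)))

# 10,000 toy labels with 32-dim features -> a tree roughly two levels deep.
tree = build_shallow_plt(np.random.randn(10_000, 32).astype(np.float32))
```

Shallowness matters because every extra level multiplies in another approximate conditional probability; with only two or three levels, the compounded error stays small even for millions of labels.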
Experimental Results

The empirical evaluation of AttentionXML against eight state-of-the-art methods across six benchmark datasets, including the challenging Amazon-3M with around 3 million labels, demonstrates its superiority. The model consistently outperformed competitors under all settings and was particularly effective in handling datasets with long texts, leveraging its multi-label attention mechanism to distill essential information.

For instance, AttentionXML achieved significant improvements in P@5 across datasets, most notably a gain of over 17% on the Wiki-500K dataset relative to DiSMEC, the previous best method. An ensemble of AttentionXML models extends this lead further, reducing variance and increasing predictive performance.
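P@k (precision at k), the metric quoted above, is the fraction of a model's k highest-scored labels that are actually relevant. A minimal illustration:

```python
import numpy as np

def precision_at_k(scores: np.ndarray, true_labels: set, k: int = 5) -> float:
    """Fraction of the top-k scored labels that are relevant."""
    top_k = np.argsort(scores)[::-1][:k]  # indices of the k highest scores
    return sum(int(label) in true_labels for label in top_k) / k

scores = np.array([0.9, 0.1, 0.8, 0.7, 0.05, 0.6, 0.3])
print(precision_at_k(scores, {0, 2, 5}))  # 0.6: 3 of the top 5 are relevant
```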

Implications and Future Directions

AttentionXML extends the theoretical framework of XMTC by integrating attention mechanisms within a scalable tree-based structure, improving both prediction accuracy and computational efficiency. The ability to attend separately to different text segments for each label offers a robust answer to the semantic granularity problem that pervades this domain.

Practically, the advent of AttentionXML suggests potential improvements in various applications requiring XMTC, from better recommendation systems in e-commerce to more accurate document indexing and retrieval.

Future developments may consider further optimizing the model’s space complexity, exploring alternative PLT configurations, or refining attention mechanisms to boost processing speed and resource utilization. Additionally, extending this approach to other multi-label contexts beyond text classification could yield fruitful avenues of research.

The AttentionXML paper presents a compelling advance in XMTC, offering a structured framework with strong results and setting a benchmark for future work on label-intensive machine learning tasks.
