Curiosity-Driven Game-Theoretic Multi-Label Learning
- The paper introduces CD-GTMLL, which decomposes multi-label tasks into cooperative games to focus on rare labels through intrinsic curiosity.
- It leverages curiosity-based intrinsic rewards and potential-based objectives to dynamically balance performance between head and tail labels, achieving notable Rare-F1 improvements.
- The framework ensures scalable, robust optimization with convergence guarantees, addressing long-tail label imbalances via coordinated block-wise updates.
Curiosity-Driven Game-Theoretic Multi-Label Learning (CD-GTMLL) is a mathematical and algorithmic framework that integrates curiosity-based intrinsic motivation with cooperative game theory to address the challenges inherent in multi-label classification, particularly those arising from long-tail label distributions. The approach entails decomposing the prediction task into a potential game among multiple learners (“players”), each incentivized to focus on under-represented labels via curiosity bonuses, resulting in adaptive specialization and state-of-the-art performance on tail-aware metrics.
1. Foundations and Motivation
Multi-label learning tasks demand the concurrent prediction of multiple binary or categorical targets per instance. In settings characterized by long-tailed label distributions, dominant (head) labels monopolize the gradient signal, often resulting in the neglect of rare (tail) labels. Standard techniques—such as class re-weighting, resampling, or post-hoc calibration—frequently fail to mitigate this imbalance robustly and offer no convergence guarantees toward tail-sensitive solutions.
CD-GTMLL reframes this challenge by merging three dimensions:
- Game-theoretic decomposition: Prediction is cast as a cooperative potential game with multiple players, each responsible for a (possibly overlapping) subset of labels.
- Intrinsic curiosity rewards: Players receive additional rewards for improving performance on rare labels and for disagreement on overlapping predictions, operationalizing curiosity as a formal gradient signal.
- Potential-based objectives: The learning procedure maximizes a differentiable potential that balances global multi-label accuracy with curiosity-driven exploration, culminating in tail-aware stationary solutions that sharpen Rare-F1 lower bounds (Xiao et al., 20 Oct 2025).
2. Cooperative Game-Theoretic Structure
The label space $\mathcal{Y}$ is split among $K$ players, each possessing a head network with parameters $\theta_i$, typically atop a shared backbone encoder. Each player $i$ predicts label probabilities $\hat{y}_i$ for its assigned label subset $S_i \subseteq \mathcal{Y}$. Players' predictions are aggregated via a differentiable fusion operator $\mathcal{F}$ (e.g., weighted sum or averaging) to form the overall model output $\hat{y} = \mathcal{F}(\hat{y}_1, \dots, \hat{y}_K)$. The shared global payoff is constructed as

$$R(\theta_1, \dots, \theta_K) = Q(\hat{y}, y),$$

where $Q$ is a differentiable multi-label quality surrogate (e.g., rarity-weighted logistic loss). Each player's personal objective augments this global reward with a curiosity term:

$$J_i(\theta_i) = R(\theta_1, \dots, \theta_K) + \lambda\, C_i(\theta_i),$$

where $C_i$ is the player's curiosity reward, and $\lambda \geq 0$ controls the exploration-exploitation trade-off. The key theoretical insight is that coordinated block-wise best-response updates on each $\theta_i$ monotonically increase a global potential function

$$\Phi(\theta_1, \dots, \theta_K) = R(\theta_1, \dots, \theta_K) + \lambda \sum_{i=1}^{K} C_i(\theta_i),$$

ensuring convergence to a tail-sensitive equilibrium under standard regularity assumptions (Lipschitz continuity, bounded step sizes).
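A minimal PyTorch sketch of this structure follows, under simplifying assumptions: every player scores the full label space, fusion is a plain average, and $Q$ is a rarity-weighted logistic surrogate. Function and variable names are illustrative, not the paper's.

```python
import torch

def fuse(player_logits):
    """Differentiable fusion operator F: average the players'
    per-label probabilities (simplest choice; the paper allows
    weighted sums over overlapping label subsets)."""
    probs = torch.stack([torch.sigmoid(z) for z in player_logits])
    return probs.mean(dim=0)  # (batch, n_labels)

def rarity_weighted_logistic(y_hat, y, prevalence, eps=1e-8):
    """Quality surrogate Q: logistic log-likelihood whose positive
    terms are scaled by 1/p_l, so rare labels carry more payoff."""
    w = 1.0 / (prevalence + eps)  # (n_labels,), larger for rare labels
    ll = w * y * torch.log(y_hat + eps) + (1.0 - y) * torch.log(1.0 - y_hat + eps)
    return ll.mean()  # global payoff R, to be maximized
```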
3. Curiosity-Driven Intrinsic Reward Design
Curiosity is encoded via rarity-sensitive and disagreement-sensitive terms. For each player $i$, the curiosity bonus takes the form

$$C_i(\theta_i) = \sum_{\ell \in S_i} \frac{1}{p_\ell}\, y_\ell \log \hat{y}_{i,\ell} \;+\; \mu\, D\big(\hat{y}_i \,\|\, \mathrm{sg}[\bar{y}_{-i}]\big),$$

where:
- $p_\ell$ is the empirical prevalence of label $\ell$; thus, the weight $1/p_\ell$ amplifies the reward on rare labels.
- $\hat{y}_{i,\ell}$ is the player's probability prediction for label $\ell$; the (weighted) log-likelihood upweights rare positive instances.
- $D(\cdot \,\|\, \cdot)$ is a divergence (e.g., Kullback–Leibler or Jensen–Shannon) between player $i$'s overlapping outputs and the stop-gradient aggregate of others' predictions $\mathrm{sg}[\bar{y}_{-i}]$, and $\mu$ governs the influence of disagreement.
This dual mechanism ensures persistent gradient signal for rare and controversial labels, promoting specialization and exploratory learning.
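A sketch of such a bonus, continuing the notation and assumptions of the previous snippet (full label space per player, per-label Bernoulli KL as the divergence $D$); the paper's exact weighting may differ.

```python
import torch

def curiosity_bonus(probs_i, y, prevalence, probs_others, mu=0.5, eps=1e-8):
    """Curiosity reward C_i for player i: a rarity-weighted
    log-likelihood plus a disagreement divergence against the
    (stop-gradient) aggregate of the other players."""
    # Rarity term: 1/p_l upweights positives of rare labels.
    rarity = (y / (prevalence + eps)) * torch.log(probs_i + eps)

    # Disagreement term: per-label Bernoulli KL(player i || others),
    # with the others' aggregate treated as a constant (stop-gradient).
    q = probs_others.detach()
    kl = probs_i * torch.log((probs_i + eps) / (q + eps)) + \
         (1.0 - probs_i) * torch.log((1.0 - probs_i + eps) / (1.0 - q + eps))

    return rarity.mean() + mu * kl.mean()
```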
4. Optimization and Convergence Analysis
CD-GTMLL employs cyclic best-response updates: each player alternately maximizes its augmented local objective while holding the others fixed, using gradient ascent or its stochastic variants. Crucially, the curiosity term $C_i$ depends exclusively on $\theta_i$, with other players' predictions treated as constants ("stop-gradient"), allowing proper potential ascent.
Under conditions of differentiability and Lipschitz-continuous partial gradients, coordinate ascent yields a nondecreasing sequence of potential values $\Phi^{(t)}$, converging to stationary solutions. Formal results (Theorem 3 of Xiao et al., 20 Oct 2025) guarantee that rare labels receive a nonvanishing gradient whenever their fused predicted margins are suboptimal on positives, directly addressing long-tail neglect. Consequently, the learning dynamics admit tail-aware block-stationary points at which rare labels cannot be ignored.
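A hypothetical training loop for one round of these updates, reusing the `fuse`, `rarity_weighted_logistic`, and `curiosity_bonus` sketches above; this is scaffolding to make the block structure concrete, not the authors' implementation.

```python
import torch

def cyclic_best_response_round(players, optimizers, x, y, prevalence, lam=0.1):
    """One cycle of block-wise ascent on the potential: each player
    takes a gradient-ascent step on J_i = R + lam * C_i while the
    other players' outputs are frozen constants."""
    for i, (net, opt) in enumerate(zip(players, optimizers)):
        # Other players' predictions are constants for this block update.
        with torch.no_grad():
            others = [torch.sigmoid(p(x)) for j, p in enumerate(players) if j != i]
        probs_others = torch.stack(others).mean(dim=0)

        probs_i = torch.sigmoid(net(x))
        # Average fusion over all K players: probs_others is the mean
        # of the other K-1 heads, so rescale before combining.
        fused = (probs_i + (len(players) - 1) * probs_others) / len(players)

        R = rarity_weighted_logistic(fused, y, prevalence)      # global payoff
        C = curiosity_bonus(probs_i, y, prevalence, probs_others)
        loss = -(R + lam * C)  # minimize the negative => ascend J_i

        opt.zero_grad()
        loss.backward()
        opt.step()
```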
5. Tail-Aware Performance Metrics and Experimental Results
Performance is rigorously evaluated using a suite of standard and tail-sensitive metrics:
- Rare-F1 (F1 on tail labels, e.g., the lowest-frequency 20%): Up to +4.3% gain over baselines on VOC and COCO datasets (Xiao et al., 20 Oct 2025); a minimal computation sketch follows this list.
- Precision@$k$ (e.g., P@3) for extreme multi-label datasets: Up to +1.6% improvement over strong transformer or hierarchical rankers on Eurlex-4K, Wiki10-31K, and AmazonCat-13K.
- Mean Average Precision (mAP), Micro-F1, and Macro-F1 are also reported to ensure head class performance is preserved.
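As a concrete reading of the Rare-F1 metric referenced above, the sketch below computes macro-F1 restricted to the lowest-frequency fraction of labels; the paper's exact tail-selection protocol is an assumption here.

```python
import numpy as np

def rare_f1(y_true, y_pred, train_freq, tail_frac=0.2):
    """Macro-F1 over the rarest `tail_frac` of labels, ranked by
    training-set frequency `train_freq` (shape: (n_labels,)).
    y_true, y_pred: (n_samples, n_labels) binary arrays."""
    n_tail = max(1, int(tail_frac * len(train_freq)))
    tail = np.argsort(train_freq)[:n_tail]  # lowest-frequency labels

    f1s = []
    for l in tail:
        tp = np.sum((y_pred[:, l] == 1) & (y_true[:, l] == 1))
        fp = np.sum((y_pred[:, l] == 1) & (y_true[:, l] == 0))
        fn = np.sum((y_pred[:, l] == 0) & (y_true[:, l] == 1))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(f1s))
```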
Empirically, in “rare-enhanced” scenarios (where positive instances for rare labels are artificially reduced), CD-GTMLL's margins over re-weighted or monolithic baselines widen further. Ablation studies reveal that the curiosity bonus induces an emergent division of labor across players: some specialize in tail-label prediction, others in head-label consensus, leading to quicker agreement on rare-class assignments without manual class weighting.
6. Scalability and Robustness
The computational footprint of CD-GTMLL scales linearly with label count. The label space is partitioned sparsely among a small number of players; fusion and reward computation are lightweight. Training dynamics remain robust under data noise and label occlusion: ablation studies indicate that in the presence of label noise, the drop in Rare-F1 is lower for CD-GTMLL than for comparable single-head or re-weighting-based models. The modular backbone-head-fusion architecture is amenable to parallelism and allows for further architectural enhancements (e.g., adaptive label sharing).
7. Relation to Broader Curiosity and Game-Theoretic Learning Frameworks
CD-GTMLL synthesizes classical intrinsic motivation theory from computational neuroscience and developmental robotics (Oudeyer, 2018), cooperative game theory, and scalable multi-label learning. Its curiosity mechanisms draw on the principle of maximizing learning progress and minimizing free energy, akin to those used for developmental curriculum generation and world-model discovery in autonomous agents. The division of collaborative responsibility via game-theoretic decomposition parallels frameworks where each agent explores regions of high uncertainty, high conflict, or rare novelty (Sun et al., 2022). Compared to active learning or pseudo-labeling schemes (Qi et al., 26 Nov 2024), CD-GTMLL offers explicit convergence guarantees and a formally justified bias toward tail label exploration, tunable by hyperparameters without hand-crafted class weights.
8. Significance and Outlook
Curiosity-Driven Game-Theoretic Multi-Label Learning constitutes a principled, scalable, and empirically validated route to countering long-tail imbalance in multi-label prediction. By decomposing prediction into overlapping, cooperative subproblems under curiosity-informed incentives and fusing their outputs, the framework achieves state-of-the-art Rare-F1 and tail-sensitive metrics on both moderate and extreme-scale benchmarks (Xiao et al., 20 Oct 2025). The underlying block coordinate potential maximization strategy and reward shaping open further research opportunities, including adaptive player allocation, hierarchical label grouping, integration with active learning, and unsupervised representation learning. CD-GTMLL thus occupies a central locus among modern approaches for robust, equitable multi-label classification in high-cardinality, highly imbalanced real-world data distributions.