
GenCLS++: Pushing the Boundaries of Generative Classification in LLMs Through Comprehensive SFT and RL Studies Across Diverse Datasets (2504.19898v1)

Published 28 Apr 2025 in cs.CL

Abstract: As a fundamental task in machine learning, text classification plays a crucial role in many areas. With the rapid scaling of LLMs, particularly through reinforcement learning (RL), there is a growing need for more capable discriminators. Consequently, advances in classification are becoming increasingly vital for enhancing the overall capabilities of LLMs. Traditional discriminative methods map text to labels but overlook LLMs' intrinsic generative strengths. Generative classification addresses this by prompting the model to directly output labels. However, existing studies still rely on simple SFT alone, seldom probing the interplay between training and inference prompts, and no work has systematically leveraged RL for generative text classifiers and unified SFT, RL, and inference-time prompting in one framework. We bridge this gap with GenCLS++, a framework that jointly optimizes SFT and RL while systematically exploring five high-level strategy dimensions (in-context learning variants, category definitions, explicit uncertainty labels, semantically irrelevant numeric labels, and perplexity-based decoding) during both training and inference. After an SFT "policy warm-up," we apply RL with a simple rule-based reward, yielding sizable extra gains. Across seven datasets, GenCLS++ achieves an average accuracy improvement of 3.46% relative to the naive SFT baseline; on public datasets, this improvement rises to 4.00%. Notably, unlike reasoning-intensive tasks that benefit from explicit thinking processes, we find that classification tasks perform better without such reasoning steps. These insights into the role of explicit reasoning provide valuable guidance for future LLM applications.

Summary

A Review of GenCLS++: Advancing Generative Classification in LLMs

The paper "GenCLS++: Pushing the Boundaries of Generative Classification in LLMs Through Comprehensive SFT and RL Studies Across Diverse Datasets" presents an innovative approach to text classification by leveraging the capabilities of LLMs. The authors address the limitations of traditional discriminative methods, which often ignore the generative strengths inherent in LLMs, by introducing GenCLS++, a framework that systematically integrates Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) in generative classification tasks. The paper explores five strategic dimensions, evaluating both training and inference stages to optimize performance across multiple datasets.

Key Contributions

  1. Unified Framework for Generative Classification: GenCLS++ combines SFT and RL into a cohesive framework, exploring a wide array of prompt strategies. This holistic approach is designed to improve classification accuracy by enhancing the generative capacities of LLMs—a departure from purely discriminative methods.
  2. Strategic Prompt Exploration: The paper investigates how various prompt strategies influence performance in both training and inference. By employing strategies such as in-context learning with semantic retrieval and varying exemplars, explicit uncertainty labels, numeric label assignments, and perplexity-based decoding, the framework meticulously optimizes prompts for classification tasks.
  3. Empirical Gains via Reinforcement Learning: RL, applied post-SFT, yields substantial accuracy improvements, challenging preconceived notions that reinforcement gains are limited in classification contexts. Specifically, a simple rule-based reward mechanism delivers gains exceeding typical SFT enhancements.
  4. Comprehensive Evaluation: GenCLS++ is rigorously evaluated across seven datasets, including broad public benchmarks and proprietary datasets. The framework achieves an average accuracy improvement of 3.46% relative to naive SFT baselines, with peaks of 6.10% improvement on specific datasets like IFLYTEK.
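The paper describes the RL stage's reward only as "a simple rule-based reward." A natural reading, sketched below as an assumption rather than the authors' verified implementation, is an exact-match check between the generated label and the gold label:

```python
def rule_based_reward(generated: str, gold_label: str) -> float:
    """Hypothetical exact-match reward for generative classification RL:
    1.0 if the model's output is the gold label (after trimming whitespace
    and ignoring case), 0.0 otherwise. The paper's exact rule may differ."""
    return 1.0 if generated.strip().lower() == gold_label.strip().lower() else 0.0
```

A binary reward like this is attractive for RL on classification because it is unambiguous, cheap to compute, and needs no learned reward model; the paper's reported gains suggest even such minimal signal suffices once the policy has been warmed up with SFT.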

Numerical Results and Insights

The authors report that combining different training and inference prompt strategies yields accuracy gains well beyond naive prompting, underscoring the value of tailored prompts for drawing out LLMs' generative strengths. Notably, and in contrast to reasoning-intensive tasks, classification accuracy improves when explicit reasoning steps are skipped, a critical insight for future applications of LLMs to classification tasks.
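Among the inference-time strategies, perplexity-based decoding scores each candidate label by how surprised the model is by that label's tokens, then picks the least surprising one. The sketch below illustrates the idea with a toy log-probability lookup standing in for a real LM call; the function name `pick_label` and the toy numbers are illustrative assumptions, not the paper's code:

```python
import math

def label_perplexity(token_logprobs):
    """Perplexity of a label string: exp of the negative mean token log-prob."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def pick_label(candidates, logprob_fn):
    """Return the candidate label with the lowest perplexity, given a
    function mapping a label to its per-token log-probabilities (in a real
    system this would query the LLM conditioned on the input text)."""
    return min(candidates, key=lambda lab: label_perplexity(logprob_fn(lab)))

# Toy stand-in for an LM scoring call (hypothetical numbers).
toy_logprobs = {"sports": [-0.1, -0.2], "world news": [-2.0, -1.5]}
best = pick_label(list(toy_logprobs), toy_logprobs.get)
```

Because every candidate label is scored rather than free-form generated, this decoding mode guarantees the output is always a valid label, which is one plausible reason it helps on classification benchmarks.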

Beyond the headline numbers, the paper suggests that blending the generative and discriminative paradigms can deliver benefits that neither approach achieves on its own. The strongest results come from applying RL on top of a well-tuned SFT model: the SFT "policy warm-up" followed by rule-based RL achieves the highest accuracy across datasets.

Implications and Future Directions

The paper contributes meaningful improvements to text classification by showing that harnessing the inherent generative capabilities of LLMs through strategic prompt optimization and reinforcement learning can yield substantial gains. Practically, GenCLS++ scales as LLMs evolve, allowing classifiers to adapt to new label categories without extensive retraining or architectural changes.

From a theoretical perspective, the insights gleaned from GenCLS++ challenge existing presumptions about the necessity of explicit reasoning in classification tasks—a notion that could reshape how LLM strategies are developed for similar tasks.

Looking forward, the paper points to potential generalizations of these findings across different model scales and prompts future work in identifying additional strategies to further harness generative classification performance. The prospect of fusing generative and discriminative approaches broadly in LLM applications remains an intriguing horizon for AI research and development.

This exploration marks a significant stride in advancing the capabilities of LLMs for nuanced text classification tasks, paving the way for current and future applications in AI-driven decision systems.
