One-Pass AUC Optimization (1305.1363v2)

Published 7 May 2013 in cs.LG

Abstract: AUC is an important performance measure and many algorithms have been devoted to AUC optimization, mostly by minimizing a surrogate convex loss on a training data set. In this work, we focus on one-pass AUC optimization that requires only going through the training data once without storing the entire training dataset, where conventional online learning algorithms cannot be applied directly because AUC is measured by a sum of losses defined over pairs of instances from different classes. We develop a regression-based algorithm which only needs to maintain the first and second order statistics of training data in memory, resulting in a storage requirement independent of the size of the training data. To efficiently handle high dimensional data, we develop a randomized algorithm that approximates the covariance matrices by low rank matrices. We verify, both theoretically and empirically, the effectiveness of the proposed algorithm.

Citations (177)

One-Pass AUC Optimization: A Novel Approach

The paper "One-Pass AUC Optimization" by Gao, Jin, Zhu, and Zhou introduces an algorithm for optimizing the area under the ROC curve (AUC) in a single pass over the training data. AUC is a widely used metric for evaluating binary classifiers. Because it is defined over pairs of instances from different classes, AUC optimization conventionally requires storing the training set, which is impractical for large-scale or streaming data where memory is limited.
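To make the pairwise structure concrete, the sketch below computes AUC directly from its definition: the fraction of (positive, negative) pairs in which the positive instance receives the higher score, with ties counted as one half. It assumes labels in {+1, -1}; the function name `pairwise_auc` is hypothetical and not from the paper. Every instance appears in a term with every instance of the opposite class, which is why a naive online learner would have to keep all past examples.

```python
import numpy as np


def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Assumes labels in {+1, -1}; ties in score count as half a correct pair.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels != 1]
    # All n+ x n- pairwise score gaps: this is the quadratic pairwise structure.
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0) + 0.5 * (diff == 0)).mean()
```

For example, `pairwise_auc([0.9, 0.8, 0.3, 0.1], [1, -1, 1, -1])` returns 0.75, since three of the four positive-negative pairs are ordered correctly.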

Summary of Key Contributions

The authors tackle one-pass AUC optimization with a regression-based algorithm that uses the square loss. With this loss, the sum of pairwise losses between a new instance and all previously seen instances of the opposite class can be computed exactly from the running mean and covariance of that class. The algorithm therefore needs to store only first- and second-order statistics, so its memory requirement is independent of the number of training examples and grows as $O(d^2)$ in the feature dimension $d$.
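The following is a minimal sketch of this idea, assuming labels in {+1, -1}, an L2 regularizer weighted by `lam`, and a constant step size `eta`; the names `ClassStats` and `one_pass_auc` and these parameter choices are illustrative, not taken from the paper. It relies on the identity that, for a new positive instance $x_t$ and the $n$ negatives seen so far with mean $c^-$ and covariance $S^-$, $\frac{1}{n}\sum_i \big(1 - w^\top(x_t - x_i)\big)^2 = \big(1 - w^\top(x_t - c^-)\big)^2 + w^\top S^- w$ (and symmetrically for a negative instance). The average pairwise square loss, and therefore its gradient, depends only on the stored statistics.

```python
import numpy as np


class ClassStats:
    """Running first- and second-order statistics for one class (O(d^2) memory)."""

    def __init__(self, d):
        self.n = 0
        self.sum = np.zeros(d)
        self.outer = np.zeros((d, d))  # running sum of x x^T

    def update(self, x):
        self.n += 1
        self.sum += x
        self.outer += np.outer(x, x)

    def mean(self):
        return self.sum / self.n

    def cov(self):
        # Biased covariance: (1/n) sum x x^T - c c^T
        c = self.mean()
        return self.outer / self.n - np.outer(c, c)


def one_pass_auc(stream, d, eta=0.01, lam=1e-4):
    """Illustrative single pass over (x, y) pairs with y in {+1, -1}.

    Hypothetical parameters: eta (constant step size), lam (L2 weight).
    """
    w = np.zeros(d)
    stats = {1: ClassStats(d), -1: ClassStats(d)}
    for x, y in stream:
        y = int(y)
        stats[y].update(x)
        other = stats[-y]
        if other.n == 0:
            continue  # no opposite-class instances yet, so no pairwise loss
        c, S = other.mean(), other.cov()
        diff = x - c
        # Gradient of lam/2 ||w||^2 + 1/2 [(1 - y w^T diff)^2 + w^T S w]
        grad = lam * w - y * diff + diff * (diff @ w) + S @ w
        w -= eta * grad
    return w
```

Memory stays fixed at two $d \times d$ matrices and two $d$-vectors no matter how many examples arrive, and a call such as `one_pass_auc(zip(X, y), d)` touches each example exactly once.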

For high-dimensional data, where storing $d \times d$ covariance matrices becomes expensive, the paper introduces a randomized variant that approximates the covariance matrices by low-rank matrices, reducing both storage and computation.
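The summary does not spell out the paper's exact randomized construction. As a hedged illustration of the general idea only, the sketch below replaces the $d \times d$ second-moment matrix with a $d \times \tau$ random sketch $Z = \tau^{-1/2} \sum_i x_i r_i^\top$, where each $r_i$ is a fresh standard Gaussian vector in $\mathbb{R}^\tau$. Since $\mathbb{E}[r_i^\top r_j] = \tau\,\delta_{ij}$, $Z Z^\top$ is an unbiased estimate of $\sum_i x_i x_i^\top$. The class name `SketchedClassStats` and the parameters `tau` and `seed` are hypothetical. Because a square-loss gradient uses a covariance $S$ only through the product $S w$, the sketch exposes matrix-vector products and never forms a $d \times d$ matrix.

```python
import numpy as np


class SketchedClassStats:
    """Class statistics with a randomized low-rank sketch of sum_i x_i x_i^T.

    Hypothetical construction: Z = tau^(-1/2) * sum_i x_i r_i^T with Gaussian r_i,
    so E[Z Z^T] = sum_i x_i x_i^T while memory is O(d * tau) instead of O(d^2).
    """

    def __init__(self, d, tau, seed=0):
        self.n = 0
        self.sum = np.zeros(d)
        self.tau = tau
        self.rng = np.random.default_rng(seed)
        self.Z = np.zeros((d, tau))

    def update(self, x):
        self.n += 1
        self.sum += x
        r = self.rng.normal(size=self.tau)  # fresh random vector per instance
        self.Z += np.outer(x, r) / np.sqrt(self.tau)

    def mean(self):
        return self.sum / self.n

    def second_moment_matvec(self, w):
        # Approximates (sum_i x_i x_i^T) w via two thin products.
        return self.Z @ (self.Z.T @ w)

    def cov_matvec(self, w):
        # Approximates S w, where S = (1/n) sum_i x_i x_i^T - c c^T.
        c = self.mean()
        return self.second_moment_matvec(w) / self.n - c * (c @ w)
```

Memory drops from $O(d^2)$ to $O(d\tau)$; a larger `tau` lowers the variance of the estimate at the cost of more memory.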

Strong Results and Theoretical Insights

The authors support their approach with both theoretical analysis and empirical results. They show that the algorithm converges at a rate of O(1/T) in separable cases and O(1/√T) in general. The faster O(1/T) rate in the separable case improves on existing online AUC optimization methods, whose convergence rate is O(1/√T).

The paper's theoretical contributions include showing that the square loss is consistent with AUC, so that minimizing the square-loss surrogate also optimizes AUC, and establishing convergence guarantees for the proposed algorithm.

Implications and Future Directions

Practically, one-pass, memory-efficient AUC optimization makes it possible to train models on data streams or under tight memory budgets, for example on IoT devices or in real-time analytics pipelines.

Theoretically, this work prompts further exploration into optimizing other performance metrics in a one-pass fashion, potentially broadening the application scope of machine learning models in similar resource-constrained environments.

Future research could explore extensions of the one-pass framework presented in this paper to multi-class classification scenarios or investigate recursive approaches for dynamic adaptation in continuously evolving data streams.

In conclusion, this paper makes substantial advancements in AUC optimization under stringent data access conditions and proposes a robust, scalable solution that is both theoretically sound and practically viable.