Infinite random forests for imbalanced classification tasks (2408.01777v2)

Published 3 Aug 2024 in math.ST and stat.TH

Abstract: We study predictive probability inference in classification tasks using random forests under class imbalance. We focus on two simplified variants of Breiman's algorithm, namely subsampling Infinite Random Forests (IRFs) and under-sampling IRFs, and establish their asymptotic normality. In the under-sampling setting, training data from both classes are resampled to achieve balance, which enhances minority class representation but introduces a biased model. To correct this, we propose a debiasing procedure based on Importance Sampling (IS) using odds ratios. We instantiate our results using 1-Nearest Neighbor (1-NN) classifiers as base learners in the IRFs and prove the nearly minimax optimality of the approach for Lipschitz continuous objectives. We also show that the IS bagged 1-NN estimator matches the convergence rate of its subsampled counterpart while attaining lower asymptotic variance in most cases. Our theoretical findings are supported by simulation studies, highlighting the empirical benefits of the proposed approach.
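The abstract's core recipe can be illustrated with a small sketch: train many 1-NN base learners on balanced under-samples, average their votes, then map the balanced-prior probabilities back to the original class prior via an odds-ratio correction. This is a minimal, hypothetical illustration of that idea; the function names, parameters, and synthetic data are assumptions, not the authors' implementation.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def balanced_bagged_1nn_proba(X, y, X_test, n_estimators=200, rng=None):
    """Average 1-NN votes over balanced (under-sampled) subsamples."""
    rng = np.random.default_rng(rng)
    pos_idx = np.flatnonzero(y == 1)     # minority class assumed to be label 1
    neg_idx = np.flatnonzero(y == 0)
    m = min(len(pos_idx), len(neg_idx))  # subsample size per class

    votes = np.zeros(len(X_test))
    for _ in range(n_estimators):
        # Draw a balanced subsample: m points from each class.
        idx = np.concatenate([
            rng.choice(pos_idx, size=m, replace=True),
            rng.choice(neg_idx, size=m, replace=True),
        ])
        base = KNeighborsClassifier(n_neighbors=1).fit(X[idx], y[idx])
        votes += base.predict(X_test)    # each 1-NN vote is 0 or 1
    return votes / n_estimators          # probability estimate under balanced priors


def odds_ratio_debias(p_balanced, pi1):
    """Map probabilities learned under 50/50 priors back to the true prior pi1.

    Under balanced training the implicit class prior is 1/2, so the corrected
    odds are the balanced odds multiplied by the true odds pi1 / (1 - pi1).
    """
    odds = p_balanced / np.clip(1.0 - p_balanced, 1e-12, None)
    odds *= pi1 / (1.0 - pi1)
    return odds / (1.0 + odds)


# Illustrative usage on synthetic imbalanced data.
rng = np.random.default_rng(0)
n0, n1 = 950, 50                          # strong class imbalance
X = np.vstack([rng.normal(0.0, 1.0, (n0, 2)), rng.normal(1.5, 1.0, (n1, 2))])
y = np.concatenate([np.zeros(n0), np.ones(n1)])
X_test = rng.normal(0.75, 1.0, (20, 2))

p_balanced = balanced_bagged_1nn_proba(X, y, X_test, rng=1)
p_debiased = odds_ratio_debias(p_balanced, pi1=n1 / (n0 + n1))

The prior-shift step is the standard odds-ratio correction for a classifier trained under resampled priors; whether it coincides exactly with the paper's importance-sampling debiasing is an assumption of this sketch.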
