
Imbalance-XGBoost: Strategies for Imbalanced Data

Updated 22 September 2025
  • Imbalance-XGBoost is an advanced adaptation of XGBoost featuring tailored loss functions and sampling methods to address severe class imbalance.
  • It integrates weighted, focal, and asymmetric loss functions to enhance discrimination for minority classes while preserving model scalability.
  • Best practices include post-split sampling and the use of sensitivity-aware metrics such as F1, MCC, and Precision-Recall for reliable evaluation.

Imbalance-XGBoost refers to a collection of algorithmic strategies, loss functions, software packages, and training methodologies that augment the XGBoost gradient boosting framework for improved learning on datasets with severe class distribution skew. These methods are motivated by empirical findings that standard XGBoost (and tree boosting methods in general) provides suboptimal discrimination for minority classes, particularly when data imbalance is extreme. Imbalance-XGBoost encompasses both principled modifications to the core objective function (weighted or focal losses, as well as recent class-balanced or asymmetric losses) and best practices for data preparation, model evaluation, and robust deployment.

1. Motivation and Problem Definition

Imbalanced classification describes the scenario where one or more classes (often the class of operational interest) are grossly under-represented in the training data. In such settings, standard XGBoost (using default parameters and standard cross-entropy loss) exhibits a performance bias toward the majority class, yielding deceptively high accuracy but markedly poor F₁ scores, Matthews Correlation Coefficient (MCC), and recall for minority outcomes (Wang et al., 2019, Velarde et al., 25 Apr 2025, Velarde et al., 2023). This challenge is especially acute in applications including fraud detection, medical diagnostics, anomaly detection, and cyber-intrusion, where the minority class carries the most critical operational risk.

Data-level remedies such as naive upsampling or undersampling have been shown to be of limited and sometimes unreliable benefit (Velarde et al., 25 Apr 2025, Velarde et al., 2023). Algorithm-level interventions—such as custom loss functions and tailored boosting procedures—have emerged as more robust and theoretically grounded solutions.

2. Algorithmic Strategies for Addressing Imbalance

Imbalance-XGBoost methods can be categorized into several principal approaches:

  • Weighted Cross-Entropy Loss: This loss multiplies the positive-class contribution by a user-supplied weight w > 1, increasing the learning signal from the minority class:

L_\text{w} = -\sum_{i=1}^{m} \left[ w \, y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]

Implementations modify the calculation of both the gradient and Hessian in XGBoost’s Newton step (Wang et al., 2019, Luo et al., 19 Jul 2024); a minimal sketch of such a custom objective is given after this list.

  • Focal Loss: By introducing a focusing parameter γ > 0, focal loss downweights easy examples and amplifies the loss for hard (misclassified) instances:

L_\text{f} = -\sum_{i=1}^{m} \left[ y_i (1 - \hat{y}_i)^{\gamma} \log(\hat{y}_i) + (1 - y_i)\, \hat{y}_i^{\gamma} \log(1 - \hat{y}_i) \right]

The expressions for derivatives are non-trivial and require closed-form simplification for efficient tree growth in XGBoost (Wang et al., 2019).

  • Class-Balanced and Asymmetric Losses: Extensions such as Asymmetric Loss (ASL), Asymmetric Cross-Entropy (ACE), and Asymmetric Weighted Cross-Entropy (AWE) further refine the loss to differentially penalize false positives and false negatives, introducing margin and asymmetry parameters:

| Loss Type | Positive Loss (l₊) | Negative Loss (l₋) | Extra Parameters |
|-----------|--------------------|--------------------|------------------|
| WCE | w · log(p) | log(1 − p) | weight w |
| FL | (1 − p)^γ · log(p) | p^γ · log(1 − p) | focusing γ |
| ASL | (1 − p)^{γ₊} · log(p) | p_m^{γ₋} · log(1 − p_m), with p_m = max(p − m, 0) | margins, γ₊, γ₋ |
| ACE/AWE | w · log(p) (AWE); log(p) (ACE) | log(1 − p_m) | margins, w |

These techniques have been formally benchmarked across binary, multi-class, and multi-label settings (Luo et al., 19 Jul 2024).
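
To make the weighted objective concrete, the following minimal sketch (assuming binary labels in {0, 1} and the raw-margin predictions that XGBoost passes to custom objectives) supplies the gradient and Hessian of L_w to xgb.train. It is an illustrative re-derivation, not the packaged implementation of (Wang et al., 2019); the focal variant follows the same plug-in pattern but requires the closed-form derivatives given in that paper.

```python
import numpy as np
import xgboost as xgb


def weighted_logloss(pos_weight):
    """Weighted cross-entropy L_w as an XGBoost custom objective.

    The positive-class term is scaled by pos_weight (w > 1 upweights the
    minority class). XGBoost hands the objective raw margins, so the sigmoid
    is applied here.
    """
    def objective(preds, dtrain):
        y = dtrain.get_label()
        p = 1.0 / (1.0 + np.exp(-preds))      # predicted probability
        coef = pos_weight * y + (1.0 - y)     # w for positives, 1 for negatives
        grad = coef * p - pos_weight * y      # dL_w / d(margin)
        hess = coef * p * (1.0 - p)           # d^2 L_w / d(margin)^2
        return grad, hess
    return objective


# Hypothetical usage; X_train and y_train are assumed to exist.
# dtrain = xgb.DMatrix(X_train, label=y_train)
# booster = xgb.train({"max_depth": 4, "eta": 0.1}, dtrain,
#                     num_boost_round=200, obj=weighted_logloss(pos_weight=9.0))
# Note: predictions from a custom-objective booster are raw margins; apply a
# sigmoid before thresholding.
```

Setting pos_weight near the majority-to-minority ratio (e.g., roughly 9 for a 9:1 split) is a common starting point before tuning.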

3. Data Handling, Sampling, and Model Evaluation

Empirical studies reveal that the timing and method of applying sampling strategies critically affect both the generalizability and the validity of performance metrics. Data leakage, which arises when synthetic minority samples or balancing are introduced before the train-test split, results in artificially inflated performance values (precision, recall, F1, AUC), rendering them unreliable (Kabane, 10 Dec 2024). Applying sampling (such as SMOTE, Borderline-SMOTE, or generative approaches) only to the training set, after the split, preserves evaluation integrity, albeit with lower but more realistic performance estimates (Kabane, 10 Dec 2024, Tur et al., 27 Nov 2024).
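
A minimal sketch of this leakage-free ordering, assuming a feature matrix X and binary labels y already in memory and using imbalanced-learn's SMOTE purely as an example resampler (any oversampler slots in the same way):

```python
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE
from xgboost import XGBClassifier

# 1) Split first, stratifying so the held-out set keeps the true class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# 2) Resample the training portion only; the test set is never touched.
X_train_res, y_train_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

# 3) Fit on the resampled training data, then evaluate on the untouched test set.
clf = XGBClassifier(n_estimators=200, max_depth=4)
clf.fit(X_train_res, y_train_res)
```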

Performance on imbalanced data must be reported using sensitivity-aware metrics: F₁, F₂ scores, Precision-Recall curves, and MCC. Use of overall accuracy or ROC/AUC alone is discouraged due to their poor informativeness under severe imbalance (Wang et al., 2019, Velarde et al., 25 Apr 2025, Velarde et al., 2023, Kabane, 10 Dec 2024).
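
Continuing the sketch above (clf, X_test, and y_test are assumed from that snippet), the recommended sensitivity-aware metrics can be reported as follows:

```python
from sklearn.metrics import (auc, f1_score, fbeta_score, matthews_corrcoef,
                             precision_recall_curve)

y_pred = clf.predict(X_test)
y_score = clf.predict_proba(X_test)[:, 1]          # positive-class probability

print("F1 :", f1_score(y_test, y_pred))
print("F2 :", fbeta_score(y_test, y_pred, beta=2))  # recall-weighted F-score
print("MCC:", matthews_corrcoef(y_test, y_pred))

precision, recall, _ = precision_recall_curve(y_test, y_score)
print("PR-AUC:", auc(recall, precision))            # area under the PR curve
```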

4. Empirical Evaluation and Observed Benefits

Across multiple studies, incorporating class-balanced loss functions or adjusted data pipelines yields substantial improvements in minority-class performance:

  • Parkinson’s dataset (class ratio 9:1): Weighted-XGBoost and Focal-XGBoost versions improved F₁ and MCC over vanilla XGBoost, which otherwise produces high accuracy via majority-class overprediction (Wang et al., 2019).
  • Benchmark datasets (imbalance ratios up to 42:1): The benefit of weighted/focal losses grows as imbalance increases (Wang et al., 2019).
  • Medical imaging (20% positive class): PEP-Net, combining Borderline-SMOTE oversampling, scale_pos_weight adjustment, and deep 3D ResNet feature extraction, achieved accuracy of 94.5% and AUC >0.91—substantially better than deep learning-only baselines (Tur et al., 27 Nov 2024).
  • Ensemble approaches: In network intrusion detection, two-layer systems using XGBoost for binary identification and multi-class attack categorization (e.g., I-SiamIDS) report improved F₁ and recall for minority attack classes (Bedi et al., 2020).
  • Credit card fraud (0.172% positive): Rigorous sampling confined to the training set (avoiding pre-split leakage) yields less optimistic but reliable detection rates, while pre-split sampling misleads with near-perfect test metrics (Kabane, 10 Dec 2024).

Nonetheless, not all objective modifications yield improvements. Studies found that focal loss, under some hyperparameter settings and moderate imbalances, can underperform weighted cross-entropy, and the overall benefit sometimes fails to generalize absent careful tuning (Velarde et al., 25 Apr 2025).

5. Practical Implementations and Software Tools

Imbalance-XGBoost now has direct software support: a Python package provides estimator classes compatible with Scikit-Learn pipelines, enabling empirical risk minimization with custom objectives (weighted or focal) and providing explicit derivatives for XGBoost’s training loop (Wang et al., 2019). Additional packages provide plug-ins for class-balanced or asymmetric losses for GBDT methods, including LightGBM and SketchBoost, supporting binary, multi-class, and multi-label configurations (Luo et al., 19 Jul 2024).
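
As an illustration of the packaged interface, the sketch below follows the usage pattern documented in the imbalance-xgboost README; the import path and parameter names (special_objective, imbalance_alpha, focal_gamma) are quoted from that documentation and may change between versions, so treat them as assumptions rather than a fixed API.

```python
from sklearn.model_selection import GridSearchCV
from imxgboost.imbalance_xgb import imbalance_xgboost as imb_xgb

# Weighted-objective booster: search over the positive-class weight.
cv_weighted = GridSearchCV(imb_xgb(special_objective='weighted'),
                           {"imbalance_alpha": [1.5, 2.0, 3.0, 4.0]})

# Focal-objective booster: search over the focusing parameter.
cv_focal = GridSearchCV(imb_xgb(special_objective='focal'),
                        {"focal_gamma": [1.0, 1.5, 2.0, 2.5]})

# X_train and y_train (NumPy arrays) are assumed to exist.
# cv_weighted.fit(X_train, y_train)
# cv_focal.fit(X_train, y_train)
```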

The high scalability of XGBoost is retained under these extensions, crucial for large-scale tasks in finance, healthcare, and network security (Wang et al., 2019, Velarde et al., 2023, Tur et al., 27 Nov 2024). However, noted limitations include incomplete support for missing values and imperfect compatibility with all Scikit-Learn workflows in some implementations (Velarde et al., 25 Apr 2025).

6. Best Practices, Limitations, and Future Outlook

Key recommendations for robust Imbalance-XGBoost deployment include:

  • Loss/Parameter Tuning: Careful search over weighting (α) and focusing (γ) parameters can yield improved minority class recall, but tuning should be constrained for small datasets to avoid overfitting or high variance (Velarde et al., 25 Apr 2025, Wang et al., 2019).
  • Sampling Practices: All synthetic resampling and SMOTE-style augmentations must be performed post train-test split to avoid data leakage artifacts (Kabane, 10 Dec 2024).
  • Hyperparameter Optimization: The scale_pos_weight parameter in standard XGBoost is highly effective when tuned, often closing the gap to Imbalance-XGBoost variants in practical scenarios (Velarde et al., 25 Apr 2025); a tuning sketch follows this list. However, aggressive hyperparameter search can introduce variance and instability, requiring validation on temporal or distribution-drifted data.
  • Robustness over Time: Tree-based classifiers are robust to moderate data drift but require retraining if performance decay is detected in live settings (Velarde et al., 25 Apr 2025).
  • Evaluation Metrics: Use F-scores, MCC, and Precision-Recall curves for model selection and deployment monitoring. Avoid reliance on accuracy for imbalanced regimes (Velarde et al., 2023).
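
To make the tuning recommendations concrete, the sketch below (with an illustrative, not prescriptive, parameter grid) searches scale_pos_weight and tree depth in standard XGBoost using a stratified, F1-scored grid search:

```python
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from xgboost import XGBClassifier

# The majority-to-minority ratio is a common anchor for the weight grid.
param_grid = {
    "scale_pos_weight": [1, 3, 9, 27],
    "max_depth": [3, 4, 6],
    "learning_rate": [0.05, 0.1],
}

search = GridSearchCV(
    estimator=XGBClassifier(n_estimators=300),
    param_grid=param_grid,
    scoring="f1",                                    # sensitivity-aware selection
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    n_jobs=-1,
)

# X_train and y_train are assumed to exist.
# search.fit(X_train, y_train)
# print(search.best_params_, search.best_score_)
```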

Emerging directions include the integration of deep feature learning directly with XGBoost (e.g., PEP-Net’s 3D ResNet + XGBoost cascade (Tur et al., 27 Nov 2024)), further theoretical study and benchmarking of new class-balanced objectives (ASL, ACE, AWE) (Luo et al., 19 Jul 2024), and principled ensemble composition informed by progressive boosting or lexicographic programming (Soleymani et al., 2017, Datta et al., 2017). The field is also moving toward combined algorithmic and data-level remedies for high-dimensional and multi-label tabular tasks (Luo et al., 19 Jul 2024).

7. Comparative Table of Imbalance-XGBoost Loss Objectives

| Objective | Mechanism | Primary Use-case | Key Parameters |
|-----------|-----------|------------------|----------------|
| Weighted CE | Down/upweights loss by class | Binary/multi-class imbalance | α (weight) |
| Focal Loss | Focuses on hard samples | Severe imbalance / extreme tail | γ (focusing param.) |
| ASL | Asymmetric sample weighting | Control FP/FN separately | γ₊, γ₋, m |
| ACE/AWE | Asymmetric & weighted CE | Rare-positive tasks, outliers | w, m |

The table summarizes the technical features of the primary loss functions used in Imbalance-XGBoost (Wang et al., 2019, Luo et al., 19 Jul 2024).


In summary, Imbalance-XGBoost comprises a rigorously evaluated set of algorithmic enhancements and best practices for tree boosting under extreme class imbalance. Its empirical advantages are established in multiple risk-sensitive domains, especially when combined with careful metric selection, disciplined evaluation pipelines, and validation against realistic, temporally-shifted data distributions. Recent software developments and theoretical advancements continue to refine and generalize its methodological portfolio.
