ADASYN with 1x Multiplication in Credit Scoring
- The paper demonstrates that ADASYN with 1x multiplication yields a statistically significant improvement in model performance, with AUC increasing by 0.77% and Gini by 3.00% relative to the unbalanced baseline.
- It utilizes a robust experimental framework with stratified splitting, fixed hyperparameters, and bootstrap testing to ensure reproducibility and statistical rigor.
- The study recommends maintaining a 6.6:1 majority-to-minority ratio, highlighting that moderate augmentation outperforms full rebalancing in credit scoring.
ADASYN with 1x Multiplication refers to the application of the Adaptive Synthetic Sampling (ADASYN) algorithm where the minority class is “doubled” (i.e., augmented with an equal number of synthetic minority samples, 1× the original count) rather than fully balanced to match the majority class. The approach is empirically examined in the context of severe class imbalance in credit scoring, with particular attention to the effects of augmentation ratio on learning, generalization, and statistical significance within predictive modeling frameworks.
1. Algorithmic Formulation and Oversampling Principle
ADASYN operates by adaptively generating synthetic minority-class samples with greater density in regions that are difficult to learn. For each minority instance $x_i$, the algorithm computes the local ratio $r_i = \Delta_i / K$, where $\Delta_i$ is the number of majority-class neighbors among the $K$ nearest neighbors of $x_i$. Normalized as $\hat{r}_i = r_i / \sum_j r_j$, this ratio governs the probability that $x_i$ will be selected for synthetic augmentation:
- The total number of synthetic samples to generate, $G$, sets the augmentation factor: each $x_i$ receives $g_i = \hat{r}_i \cdot G$ synthetics. In the “1x multiplication” scenario, $G$ is chosen equal to the original minority count $m$, so the minority class is doubled. Synthetics are generated by linear interpolation, $s = x_i + \lambda\,(x_z - x_i)$ with $\lambda \sim \mathrm{Uniform}[0,1]$, between $x_i$ and one randomly selected minority neighbor $x_z$, following the SMOTE construction paradigm.
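A minimal sketch of this generation step in Python, assuming numeric features and binary labels with 1 marking the minority class; the function name `adasyn_generate` and the edge-case handling are illustrative, not the paper's implementation:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def adasyn_generate(X, y, G, K=5, seed=42):
    """Generate G synthetic minority samples, ADASYN-style (sketch)."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    X_min = X[y == 1]

    # K nearest neighbors of each minority point in the full dataset
    # (K+1 because each point is its own nearest neighbor).
    _, idx = NearestNeighbors(n_neighbors=K + 1).fit(X).kneighbors(X_min)
    r = (y[idx[:, 1:]] == 0).mean(axis=1)   # majority fraction per neighborhood
    if r.sum() == 0:                        # fully separable case: fall back to uniform
        r = np.ones_like(r)
    r_hat = r / r.sum()                     # normalized selection probabilities
    g = np.rint(r_hat * G).astype(int)      # synthetics per point (sums to ~G)

    # Interpolate toward random minority neighbors (SMOTE-style construction).
    k_min = min(K + 1, len(X_min))
    _, idx_min = NearestNeighbors(n_neighbors=k_min).fit(X_min).kneighbors(X_min)
    synthetics = []
    for i, g_i in enumerate(g):
        for _ in range(g_i):
            j = rng.choice(idx_min[i, 1:])  # a random minority neighbor
            lam = rng.random()
            synthetics.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetics)

# "1x multiplication": G equals the original minority count, doubling it.
# X_syn = adasyn_generate(X, y, G=int((y == 1).sum()))
```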
2. Empirical Findings: Performance Metrics and Law of Diminishing Returns
Comprehensive evaluation is conducted using the Give Me Some Credit dataset (97,243 samples, 7% default rate). ADASYN with 1x multiplication achieves:
- AUC: 0.6778
- Gini coefficient: 0.3557
These values reflect statistically significant improvements over the unbalanced baseline (AUC = 0.6727, Gini = 0.3453), with relative increases of +0.77% for AUC and +3.00% for Gini (p = 0.017, bootstrap test; B = 1,000 resamples); the two metrics move together, since Gini = 2·AUC − 1. Higher augmentation factors (2x, 3x) reduce performance, with 3x oversampling producing a −0.48% absolute AUC decrement versus baseline. This behavior manifests as a clear “inverted-U” relationship between the multiplication factor and classifier efficacy, strongly indicating a law of diminishing returns for synthetic oversampling.
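A paired bootstrap test of the AUC difference on a shared held-out test set can be sketched as follows; `scores_base` and `scores_aug` are assumed names for the two models' predicted default probabilities, and the one-sided p-value convention is an assumption rather than a detail from the paper:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_pvalue(y_test, scores_base, scores_aug, B=1000, seed=42):
    """One-sided paired bootstrap: H0 is that augmentation does not improve AUC."""
    rng = np.random.default_rng(seed)
    y_test = np.asarray(y_test)
    scores_base, scores_aug = np.asarray(scores_base), np.asarray(scores_aug)
    n, diffs = len(y_test), []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)      # resample test rows with replacement
        if np.unique(y_test[idx]).size < 2:   # AUC is undefined on one-class resamples
            continue
        diffs.append(roc_auc_score(y_test[idx], scores_aug[idx])
                     - roc_auc_score(y_test[idx], scores_base[idx]))
    return float(np.mean(np.asarray(diffs) <= 0.0))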
3. Optimal Class Ratio and Its Implications
Contrary to prevalent practice, optimal classifier performance is realized at a majority-to-minority ratio of 6.6:1. Full balancing to 1:1 is not supported empirically; instead, moderate imbalance (minority class doubled) yields the highest test set generalization and predictive discrimination for rare events. This challenges the common heuristic of setting synthetic sample count equal to the majority class, and instead supports targeted moderate augmentation.
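The 6.6:1 figure follows directly from the dataset's 7% default rate: doubling the minority count doubles its share while leaving the majority count unchanged,

$$
\frac{M}{m} = \frac{0.93}{0.07} \approx 13.3{:}1
\qquad\longrightarrow\qquad
\frac{M}{2m} = \frac{0.93}{0.14} \approx 6.6{:}1 .
$$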
4. Controlled Experimental Framework: Robustness and Statistical Rigor
The paper ensures methodological rigor through:
- Stratified 70-30 train-test splitting (preserving class imbalance in the test set)
- Application of synthetic augmentation strictly to the training subset
- Use of fixed XGBoost hyperparameters (max_depth=6, learning_rate=0.1, n_estimators=100)
- Seed control (seed=42) for reproducibility across random operations
- Bootstrap statistical testing (1,000 iterations) to derive p-values and confidence intervals (the full protocol is sketched below)
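A sketch of the end-to-end protocol under these settings, using imbalanced-learn's `ADASYN` (whose float `sampling_strategy` is the desired minority/majority ratio after resampling) and xgboost's scikit-learn wrapper; `X` and `y` are assumed to be the feature matrix and default labels (1 = minority):

```python
import numpy as np
from imblearn.over_sampling import ADASYN
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

def run_protocol(X, y, seed=42):
    # Stratified 70-30 split: the test fold keeps the raw ~7% default rate.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed)

    # 1x multiplication applied to the training fold only: double the minority.
    m, M = int(np.sum(y_tr == 1)), int(np.sum(y_tr == 0))
    sampler = ADASYN(sampling_strategy=2 * m / M, random_state=seed)
    X_aug, y_aug = sampler.fit_resample(X_tr, y_tr)

    # Fixed hyperparameters, no tuning, as specified in the protocol.
    model = XGBClassifier(max_depth=6, learning_rate=0.1,
                          n_estimators=100, random_state=seed,
                          eval_metric="auc")
    model.fit(X_aug, y_aug)

    scores = model.predict_proba(X_te)[:, 1]
    auc = roc_auc_score(y_te, scores)
    return auc, 2 * auc - 1  # (AUC, Gini)
```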
The quality of the synthetic samples is evaluated with distributional metrics such as the Kolmogorov-Smirnov statistic, Wasserstein distance, and Jensen-Shannon divergence, confirming that the augmented data maintains statistical similarity with the original minority distribution.
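These checks can be reproduced per feature with standard SciPy routines; the histogram binning used to feed the Jensen-Shannon computation is an assumption, since the paper does not specify it:

```python
import numpy as np
from scipy.stats import ks_2samp, wasserstein_distance
from scipy.spatial.distance import jensenshannon

def feature_similarity(real_col, synth_col, bins=50):
    """Compare one feature's distribution in real vs. synthetic minority data."""
    ks_stat, ks_p = ks_2samp(real_col, synth_col)
    wd = wasserstein_distance(real_col, synth_col)
    # Jensen-Shannon needs probability vectors: histogram both on a shared grid.
    edges = np.histogram_bin_edges(np.concatenate([real_col, synth_col]), bins=bins)
    p, _ = np.histogram(real_col, bins=edges, density=True)
    q, _ = np.histogram(synth_col, bins=edges, density=True)
    jsd = jensenshannon(p + 1e-12, q + 1e-12)  # smooth to avoid zero-probability bins
    return ks_stat, ks_p, wd, jsd
```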
5. Practitioner Guidelines and Modeling Recommendations
The empirical results yield several actionable recommendations:
- Utilize ADASYN with 1x multiplication for datasets exhibiting ~7% minority incidence, specifically avoiding more aggressive oversampling.
- Target a post-augmentation class ratio near 6.6:1 for optimal performance (a small conversion helper is sketched after this list).
- Eschew extensive grid search or aggressive hyperparameter tuning, as these may precipitate overfitting in oversampled regimes.
- Forgo mixing multiple synthetic augmentation strategies; instead, implement the single best method as determined empirically.
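For the ratio-targeting recommendation, a hypothetical helper converts a majority-to-minority target such as 6.6:1 into the minority/majority fraction that imbalanced-learn's `sampling_strategy` parameter expects:

```python
# Hypothetical helper: convert a target majority:minority ratio into the
# minority/majority fraction used by imbalanced-learn's `sampling_strategy`
# (e.g., 6.6:1 -> 1/6.6, about 0.15 minority samples per majority sample).
def sampling_strategy_for_ratio(majority_to_minority: float) -> float:
    return 1.0 / majority_to_minority

# Usage sketch:
# ADASYN(sampling_strategy=sampling_strategy_for_ratio(6.6), random_state=42)
```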
6. Domain Scope, Limitations, and Reproducibility
While results are demonstrated on a single real-world credit scoring dataset, the methodology provides a reproducible framework for similar imbalanced learning contexts. Because the domain is restricted, analogous studies are advisable before broad deployment in other applications of minority oversampling. The paper's guidelines are also contingent on the raw pre-augmentation imbalance and may not translate without adjustment to domains with markedly different incidence rates.
7. Significance and Field Impact
ADASYN with 1x multiplication highlights the importance of empirically calibrating synthetic augmentation strategies, particularly in domains—such as credit scoring—where over-augmentation can obscure relevant decision boundaries or induce noise. The finding that moderate imbalance is preferable provides a direct challenge to intuitive but unsubstantiated practices of 1:1 balancing, establishing a strong precedent for nuanced design of imbalanced learning pipelines. The paper defines the “sweet spot” for oversampling, supporting robust model generalization while leveraging adaptive sample generation.
| Multiplication Factor | Post-augmentation Ratio | AUC | Gini | Relative AUC Change | Stat. Significance (p) |
|---|---|---|---|---|---|
| 0x (Baseline) | 13.3:1 | 0.6727 | 0.3453 | Reference | — |
| 1x (Optimal) | 6.6:1 | 0.6778 | 0.3557 | +0.77% | 0.017 |
| 2x | 4.4:1 | Lower | Lower | Negative | — |
| 3x | 3.3:1 | Lower | Lower | −0.48% (absolute) | — |
In summary, ADASYN with 1x multiplication (doubling the minority class) provides a robust, statistically justified enhancement in imbalanced credit scoring. The approach delivers effective augmentation without the performance loss seen at higher factors, establishes a practical majority-to-minority ratio of approximately 6.6:1, and can be operationalized through reproducibly designed experimental pipelines (Chia, 21 Oct 2025).