ADASYN with 1x Multiplication in Credit Scoring
- The paper demonstrates that ADASYN with 1x multiplication yields a statistically significant improvement in model performance, with AUC increasing by 0.77% and Gini by 3.00% relative to the unbalanced baseline.
- It utilizes a robust experimental framework with stratified splitting, fixed hyperparameters, and bootstrap testing to ensure reproducibility and statistical rigor.
- The study recommends maintaining a 6.6:1 majority-to-minority ratio, highlighting that moderate augmentation outperforms full rebalancing in credit scoring.
ADASYN with 1x Multiplication refers to the application of the Adaptive Synthetic Sampling (ADASYN) algorithm where the minority class is “doubled” (i.e., augmented with an equal number of synthetic minority samples, 1× the original count) rather than fully balanced to match the majority class. The approach is empirically examined in the context of severe class imbalance in credit scoring, with particular attention to the effects of augmentation ratio on learning, generalization, and statistical significance within predictive modeling frameworks.
1. Algorithmic Formulation and Oversampling Principle
ADASYN operates by adaptively generating synthetic minority-class samples with greater density in regions that are difficult to learn. For each minority instance $x_i$, the algorithm computes the local ratio $r_i = \Delta_i / K$, where $\Delta_i$ is the number of majority-class neighbors among the $K$ nearest neighbors of $x_i$. Normalized as $\hat{r}_i = r_i / \sum_j r_j$, this ratio governs the probability that $x_i$ will be selected for synthetic augmentation:
- The total number of synthetic samples to generate, $G$, sets the augmentation factor: each $x_i$ receives $g_i = \hat{r}_i \cdot G$ synthetics. In the “1x multiplication” scenario, $G$ is chosen equal to the original minority count $m$, so the minority class is doubled. Synthetics are generated by linear interpolation, $s = x_i + \lambda\,(x_z - x_i)$ with $\lambda \sim \mathrm{Uniform}[0,1]$, between $x_i$ and one randomly selected minority neighbor $x_z$, following the SMOTE construction paradigm.
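A minimal sketch of this generation step in Python, assuming numeric features and binary labels with 1 marking the minority class; the function name `adasyn_generate` and the edge-case handling are illustrative, not the paper's implementation:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def adasyn_generate(X, y, G, K=5, seed=42):
    """Generate G synthetic minority samples, ADASYN-style (sketch)."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    X_min = X[y == 1]

    # K nearest neighbors of each minority point in the full dataset
    # (K+1 because each point is its own nearest neighbor).
    _, idx = NearestNeighbors(n_neighbors=K + 1).fit(X).kneighbors(X_min)
    r = (y[idx[:, 1:]] == 0).mean(axis=1)   # majority fraction per neighborhood
    if r.sum() == 0:                        # fully separable case: fall back to uniform
        r = np.ones_like(r)
    r_hat = r / r.sum()                     # normalized selection probabilities
    g = np.rint(r_hat * G).astype(int)      # synthetics per point (sums to ~G)

    # Interpolate toward random minority neighbors (SMOTE-style construction).
    k_min = min(K + 1, len(X_min))
    _, idx_min = NearestNeighbors(n_neighbors=k_min).fit(X_min).kneighbors(X_min)
    synthetics = []
    for i, g_i in enumerate(g):
        for _ in range(g_i):
            j = rng.choice(idx_min[i, 1:])  # a random minority neighbor
            lam = rng.random()
            synthetics.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetics)

# "1x multiplication": G equals the original minority count, doubling it.
# X_syn = adasyn_generate(X, y, G=int((y == 1).sum()))
```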
2. Empirical Findings: Performance Metrics and Law of Diminishing Returns
Comprehensive evaluation is conducted using the Give Me Some Credit dataset (97,243 samples, 7% default rate). ADASYN with 1x multiplication achieves:
- AUC: 0.6778
- Gini coefficient: 0.3557
These values reflect statistically significant improvements over the unbalanced baseline (AUC = 0.6727, Gini = 0.3453), with relative increases of +0.77% for AUC and +3.00% for Gini (p = 0.017, bootstrap test; B = 1,000 resamples); the two metrics move together, since Gini = 2·AUC − 1. Higher augmentation factors (2x, 3x) reduce performance, with 3x oversampling producing a −0.48% absolute AUC decrement versus baseline. This behavior manifests as a clear “inverted-U” relationship between the multiplication factor and classifier efficacy, strongly indicating a law of diminishing returns for synthetic oversampling.
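A paired bootstrap test of the AUC difference on a shared held-out test set can be sketched as follows; `scores_base` and `scores_aug` are assumed names for the two models' predicted default probabilities, and the one-sided p-value convention is an assumption rather than a detail from the paper:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_pvalue(y_test, scores_base, scores_aug, B=1000, seed=42):
    """One-sided paired bootstrap: H0 is that augmentation does not improve AUC."""
    rng = np.random.default_rng(seed)
    y_test = np.asarray(y_test)
    scores_base, scores_aug = np.asarray(scores_base), np.asarray(scores_aug)
    n, diffs = len(y_test), []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)      # resample test rows with replacement
        if np.unique(y_test[idx]).size < 2:   # AUC is undefined on one-class resamples
            continue
        diffs.append(roc_auc_score(y_test[idx], scores_aug[idx])
                     - roc_auc_score(y_test[idx], scores_base[idx]))
    return float(np.mean(np.asarray(diffs) <= 0.0))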
3. Optimal Class Ratio and Its Implications
Contrary to prevalent practice, optimal classifier performance is realized at a majority-to-minority ratio of 6.6:1. Full balancing to 1:1 is not supported empirically; instead, moderate imbalance (minority class doubled) yields the highest test set generalization and predictive discrimination for rare events. This challenges the common heuristic of setting synthetic sample count equal to the majority class, and instead supports targeted moderate augmentation.
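The 6.6:1 figure follows directly from the dataset's 7% default rate: doubling the minority count doubles its share while leaving the majority count unchanged,

$$
\frac{M}{m} = \frac{0.93}{0.07} \approx 13.3{:}1
\qquad\longrightarrow\qquad
\frac{M}{2m} = \frac{0.93}{0.14} \approx 6.6{:}1 .
$$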
4. Controlled Experimental Framework: Robustness and Statistical Rigor
The paper ensures methodological rigor through:
- Stratified 70-30 train-test splitting (preserving class imbalance in the test set)
- Application of synthetic augmentation strictly to the training subset
- Use of fixed XGBoost hyperparameters (max_depth=6, learning_rate=0.1, n_estimators=100)
- Seed control (seed=42) for reproducibility across random operations
- Bootstrap statistical testing (1,000 iterations) to derive p-values and confidence intervals (the full protocol is sketched below)
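A sketch of the end-to-end protocol under these settings, using imbalanced-learn's `ADASYN` (whose float `sampling_strategy` is the desired minority/majority ratio after resampling) and xgboost's scikit-learn wrapper; `X` and `y` are assumed to be the feature matrix and default labels (1 = minority):

```python
import numpy as np
from imblearn.over_sampling import ADASYN
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

def run_protocol(X, y, seed=42):
    # Stratified 70-30 split: the test fold keeps the raw ~7% default rate.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed)

    # 1x multiplication applied to the training fold only: double the minority.
    m, M = int(np.sum(y_tr == 1)), int(np.sum(y_tr == 0))
    sampler = ADASYN(sampling_strategy=2 * m / M, random_state=seed)
    X_aug, y_aug = sampler.fit_resample(X_tr, y_tr)

    # Fixed hyperparameters, no tuning, as specified in the protocol.
    model = XGBClassifier(max_depth=6, learning_rate=0.1,
                          n_estimators=100, random_state=seed,
                          eval_metric="auc")
    model.fit(X_aug, y_aug)

    scores = model.predict_proba(X_te)[:, 1]
    auc = roc_auc_score(y_te, scores)
    return auc, 2 * auc - 1  # (AUC, Gini)
```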
The quality of the synthetic samples is evaluated with distributional metrics such as the Kolmogorov-Smirnov statistic, Wasserstein distance, and Jensen-Shannon divergence, confirming that the augmented data maintains statistical similarity with the original minority distribution.
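These checks can be reproduced per feature with standard SciPy routines; the histogram binning used to feed the Jensen-Shannon computation is an assumption, since the paper does not specify it:

```python
import numpy as np
from scipy.stats import ks_2samp, wasserstein_distance
from scipy.spatial.distance import jensenshannon

def feature_similarity(real_col, synth_col, bins=50):
    """Compare one feature's distribution in real vs. synthetic minority data."""
    ks_stat, ks_p = ks_2samp(real_col, synth_col)
    wd = wasserstein_distance(real_col, synth_col)
    # Jensen-Shannon needs probability vectors: histogram both on a shared grid.
    edges = np.histogram_bin_edges(np.concatenate([real_col, synth_col]), bins=bins)
    p, _ = np.histogram(real_col, bins=edges, density=True)
    q, _ = np.histogram(synth_col, bins=edges, density=True)
    jsd = jensenshannon(p + 1e-12, q + 1e-12)  # smooth to avoid zero-probability bins
    return ks_stat, ks_p, wd, jsd
```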
5. Practitioner Guidelines and Modeling Recommendations
The empirical results yield several actionable recommendations:
- Utilize ADASYN with 1x multiplication for datasets exhibiting ~7% minority incidence, specifically avoiding more aggressive oversampling.
- Target a post-augmentation class ratio near 6.6:1 for optimal performance (a small conversion helper is sketched after this list).
- Eschew extensive grid search or aggressive hyperparameter tuning, as these may precipitate overfitting in oversampled regimes.
- Forgo mixing multiple synthetic augmentation strategies; instead, implement the single best method as determined empirically.
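For the ratio-targeting recommendation, a hypothetical helper converts a majority-to-minority target such as 6.6:1 into the minority/majority fraction that imbalanced-learn's `sampling_strategy` parameter expects:

```python
# Hypothetical helper: convert a target majority:minority ratio into the
# minority/majority fraction used by imbalanced-learn's `sampling_strategy`
# (e.g., 6.6:1 -> 1/6.6, about 0.15 minority samples per majority sample).
def sampling_strategy_for_ratio(majority_to_minority: float) -> float:
    return 1.0 / majority_to_minority

# Usage sketch:
# ADASYN(sampling_strategy=sampling_strategy_for_ratio(6.6), random_state=42)
```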
6. Domain Scope, Limitations, and Reproducibility
While results are demonstrated on a single real-world credit scoring dataset, the methodology provides a reproducible framework for similar imbalanced learning contexts. Because the domain is restricted, analogous studies are advisable before broad deployment in other applications of minority oversampling. The paper's guidelines are also contingent on the raw pre-augmentation imbalance and may not translate without adjustment to domains with markedly different incidence rates.
7. Significance and Field Impact
ADASYN with 1x multiplication highlights the importance of empirically calibrating synthetic augmentation strategies, particularly in domains—such as credit scoring—where over-augmentation can obscure relevant decision boundaries or induce noise. The finding that moderate imbalance is preferable provides a direct challenge to intuitive but unsubstantiated practices of 1:1 balancing, establishing a strong precedent for nuanced design of imbalanced learning pipelines. The paper defines the “sweet spot” for oversampling, supporting robust model generalization while leveraging adaptive sample generation.
| Multiplication Factor | Post-augmentation Ratio | AUC | Gini | Relative AUC Change | Stat. Significance (p) |
|---|---|---|---|---|---|
| 0x (Baseline) | 13.3:1 | 0.6727 | 0.3453 | Reference | — |
| 1x (Optimal) | 6.6:1 | 0.6778 | 0.3557 | +0.77% | 0.017 |
| 2x | 4.4:1 | Lower | Lower | Negative | — |
| 3x | 3.3:1 | Lower | Lower | −0.48% (absolute) | — |
In summary, ADASYN with 1x multiplication (doubling the minority class) provides a robust, statistically justified enhancement in imbalanced credit scoring. The approach delivers effective augmentation without the performance loss seen at higher factors, establishes a practical majority-to-minority ratio of approximately 6.6:1, and can be operationalized through reproducibly designed experimental pipelines (Chia, 21 Oct 2025).