ADASYN with 1x Multiplication in Credit Scoring

Updated 28 October 2025
  • The paper demonstrates that ADASYN with 1x multiplication significantly improves model performance, with AUC increasing by 0.77% and Gini rising by 3.00%.
  • It utilizes a robust experimental framework with stratified splitting, fixed hyperparameters, and bootstrap testing to ensure reproducibility and statistical rigor.
  • The study recommends maintaining a 6.6:1 majority-to-minority ratio, highlighting that moderate augmentation outperforms full rebalancing in credit scoring.

ADASYN with 1x Multiplication refers to the application of the Adaptive Synthetic Sampling (ADASYN) algorithm where the minority class is “doubled” (i.e., augmented with an equal number of synthetic minority samples, 1× the original count) rather than fully balanced to match the majority class. The approach is empirically examined in the context of severe class imbalance in credit scoring, with particular attention to the effects of augmentation ratio on learning, generalization, and statistical significance within predictive modeling frameworks.

1. Algorithmic Formulation and Oversampling Principle

ADASYN operates by adaptively generating synthetic minority-class samples with greater density in regions that are difficult to learn. For each minority instance $x_i$, the algorithm computes a local ratio $r_i$ of majority-class neighbors among its $k$ nearest neighbors. This ratio governs the normalized probability $f_i$ that $x_i$ will be selected for synthetic augmentation:

  • $r_i = \dfrac{\text{number of majority neighbors within } k\text{NN}}{k}$
  • $f_i = \dfrac{r_i}{\sum_j r_j}$

The total number of synthetic samples to generate, $G$, sets the augmentation factor. In the "1x multiplication" scenario, $G$ is chosen to double the minority count, so $g_i = f_i \cdot G$ synthetic samples are produced for each $x_i$. Synthetics are generated by linear interpolation between $x_i$ and one randomly selected minority neighbor, following the SMOTE construction paradigm; a minimal sketch of these steps appears below.
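To make the weighting concrete, the following is a minimal NumPy/scikit-learn sketch of the steps above, configured for 1x multiplication ($G$ equal to the original minority count). It is an illustrative reimplementation rather than the paper's code; in practice, the `ADASYN` class from the imbalanced-learn library provides a maintained version of the same scheme.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def adasyn_1x(X_min, X_maj, k=5, seed=42):
    """Sketch of ADASYN with 1x multiplication: generate len(X_min)
    synthetic minority samples, density-weighted toward hard regions."""
    rng = np.random.default_rng(seed)
    X_all = np.vstack([X_min, X_maj])
    y_all = np.r_[np.ones(len(X_min)), np.zeros(len(X_maj))]  # 1 = minority

    # k nearest neighbors of each minority point in the full data
    # (k+1 because each point is returned as its own nearest neighbor).
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X_all).kneighbors(X_min)
    r = (y_all[idx[:, 1:]] == 0).mean(axis=1)   # r_i: majority fraction in kNN

    f = r / r.sum()                 # f_i: normalized selection probability
    G = len(X_min)                  # 1x multiplication: double the minority
    g = np.rint(f * G).astype(int)  # g_i: per-point quota (sums to ~G)

    # Minority-only neighbors supply the interpolation partners.
    _, idx_min = NearestNeighbors(
        n_neighbors=min(k + 1, len(X_min))).fit(X_min).kneighbors(X_min)

    synthetic = []
    for i, n_new in enumerate(g):
        for _ in range(n_new):
            j = rng.choice(idx_min[i, 1:])   # random minority neighbor
            lam = rng.random()               # SMOTE-style interpolation
            synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.asarray(synthetic)
```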

2. Empirical Findings: Performance Metrics and Law of Diminishing Returns

Comprehensive evaluation is conducted using the Give Me Some Credit dataset (97,243 samples, 7% default rate). ADASYN with 1x multiplication achieves:

  • AUC: 0.6778
  • Gini coefficient: 0.3557

These values reflect statistically significant improvements over the unbalanced baseline (AUC = 0.6727, Gini = 0.3453), with relative increases of +0.77% for AUC and +3.00% for Gini ($p = 0.017$, bootstrap test; $N = 1{,}000$ resamples). It is also found that higher augmentation factors (2x, 3x) reduce performance, with 3x oversampling producing a –0.48% absolute AUC decrement over baseline. This behavior manifests as a clear "inverted-U" relationship between the multiplication factor and classifier efficacy, strongly indicating a law of diminishing returns for synthetic oversampling.
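A paired bootstrap over the test rows is one standard way to obtain such a p-value. The sketch below assumes both models score the same held-out test set (as NumPy arrays); the paper's exact resampling procedure is not reproduced here.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_test(y_true, p_base, p_adasyn, n_boot=1_000, seed=42):
    """Paired bootstrap: fraction of resamples in which the ADASYN model
    fails to beat the baseline on AUC (one-sided p-value), plus a 95% CI
    on the AUC difference."""
    rng = np.random.default_rng(seed)
    n, deltas = len(y_true), []
    for _ in range(n_boot):
        s = rng.integers(0, n, n)               # resample rows with replacement
        if y_true[s].min() == y_true[s].max():  # AUC needs both classes present
            continue
        deltas.append(roc_auc_score(y_true[s], p_adasyn[s])
                      - roc_auc_score(y_true[s], p_base[s]))
    deltas = np.asarray(deltas)
    return (deltas <= 0).mean(), np.percentile(deltas, [2.5, 97.5])
```

Note that for a binary scorecard the Gini coefficient is $2 \cdot \mathrm{AUC} - 1$, consistent with the reported pairs (e.g., $2 \times 0.6778 - 1 \approx 0.3557$).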

3. Optimal Class Ratio and Its Implications

Contrary to prevalent practice, optimal classifier performance is realized at a majority-to-minority ratio of 6.6:1. Full balancing to 1:1 is not supported empirically; instead, moderate imbalance (minority class doubled) yields the highest test set generalization and predictive discrimination for rare events. This challenges the common heuristic of setting synthetic sample count equal to the majority class, and instead supports targeted moderate augmentation.
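The 6.6:1 figure follows directly from the dataset's base rate. A quick back-of-the-envelope check (assuming the reported 97,243 samples and ~7% default rate) also shows how it maps onto imbalanced-learn's `sampling_strategy` parameter, which expresses the target as a minority-to-majority ratio:

```python
n_total = 97_243
n_min = round(0.07 * n_total)   # ~6,807 defaults (minority)
n_maj = n_total - n_min         # ~90,436 non-defaults (majority)

print(f"raw imbalance:       {n_maj / n_min:.1f} : 1")       # ~13.3 : 1
print(f"after 1x doubling:   {n_maj / (2 * n_min):.1f} : 1") # ~6.6 : 1
print(f"sampling_strategy ~= {2 * n_min / n_maj:.2f}")       # ~0.15
```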

4. Controlled Experimental Framework: Robustness and Statistical Rigor

The paper ensures methodological rigor through the following controls (combined in the pipeline sketch after this list):

  • Stratified 70-30 train-test splitting (preserving class imbalance in the test set)
  • Application of synthetic augmentation strictly to the training subset
  • Use of fixed XGBoost hyperparameters (max_depth=6, learning_rate=0.1, n_estimators=100)
  • Seed control (seed=42) for reproducibility across random operations
  • Bootstrap statistical testing (1,000 iterations) to derive $p$-values and confidence intervals
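A minimal sketch wiring these controls together follows. The names `X` and `y` are placeholders for the feature matrix and binary default labels; `sampling_strategy=0.15` approximates doubling the ~7% minority class, since imbalanced-learn's ADASYN targets a minority-to-majority ratio rather than a multiplication factor.

```python
from imblearn.over_sampling import ADASYN
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

# Stratified 70-30 split preserves the raw imbalance in the test set.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)

# Synthetic augmentation is applied to the training subset only.
X_res, y_res = ADASYN(sampling_strategy=0.15,
                      random_state=42).fit_resample(X_tr, y_tr)

# Fixed hyperparameters as reported; no grid search.
clf = XGBClassifier(max_depth=6, learning_rate=0.1,
                    n_estimators=100, random_state=42)
clf.fit(X_res, y_res)

auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"AUC = {auc:.4f}  Gini = {2 * auc - 1:.4f}")
```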

The quality of the synthetic samples is evaluated with distributional metrics, namely the Kolmogorov-Smirnov statistic and the Wasserstein and Jensen-Shannon divergences, confirming that the augmented data maintain statistical similarity with the original minority distribution.
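Per feature, these three metrics can be computed with SciPy; the helper below is a plausible sketch of such a comparison, not the paper's evaluation code (the histogram binning for the Jensen-Shannon term is an assumption).

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import ks_2samp, wasserstein_distance

def synthetic_quality(real_col, synth_col, bins=50):
    """Compare one feature's real-minority vs. synthetic-minority values."""
    ks = ks_2samp(real_col, synth_col).statistic
    wd = wasserstein_distance(real_col, synth_col)
    # JS divergence needs discrete distributions: histogram on shared bins
    # (SciPy normalizes the counts internally).
    lo = min(real_col.min(), synth_col.min())
    hi = max(real_col.max(), synth_col.max())
    p, _ = np.histogram(real_col, bins=bins, range=(lo, hi))
    q, _ = np.histogram(synth_col, bins=bins, range=(lo, hi))
    js = jensenshannon(p, q) ** 2   # scipy returns the JS distance (sqrt)
    return ks, wd, js
```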

5. Practitioner Guidelines and Modeling Recommendations

The empirical results yield several actionable recommendations:

  • Utilize ADASYN with 1x multiplication for datasets exhibiting ~7% minority incidence, specifically avoiding more aggressive oversampling.
  • Target a post-augmentation class ratio near 6.6:1 for optimal performance.
  • Avoid extensive grid search or aggressive hyperparameter tuning, as these may precipitate overfitting in oversampled regimes.
  • Forgo mixing multiple synthetic augmentation strategies; instead, implement the single best-performing method as determined empirically.

6. Domain Scope, Limitations, and Reproducibility

While results are demonstrated on a single real-world credit scoring dataset, the methodology provides a reproducible framework for similar imbalanced learning contexts. The restricted domain implies that prior to broad deployment, analogous studies are advisable in other applications of minority oversampling. The paper’s guidelines are contingent on the raw pre-augmentation imbalance and may not translate without adjustment to domains with markedly different incidence rates.

7. Significance and Field Impact

ADASYN with 1x multiplication highlights the importance of empirically calibrating synthetic augmentation strategies, particularly in domains—such as credit scoring—where over-augmentation can obscure relevant decision boundaries or induce noise. The finding that moderate imbalance is preferable provides a direct challenge to intuitive but unsubstantiated practices of 1:1 balancing, establishing a strong precedent for nuanced design of imbalanced learning pipelines. The paper defines the “sweet spot” for oversampling, supporting robust model generalization while leveraging adaptive sample generation.

| Multiplication Factor | Post-augmentation Ratio | AUC | Gini | Relative AUC Change | Stat. Significance (p) |
|---|---|---|---|---|---|
| 0x (Baseline) | 13.3:1 | 0.6727 | 0.3453 | Reference | — |
| 1x (Optimal) | 6.6:1 | 0.6778 | 0.3557 | +0.77% | 0.017 |
| 2x | 4.4:1 | Lower | Lower | Negative | — |
| 3x | 3.3:1 | — | Decrease | –0.48% | — |

In summary, ADASYN with 1x multiplication (doubling the minority class) provides a robust, statistically justified enhancement in imbalanced credit scoring. The approach balances effective augmentation without performance loss, establishes a practical majority-to-minority ratio of approximately 6.6:1, and can be operationalized through reproducibly designed experimental pipelines (Chia, 21 Oct 2025).
