
AxelSMOTE: Agent-Based Synthetic Oversampling

Updated 15 September 2025
  • AxelSMOTE is an agent-based oversampling technique that addresses class imbalance by modeling data points as autonomous agents interacting via trait exchanges.
  • It partitions features into trait groups and employs similarity-controlled probabilistic exchanges and Beta-distributed blending for realistic synthetic sample generation.
  • Experimental results show improved F1-scores and balanced accuracy over traditional methods while preserving intra-class correlations and data diversity.

AxelSMOTE is an agent-based oversampling algorithm that addresses the class imbalance problem in machine learning by bringing principles from social dynamics into sample synthesis. Departing from traditional feature-wise interpolation, AxelSMOTE models each data point as an autonomous agent that interacts with its peers, so that synthetic minority samples are generated from groups of related features rather than one dimension at a time. Key innovations include trait-based feature grouping to maintain intra-trait correlations, a similarity-controlled probabilistic exchange process inspired by Axelrod’s model of cultural dissemination, Beta distribution-based neighbor blending for realistic interpolation, and explicit diversity injection to mitigate overfitting. Reported experiments show AxelSMOTE outperforming established oversampling baselines on both performance metrics and preservation of data structure (Kishanthan et al., 8 Sep 2025).

1. Theoretical Foundations and Motivation

AxelSMOTE is motivated by limitations of conventional synthetic oversampling techniques such as SMOTE and its variants. Standard approaches interpolate between pairs of minority points with a fixed linear rule that treats each feature independently. This neglects feature interrelations, offers limited control over sample diversity, and may generate unrealistic or noisy synthetic samples. To overcome these limitations, AxelSMOTE applies Axelrod’s cultural dissemination model: data points are modeled as agents that probabilistically exchange “traits” (coherent groups of related features), subject to neighborhood similarity constraints. This agent-based paradigm is designed to better reproduce the complex structure and variability present in real, under-represented instances.

2. Trait-Based Feature Grouping and Preservation of Correlations

A distinctive feature of AxelSMOTE is its partitioning of the $d$-dimensional feature space into $t$ contiguous groups, each treated as a “trait.” For trait $T_j$, the grouping is defined as:

$$T_j = \{ d_{(j-1) \lfloor d/t \rfloor + 1}, \dots, d_{\min(j \lfloor d/t \rfloor, d)} \}$$

This grouping is crucial for preserving intra-trait correlations during the oversampling process. Rather than blending features individually, entire traits are exchanged or interpolated collectively between compatible agents. This design contrasts sharply with approaches that treat each dimension independently, which can break organic feature dependencies and produce synthetic samples ill-suited for real-world data structure.
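As an illustration, the grouping formula can be written as a small index-partitioning helper. This is a minimal sketch, not the authors' implementation: the function name `trait_groups` is hypothetical and indices are 0-based. Taken literally, the formula gives every trait a width of $\lfloor d/t \rfloor$, so when $d$ is not divisible by $t$ the trailing $d \bmod t$ features fall outside every trait. How the original method treats those features is not stated here, and the sketch simply leaves them ungrouped.

```python
def trait_groups(d, t):
    """Split feature indices 0..d-1 into t contiguous traits of width floor(d/t)."""
    if not 1 <= t <= d:
        raise ValueError("need 1 <= t <= d")
    width = d // t
    groups = []
    for j in range(1, t + 1):
        start = (j - 1) * width      # 0-based form of (j-1)*floor(d/t) + 1
        stop = min(j * width, d)     # exclusive upper bound
        groups.append(list(range(start, stop)))
    return groups


print(trait_groups(10, 3))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```

In the example, feature index 9 belongs to no trait, which is the leftover case described above.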

3. Similarity-Based Probabilistic Exchange Mechanism

Central to AxelSMOTE’s agent-based synthesis is a similarity-driven, probabilistic trait exchange. For each trait $T_j$, the trait similarity between two minority samples $x_i$ and $x_l$ is evaluated as:

$$\text{sim}_j(x_i, x_l) = 1 - \frac{1}{|T_j|} \sum_{q \in T_j} |x_i^{(q)} - x_l^{(q)}|$$

For a base sample $x_b$ and a candidate neighbor $x_n$, a trait exchange occurs only if $\text{sim}_j(x_b, x_n) > \theta$ for a chosen similarity threshold $\theta$ and a uniform random draw falls below the influence rate $\alpha$. These two controls ensure that only sufficiently compatible agents blend traits, which reduces the risk of generating out-of-distribution synthetic samples and helps preserve local manifold structure in high-dimensional spaces.
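The similarity measure and the two-part gate can be sketched in a few lines. The function names `trait_similarity` and `exchange_allowed` are hypothetical, and the example assumes features already scaled to $[0, 1]$ so that the similarity also lies in $[0, 1]$; the text above does not state how features are scaled.

```python
import random


def trait_similarity(x_i, x_l, trait):
    """One minus the mean absolute difference over the feature indices in `trait`."""
    return 1.0 - sum(abs(x_i[q] - x_l[q]) for q in trait) / len(trait)


def exchange_allowed(x_b, x_n, trait, theta, alpha, rng=random):
    """Gate from the text: similarity above theta AND a uniform draw below alpha."""
    return trait_similarity(x_b, x_n, trait) > theta and rng.random() < alpha


x_b = [0.20, 0.40, 0.90]
x_n = [0.25, 0.30, 0.10]
print(trait_similarity(x_b, x_n, [0, 1]))  # approximately 0.925
print(trait_similarity(x_b, x_n, [2]))     # approximately 0.2
```

With a threshold such as $\theta = 0.5$ (an illustrative value), the first trait would be eligible for exchange and the second would be rejected regardless of the random draw.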

4. Beta Distribution Blending and Controlled Diversity Injection

Trait-level interpolation in AxelSMOTE employs a stochastic blending coefficient sampled from a symmetric $\text{Beta}(2, 2)$ distribution:

$$\hat{x}^{(p)} = \lambda x_b^{(p)} + (1 - \lambda) x_n^{(p)}, \quad \lambda \sim \text{Beta}(2,2),\,\, p \in T_j$$

This favors interior points (i.e., blends) over endpoints, creating more realistic synthetic instances than a uniformly sampled coefficient would. After the exchange, controlled diversity is injected by adding Gaussian noise scaled to each feature’s dynamic range within the class sample set $S_c$:

$$\hat{x}^{(p)} = \hat{x}^{(p)} + 0.05\,R_p\,\varepsilon_p, \quad R_p = \max_{x \in S_c} x^{(p)} - \min_{x \in S_c} x^{(p)},\,\, \varepsilon_p \sim \mathcal{N}(0,1)$$

This explicit variance control enables the exploration of local neighborhoods while mitigating risks of sample collapse or degeneracy, and helps reduce overfitting commonly associated with naive oversampling.
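The blending and noise steps can be sketched with Python's standard `random` module. The $\text{Beta}(2,2)$ density is $6\lambda(1-\lambda)$, which peaks at $\lambda = 0.5$ and vanishes at 0 and 1, which is why blends are favored over endpoints. The helper names `feature_ranges` and `blend_trait` are hypothetical, and drawing a single $\lambda$ for the whole trait is one reading of "trait-level interpolation", not a confirmed detail of the original method.

```python
import random


def feature_ranges(samples):
    """Per-feature dynamic range R_p = max - min over the given class samples."""
    d = len(samples[0])
    return [max(x[p] for x in samples) - min(x[p] for x in samples) for p in range(d)]


def blend_trait(x_b, x_n, trait, ranges, rng, noise_scale=0.05):
    """Blend one trait of x_b toward x_n, then add range-scaled Gaussian noise."""
    lam = rng.betavariate(2, 2)      # one lambda shared by the whole trait (assumption)
    x_hat = list(x_b)                # features outside the trait are copied from x_b
    for p in trait:
        blended = lam * x_b[p] + (1 - lam) * x_n[p]
        x_hat[p] = blended + noise_scale * ranges[p] * rng.gauss(0.0, 1.0)
    return x_hat


rng = random.Random(0)
minority = [[0.2, 0.4, 0.9], [0.3, 0.5, 0.7], [0.1, 0.6, 0.8]]
ranges = feature_ranges(minority)
x_hat = blend_trait(minority[0], minority[1], trait=[0, 1], ranges=ranges, rng=rng)
print(x_hat[2])  # 0.9, feature 2 is outside the trait and is copied from x_b
```

Setting `noise_scale=0.0` isolates the pure Beta blend, which is handy for checking that blended values stay between the two parents.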

5. Methodological Workflow

The agent-based synthetic generation process consists of:

  1. Trait Partitioning: Segment the feature space into $t$ groups.
  2. Neighborhood Discovery: For each minority class instance $x_b$, select the $k$ nearest neighbors $\mathcal{N}_k(x_b)$ from the same class.
  3. Trait Exchange Iteration:
    • For each trait $T_j$, pick a neighbor $x_n \in \mathcal{N}_k(x_b)$.
    • Compute $\text{sim}_j(x_b, x_n)$. The exchange occurs only if the similarity exceeds $\theta$ and a uniform random draw falls below $\alpha$.
    • Blend the trait using a Beta-distributed $\lambda$.
    • Inject per-feature Gaussian noise to promote diversity.
  4. Synthetic Sample Creation: Concatenate the traits to form the new sample $\hat{x}$, and repeat until the desired minority class augmentation target is met.
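The four steps above can be combined into one short end-to-end sketch. It is an illustration under stated assumptions rather than the authors' code: the function name `axelsmote_sketch` and all default hyperparameter values are hypothetical, neighbors are found with Euclidean distance, one neighbor is drawn uniformly at random per trait, and a trait that fails the gate keeps the base sample's values. None of these choices is confirmed by the description above.

```python
import math
import random


def axelsmote_sketch(minority, n_new, t=2, k=3, theta=0.5, alpha=0.8, seed=0):
    """Generate n_new synthetic rows from a list of minority-class rows."""
    d = len(minority[0])
    if not 1 <= t <= d or len(minority) < 2:
        raise ValueError("need 1 <= t <= d and at least two minority samples")
    rng = random.Random(seed)
    width = d // t
    # Step 1: contiguous traits (0-based); trailing d % t features stay untouched.
    traits = [range(j * width, (j + 1) * width) for j in range(t)]
    # Per-feature dynamic range R_p over the minority class.
    ranges = [max(x[p] for x in minority) - min(x[p] for x in minority)
              for p in range(d)]
    synthetic = []
    for _ in range(n_new):
        x_b = rng.choice(minority)
        # Step 2: k nearest same-class neighbors (Euclidean distance is an assumption).
        others = [x for x in minority if x is not x_b]
        neighbors = sorted(others, key=lambda x: math.dist(x, x_b))[:k]
        x_hat = list(x_b)
        for trait in traits:
            # Step 3: one randomly chosen neighbor per trait (assumption).
            x_n = rng.choice(neighbors)
            sim = 1.0 - sum(abs(x_b[q] - x_n[q]) for q in trait) / len(trait)
            if sim > theta and rng.random() < alpha:
                lam = rng.betavariate(2, 2)
                for p in trait:
                    noise = 0.05 * ranges[p] * rng.gauss(0.0, 1.0)
                    x_hat[p] = lam * x_b[p] + (1 - lam) * x_n[p] + noise
            # Otherwise the trait keeps the base sample's values (assumption).
        # Step 4: the assembled traits form one synthetic sample.
        synthetic.append(x_hat)
    return synthetic


minority = [
    [0.10, 0.20, 0.80, 0.90],
    [0.15, 0.25, 0.75, 0.85],
    [0.20, 0.10, 0.90, 0.80],
    [0.12, 0.22, 0.70, 0.95],
]
new_rows = axelsmote_sketch(minority, n_new=5)
print(len(new_rows), len(new_rows[0]))  # 5 4
```

Fixing `seed` makes the output reproducible, which simplifies comparing hyperparameter settings on the same data.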

6. Empirical Performance and Efficiency

AxelSMOTE was benchmarked on eight real-world datasets including Wisconsin, Thyroid, KC1, Ads, ILPD, Glass, Page Blocks, and Ecoli. Performance metrics included F1-score and balanced accuracy. On average:

  • F1-score increased by approximately 2.37% over traditional SMOTE.
  • AxelSMOTE outperformed SVMSMOTE, BorderlineSMOTE, ADASYN, and competitive undersampling methods.
  • t-SNE visualizations showed that AxelSMOTE-generated samples produce well-clustered, clearly separated synthetic data, indicating strong preservation of intra-class structure with appropriate diversity.
  • Computational cost, while higher than basic SMOTE due to trait partitioning and similarity computations, remains substantially lower than deep generative approaches; runtime was competitive with existing classical oversamplers.
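For reference, the two reported metrics can be computed from confusion-matrix counts as follows. This sketch uses the standard binary definitions with a hypothetical helper name; how the original experiments averaged the metrics on multi-class datasets is not stated here.

```python
def f1_and_balanced_accuracy(y_true, y_pred, positive=1):
    """Binary F1-score and balanced accuracy from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return f1, (recall + specificity) / 2


# Two positives among ten samples; a classifier that always predicts the majority.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(f1_and_balanced_accuracy(y_true, [0] * 10))  # (0.0, 0.5)
```

The always-majority classifier reaches 80% plain accuracy on this toy data yet scores 0.0 and 0.5 on these metrics, which is why they are preferred for imbalanced problems.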

7. Limitations, Parameterization, and Future Directions

A recognized limitation is the increased complexity due to multiple hyperparameters: the number of traits $t$, the similarity threshold $\theta$, the influence rate $\alpha$, and the neighborhood size $k$. The current implementation requires empirical tuning, and the authors suggest automating parameter selection via data-driven optimization in future work. Another possible extension is adapting AxelSMOTE’s agent-based mechanisms to time series and image modalities. This could entail defining trait groupings over temporally or spatially contiguous features and adjusting the similarity metrics accordingly.
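To make the size of the tuning problem concrete, the four hyperparameters can be enumerated as a search grid. This is only a sketch of manual tuning: the class `AxelSMOTEParams`, the function `parameter_grid`, the candidate values, and the validity constraints ($1 \le t \le d$, $1 \le k$ below the minority count, $0 \le \alpha \le 1$) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class AxelSMOTEParams:
    t: int        # number of traits
    theta: float  # similarity threshold
    alpha: float  # influence rate
    k: int        # neighborhood size


def parameter_grid(ts, thetas, alphas, ks, d, n_minority):
    """Enumerate combinations, skipping ones that cannot apply to the data."""
    grid = []
    for t, theta, alpha, k in product(ts, thetas, alphas, ks):
        if 1 <= t <= d and 1 <= k < n_minority and 0.0 <= alpha <= 1.0:
            grid.append(AxelSMOTEParams(t, theta, alpha, k))
    return grid


grid = parameter_grid(
    ts=[2, 4, 8], thetas=[0.3, 0.5, 0.7], alphas=[0.5, 0.8], ks=[3, 5],
    d=6, n_minority=5,
)
print(len(grid))  # 12
```

Even this small illustrative grid yields twelve valid configurations after filtering, each of which would need a cross-validated evaluation.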

Summary Table: AxelSMOTE Innovations and Controls

| Component | Mechanism | Purpose |
|---|---|---|
| Trait grouping | Partition features into $t$ contiguous blocks | Preserve intra-trait correlations |
| Similarity check | Trait similarity $> \theta$; uniform random draw $< \alpha$ | Realistic blending, avoid noise |
| Blending | $\lambda \sim \text{Beta}(2,2)$ | Interior interpolation |
| Diversity injection | Add $0.05\,R_p\,\varepsilon_p$ on exchanged traits | Avoid overfitting, promote variety |

AxelSMOTE represents an agent-based, interaction-centric paradigm for synthetic oversampling. Its combination of trait grouping, similarity-driven exchanges, stochastic blending, and explicit diversity injection is designed to preserve statistical structure while providing robust minority class augmentation. Empirical evidence supports its effectiveness across diverse imbalanced datasets (Kishanthan et al., 8 Sep 2025). Further advances are anticipated in automated parameter optimization and broader modality adaptation.
