AxelSMOTE: Agent-Based Synthetic Oversampling
- AxelSMOTE is an agent-based oversampling technique that addresses class imbalance by modeling data points as autonomous agents interacting via trait exchanges.
- It partitions features into trait groups and employs similarity-controlled probabilistic exchanges and Beta-distributed blending for realistic synthetic sample generation.
- Experimental results show improved F1-scores and balanced accuracy over traditional methods while preserving intra-class correlations and data diversity.
AxelSMOTE is an agent-based oversampling algorithm addressing the class imbalance problem in machine learning by integrating principles from social dynamics into sample synthesis. Departing from traditional, feature-wise interpolation approaches, AxelSMOTE models each data point as an autonomous agent that interacts with its peers, enabling a multidimensional, context-aware generation of synthetic minority samples. Key innovations include trait-based feature grouping to maintain intra-feature correlations, a similarity-controlled probabilistic exchange process inspired by Axelrod’s model of cultural dissemination, Beta distribution-based neighbor blending for realistic interpolation, and explicit diversity injection to mitigate overfitting. Experimental evidence demonstrates AxelSMOTE’s superiority over conventional state-of-the-art methods in terms of both performance metrics and preservation of data structure (Kishanthan et al., 8 Sep 2025).
1. Theoretical Foundations and Motivation
AxelSMOTE is motivated by limitations inherent to conventional synthetic oversampling techniques such as SMOTE and its variants. Standard approaches interpolate each feature independently between pairs of minority points using a deterministic process. This neglects feature interrelations, provides limited control over sample diversity, and may generate unrealistic or noisy synthetic samples. To overcome these limitations, AxelSMOTE applies Axelrod’s cultural dissemination model, modeling data points as agents that can probabilistically exchange “traits” (coherent groups of related features) subject to neighborhood similarity constraints. This agent-based paradigm is designed to better reproduce the complex structure and variability present in real, under-represented instances.
2. Trait-Based Feature Grouping and Preservation of Correlations
A distinctive feature of AxelSMOTE is its partitioning of the $d$-dimensional feature space into $T$ contiguous groups, each treated as a “trait.” For trait $t \in \{1, \dots, T\}$, the group $G_t$ is the block of consecutive feature indices assigned to that trait.
This grouping is crucial for preserving intra-trait correlations during the oversampling process. Rather than blending features individually, entire traits are exchanged or interpolated collectively between compatible agents. This design contrasts sharply with approaches that treat each dimension independently, which can break natural feature dependencies and produce synthetic samples that do not reflect the structure of real data.
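As a minimal sketch of contiguous trait grouping, the helper below splits feature indices into consecutive, near-equal blocks. The source states only that the groups are contiguous; the near-equal split rule and the names `partition_traits`, `n_features`, and `n_traits` are assumptions made for illustration.

```python
def partition_traits(n_features: int, n_traits: int) -> list[range]:
    """Split indices 0..n_features-1 into contiguous, near-equal blocks.

    Assumption: block sizes differ by at most one; the exact split rule
    used by AxelSMOTE may differ.
    """
    if not 1 <= n_traits <= n_features:
        raise ValueError("need 1 <= n_traits <= n_features")
    base, extra = divmod(n_features, n_traits)
    groups, start = [], 0
    for t in range(n_traits):
        size = base + (1 if t < extra else 0)  # first `extra` traits get one more feature
        groups.append(range(start, start + size))
        start += size
    return groups


groups = partition_traits(10, 3)
print([list(g) for g in groups])  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Because every index belongs to exactly one block, operating on a whole block at a time moves its features together, which is the property the text relies on for preserving intra-trait correlations.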
3. Similarity-Based Probabilistic Exchange Mechanism
Central to AxelSMOTE’s agent-based synthesis is a similarity-driven, probabilistic trait exchange. For each trait $t$, a trait similarity $s_t(x_i, x_j)$ between two minority samples $x_i$ and $x_j$ is evaluated over the features belonging to that trait.
A candidate trait exchange occurs only if $s_t(x_i, x_j) > \theta$ for a chosen similarity threshold $\theta$, and a uniform random draw $u \sim \mathcal{U}(0, 1)$ is less than the influence rate $\rho$. These controls ensure that only sufficiently compatible agents participate in trait blending, which reduces the risk of generating out-of-distribution synthetic samples and helps preserve local manifold structure in high-dimensional spaces.
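The two-stage gate can be sketched as follows. The similarity measure here, `1 / (1 + Euclidean distance)` restricted to the trait's features, is an assumption chosen only because it lies in (0, 1] and grows as samples get closer; it is not taken from the source. The function and parameter names are likewise hypothetical.

```python
import math
import random


def trait_similarity(a, b, group):
    """Assumed similarity: 1 / (1 + Euclidean distance) over the trait's features."""
    dist = math.sqrt(sum((a[j] - b[j]) ** 2 for j in group))
    return 1.0 / (1.0 + dist)


def should_exchange(a, b, group, threshold, influence_rate, rng):
    """Both gates must pass: similarity above the threshold,
    then a uniform draw below the influence rate."""
    if trait_similarity(a, b, group) <= threshold:
        return False
    return rng.random() < influence_rate


rng = random.Random(0)
x_i = [0.0, 0.0, 5.0]
x_j = [0.3, 0.4, 9.0]
trait = range(0, 2)  # first two features form one trait
print(trait_similarity(x_i, x_j, trait))
print(should_exchange(x_i, x_j, trait, threshold=0.5, influence_rate=0.8, rng=rng))
```

Checking similarity first means the random draw is only consumed for compatible pairs; raising the threshold makes exchanges more conservative, while the influence rate controls how often compatible pairs actually blend.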
4. Beta Distribution Blending and Controlled Diversity Injection
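Trait-level interpolation in AxelSMOTE employs a stochastic blending coefficient $\lambda$ sampled from a symmetric Beta distribution:
$$\lambda \sim \mathrm{Beta}(\alpha, \alpha), \qquad \alpha > 1$$
With equal shape parameters greater than one, the density peaks at $\lambda = 1/2$, so this favors interior points (i.e., blends) over endpoints, creating more realistic synthetic instances than uniform random sampling. Post-exchange, controlled diversity is injected by adding Gaussian noise whose standard deviation is scaled to the trait’s dynamic range.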
This explicit variance control enables the exploration of local neighborhoods while mitigating risks of sample collapse or degeneracy, and helps reduce overfitting commonly associated with naive oversampling.
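The blending and noise steps might look like the sketch below. It assumes a single blend coefficient per trait, a shape parameter `alpha` shared by both Beta arguments, and noise with standard deviation `noise_scale * span`, where `span` is the trait's max-minus-min over the minority samples. The specific scale factor and all names are illustrative assumptions, not values from the source.

```python
import random


def trait_range(samples, group):
    """Dynamic range of one trait: max minus min over its features and all samples."""
    values = [row[j] for row in samples for j in group]
    return max(values) - min(values)


def blend_trait(x, neighbor, group, alpha, noise_scale, span, rng):
    """Return blended, noise-perturbed values for the features in `group`."""
    lam = rng.betavariate(alpha, alpha)  # symmetric Beta; alpha > 1 favors mid-range blends
    blended = []
    for j in group:
        value = (1.0 - lam) * x[j] + lam * neighbor[j]  # same lam for the whole trait
        value += rng.gauss(0.0, noise_scale * span)     # independent noise per feature
        blended.append(value)
    return blended


rng = random.Random(42)
minority = [[0.0, 1.0, 5.0], [0.4, 1.2, 5.5], [0.2, 0.8, 6.0]]
group = range(0, 2)
span = trait_range(minority, group)
print(blend_trait(minority[0], minority[1], group,
                  alpha=2.0, noise_scale=0.01, span=span, rng=rng))
```

Using one coefficient for the whole trait keeps the blended features on the segment between the two agents in that subspace; the noise term then perturbs each feature slightly around that point.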
5. Methodological Workflow
The agent-based synthetic generation process consists of:
- Trait Partitioning: Segment the $d$-dimensional feature space into $T$ contiguous groups.
- Neighborhood Discovery: For each minority class instance $x_i$, select its $k$ nearest neighbors from the same class.
- Trait Exchange Iteration:
  - For each trait $t$, pick a neighbor $x_j$.
  - Compute the trait similarity $s_t(x_i, x_j)$. If it exceeds the similarity threshold $\theta$ and a uniform random draw falls below the influence rate $\rho$, the exchange occurs.
  - Blend the trait using a Beta-distributed coefficient $\lambda$.
  - Inject per-feature Gaussian noise to promote diversity.
- Synthetic Sample Creation: Form the new sample $\tilde{x}$ by concatenating the exchanged traits, and repeat until the desired minority class augmentation target is met.
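The steps above can be sketched end to end in plain Python. This is an illustrative reading of the workflow, not the authors' implementation: the split rule, the similarity measure `1 / (1 + distance)`, the default values, and the function name `axel_style_oversample` are all assumptions.

```python
import math
import random


def axel_style_oversample(minority, n_new, n_traits=2, k=3, threshold=0.2,
                          influence_rate=0.8, alpha=2.0, noise_scale=0.01, seed=0):
    """Generate n_new synthetic rows from `minority` (list of equal-length float lists)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    d = len(minority[0])
    # 1. Trait partitioning: contiguous index blocks (assumed near-equal split).
    bounds = [t * d // n_traits for t in range(n_traits + 1)]
    groups = [range(bounds[t], bounds[t + 1]) for t in range(n_traits)]
    groups = [g for g in groups if len(g) > 0]
    # Dynamic range of each trait over the minority class, used to scale the noise.
    spans = []
    for g in groups:
        values = [row[j] for row in minority for j in g]
        spans.append(max(values) - min(values))
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # 2. Neighborhood discovery: k nearest same-class rows by Euclidean distance.
        others = [row for row in minority if row is not base]
        neighbors = sorted(others, key=lambda row: math.dist(base, row))[:k]
        new = list(base)
        # 3. Trait exchange iteration.
        for g, span in zip(groups, spans):
            nb = rng.choice(neighbors)
            dist = math.sqrt(sum((base[j] - nb[j]) ** 2 for j in g))
            similarity = 1.0 / (1.0 + dist)  # assumed similarity measure
            if similarity > threshold and rng.random() < influence_rate:
                lam = rng.betavariate(alpha, alpha)
                for j in g:
                    blended = (1.0 - lam) * base[j] + lam * nb[j]
                    new[j] = blended + rng.gauss(0.0, noise_scale * span)
        # 4. Synthetic sample creation: traits are already concatenated in `new`.
        synthetic.append(new)
    return synthetic


minority = [[0.0, 0.0, 1.0, 1.0], [0.1, 0.2, 1.1, 0.9],
            [0.2, 0.1, 0.9, 1.2], [0.3, 0.3, 1.2, 1.1]]
for row in axel_style_oversample(minority, n_new=3):
    print([round(v, 3) for v in row])
```

In this sketch a trait that fails either gate keeps the base sample's values, so a synthetic row can coincide with its base when no exchange passes; a fuller implementation would likely need to handle that case.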
6. Empirical Performance and Efficiency
AxelSMOTE was benchmarked on eight real-world datasets including Wisconsin, Thyroid, KC1, Ads, ILPD, Glass, Page Blocks, and Ecoli. Performance metrics included F1-score and balanced accuracy. On average:
- F1-score increased by approximately 2.37% over traditional SMOTE.
- AxelSMOTE outperformed SVMSMOTE, BorderlineSMOTE, ADASYN, and competitive undersampling methods.
- t-SNE visualizations showed that AxelSMOTE-generated samples form well-clustered, clearly separated groups, indicating strong preservation of intra-class structure with appropriate diversity.
- Computational cost, while higher than basic SMOTE due to trait partitioning and similarity computations, remains substantially lower than deep generative approaches; runtime was competitive with existing classical oversamplers.
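For reference, the two reported metrics can be computed from confusion counts as in the sketch below. It uses the standard binary definitions (F1 as the harmonic mean of precision and recall, balanced accuracy as the mean of recall and specificity); the function name and the toy labels are illustrative and unrelated to the benchmark data.

```python
def f1_and_balanced_accuracy(y_true, y_pred, positive=1):
    """Binary F1-score and balanced accuracy from label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    balanced_accuracy = (recall + specificity) / 2
    return f1, balanced_accuracy


# Toy imbalanced example: 2 positives, 6 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
f1, bal_acc = f1_and_balanced_accuracy(y_true, y_pred)
print(f"F1={f1:.3f}, balanced accuracy={bal_acc:.3f}")  # F1=0.500, balanced accuracy=0.667
```

Both metrics weight the minority class more heavily than plain accuracy does, which is why they are common choices for evaluating oversamplers.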
7. Limitations, Parameterization, and Future Directions
A recognized limitation is the added complexity of multiple hyperparameters: the number of traits $T$, the similarity threshold $\theta$, the influence rate $\rho$, and the neighborhood size $k$. The current implementation requires empirical tuning. The authors suggest automating parameter selection via data-driven optimization in future work. Another possible extension is adapting AxelSMOTE’s agent-based mechanisms to time series and image modalities, which could entail defining trait groupings over temporally or spatially contiguous features and adjusting the similarity metrics accordingly.
Summary Table: AxelSMOTE Innovations and Controls
Component | Mechanism | Purpose |
---|---|---|
Trait grouping | Partition features into contiguous blocks | Preserve intra-trait correlations |
Similarity check | Trait similarity > $\theta$; random draw < $\rho$ | Realistic blending, avoid noise |
Blending | Coefficient $\lambda$ drawn from a symmetric Beta distribution | Interior interpolation |
Diversity injection | Add Gaussian noise on exchanged traits | Avoid overfitting, promote variety |
AxelSMOTE represents an agent-based, interaction-centric paradigm for synthetic oversampling. Its combination of trait grouping, similarity-driven exchanges, stochastic blending, and explicit diversity ensures the preservation of statistical structure and provides robust minority class augmentation. Empirical evidence supports its effectiveness across diverse imbalanced datasets (Kishanthan et al., 8 Sep 2025). Further advances are anticipated in automated parameter optimization and broader modality adaptation.