
Self-Training Neurochaos Learning

Updated 10 January 2026
  • Self-Training Neurochaos Learning is a hybrid semi-supervised framework that fuses chaos-driven feature encoding with iterative pseudo-labeling.
  • It uses chaotic maps to transform features into robust firing-rate representations, capturing latent nonlinear relationships.
  • The iterative self-training with a confidence threshold progressively augments labeled data, yielding superior accuracy in imbalanced scenarios.

Self-Training Neurochaos Learning (NL+ST) is a hybrid semi-supervised learning (SSL) framework that combines Neurochaos Learning (NL)—where chaos-based feature transformations reveal latent nonlinear structure—with threshold-based Self-Training (ST) to exploit large quantities of unlabelled data when only a small fraction of samples are labelled. The approach addresses scenarios in which obtaining labelled data is expensive or challenging, particularly for nonlinear or imbalanced classification tasks. NL+ST integrates robust chaos-driven feature engineering with iterative pseudo-labelling of high-confidence unlabelled points, resulting in superior generalisation and classification accuracy relative to conventional SSL techniques (M et al., 3 Jan 2026).

1. Motivation and Theoretical Foundation

Many practical machine learning applications are characterized by a paucity of labelled data and abundant unlabelled samples. Supervised approaches typically overfit or fail to extrapolate in such conditions, notably when the data exhibits strong nonlinearities or class imbalance. SSL leverages unlabelled examples to address this gap, but popular variants may inadequately capture subtle feature relationships.

Neurochaos Learning transforms each sample’s raw features using chaotic dynamics. The output is a “firing-rate” representation—an embedding encoding the response of chaotic neurons to individual features, designed to be robust under limited supervision and resilient to input noise. By integrating NL with threshold-based ST, the framework both distils complex data structure into noise-resistant representations and iteratively expands the labelled set with reliable pseudo-labels, amplifying supervised signal (M et al., 3 Jan 2026).

2. Neurochaos Learning: Chaotic Feature Encoding

NL implements a three-phase pipeline for each feature $x_{ij}$ of input sample $i$:

  1. Preprocessing: Each raw feature is scaled to $[0,1]$.
  2. Chaotic Encoding: A chaotic neuron for each feature is initialized at $q \in [0,1]$. The chaotic map $f$ (e.g., the skew tent map) is applied iteratively until the trajectory visits the $\varepsilon$-ball about $x_{ij}$:

$$u^{(0)}_{ij} = q,\qquad u^{(t+1)}_{ij} = f(u^{(t)}_{ij}) \quad \text{until}\quad |u^{(t)}_{ij} - x_{ij}| < \varepsilon$$

The number of iterations required is $T_{ij}$.

  3. Symbolic Encoding and Firing-Rate Extraction: The sequence $\{u^{(t)}_{ij}\}_{t=1}^{T_{ij}}$ is thresholded at $b$ to obtain a binary symbolic sequence:

$$s^{(t)}_{ij} = \begin{cases} 1, & u^{(t)}_{ij} \geq b \\ 0, & u^{(t)}_{ij} < b \end{cases}$$

The firing-rate feature is then:

$$\mathrm{FR}_{ij} = \frac{1}{T_{ij}} \sum_{t=1}^{T_{ij}} s^{(t)}_{ij}$$

Each sample $i$ is thus encoded as $\mathrm{FR}_i = (\mathrm{FR}_{i1}, \dots, \mathrm{FR}_{id})$ with $\mathrm{FR}_{ij} \in [0,1]$.

NL-generated embeddings have been found to expose nonlinear separabilities obscured in the original feature space and are especially valuable when used with straightforward classifiers in low-data or noisy settings (M et al., 3 Jan 2026).
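The encoding steps above can be sketched in Python. The skew parameter of the tent map and the initial state `q = 0.34` below are illustrative choices, not values prescribed by the paper (in NL+ST, $q$ is tuned per dataset via cross-validation):

```python
def skew_tent(u, a=0.499):
    """Skew tent map on [0, 1]; the skew a is an illustrative choice."""
    return u / a if u < a else (1.0 - u) / (1.0 - a)

def firing_rate(x, q=0.34, b=0.499, eps=0.25, max_iter=10_000):
    """Iterate the chaotic neuron from q until the trajectory enters the
    eps-ball around the (scaled) feature x; return the fraction of
    iterates at or above the threshold b, i.e. the firing rate FR_{ij}."""
    u, ones, t = q, 0, 0
    while t < max_iter:
        u = skew_tent(u)
        t += 1
        ones += u >= b
        if abs(u - x) < eps:   # trajectory has visited the eps-ball
            break
    return ones / t

def encode(sample):
    """Map a scaled feature vector to its firing-rate representation FR_i."""
    return [firing_rate(x) for x in sample]
```

Each returned firing rate lies in $[0,1]$, so the encoded vector can be fed directly to any downstream classifier.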

3. Threshold-Based Self-Training: Pseudo-Label Expansion

Threshold-based Self-Training operates by iteratively augmenting the labelled dataset:

  • Let $L$ denote the current labelled set (initially 15% of the total) and $U$ the current unlabelled set (initially 85%).
  • The base classifier $C$ (Random Forest, AdaBoost, SVM, Logistic Regression, or Gaussian Naïve Bayes) is trained on $L$.
  • For each $x \in U$, class posteriors $p(y \mid x)$ are predicted. The pseudo-label $(x, \hat{y})$ is retained if the model confidence $\rho(x) = \max_y p(y \mid x)$ satisfies $\rho(x) \geq \tau$, where $\tau = 0.75$:

$$P^{(t)} = \{ (x, \hat{y}) \mid \rho(x) \ge \tau \}$$

  • Update $L \leftarrow L \cup P^{(t)}$ and $U \leftarrow U \setminus \{x : (x,\cdot) \in P^{(t)}\}$, then iterate. The process terminates when no new high-confidence assignments are made.

This selective, high-confidence assignment reduces the risk of erroneous label propagation and stabilizes training (M et al., 3 Jan 2026).
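The loop above can be sketched generically in Python. The nearest-centroid "classifier" and its softmax-over-distance confidence proxy below are hypothetical stand-ins for the paper's base classifiers (Random Forest, SVM, etc.); any model exposing class posteriors would slot in:

```python
import math

def fit(L):
    """Toy nearest-centroid classifier: one centroid per class (stand-in)."""
    groups = {}
    for x, y in L:
        groups.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in groups.items()}

def predict_proba(model, x):
    """Hypothetical confidence proxy: softmax over -5 * |x - centroid|."""
    w = {y: math.exp(-5.0 * abs(x - c)) for y, c in model.items()}
    z = sum(w.values())
    return {y: v / z for y, v in w.items()}

def self_train(L, U, tau=0.75, max_rounds=100):
    """Threshold-based self-training: move high-confidence points from U
    into L, retraining each round, until no pseudo-label clears tau."""
    L, U = list(L), list(U)
    for _ in range(max_rounds):
        model = fit(L)
        pseudo, rest = [], []
        for x in U:
            probs = predict_proba(model, x)
            y_hat = max(probs, key=probs.get)
            if probs[y_hat] >= tau:
                pseudo.append((x, y_hat))  # confident: accept pseudo-label
            else:
                rest.append(x)             # keep for a later round
        if not pseudo:                     # termination condition
            break
        L, U = L + pseudo, rest
    return fit(L)
```

For example, starting from `L = [(0.0, "a"), (1.0, "b")]` and `U = [0.1, 0.9]`, both unlabelled points clear the threshold in the first round and the centroids are refit on the augmented set.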

4. NL+ST Architecture and Implementation Pipeline

NL+ST comprises a full pipeline integrating chaos-based feature encoding and self-training, as formalized below:

Input: D = {x_i}, labels for 15% of D, map f, b, ε, τ, classifier C
Output: trained classifier C*

1. Scale features to [0,1].
2. For each sample x_i, each feature j:
   a. Set u ← q.
   b. Iterate u ← f(u) until |u - x_{ij}| < ε; count the iterations T_{ij}.
   c. s^{(t)} ← [u^{(t)} ≥ b].
   d. FR_{ij} ← (1 / T_{ij}) * Σ_t s^{(t)}.
3. FR_i = [FR_{i1}, ..., FR_{id}].
4. Partition FR-data: L (15% labelled), U (85% unlabelled).
5. Repeat:
   a. Train C on L.
   b. For x in U: ρ(x) = max_y p(y|x).
   c. P ← {(x, ŷ) : ρ(x) ≥ τ}.
   d. L ← L ∪ P; U ← U \ {x : (x,·) ∈ P}.
   Until P is empty.
6. Return final C.

Key hyperparameters are fixed as: $b = 0.499$ (maximizes symbolic entropy), $\varepsilon = 0.25$, $\tau = 0.75$ (pseudo-labelling threshold), and the initial chaotic state $q$ is optimized via 5-fold cross-validation for each dataset/classifier pair (M et al., 3 Jan 2026).
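The choice $b = 0.499$ can be checked empirically: the skew tent map preserves the uniform density on $[0,1]$, so the entropy of the thresholded symbol sequence peaks when the threshold splits the interval roughly in half. A minimal sketch (the initial state 0.23 and skew 0.499 are illustrative):

```python
import math

def skew_tent(u, a=0.499):
    """Skew tent map on [0, 1] used as the chaotic neuron dynamics."""
    return u / a if u < a else (1.0 - u) / (1.0 - a)

def symbolic_entropy(b, q=0.23, n=20_000, a=0.499):
    """Empirical entropy (nats) of the binary symbol sequence obtained
    by thresholding a length-n skew-tent trajectory at b."""
    u, ones = q, 0
    for _ in range(n):
        u = skew_tent(u, a)
        ones += u >= b
    p = ones / n
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)
```

Sweeping `b` over a grid shows the entropy is highest near 0.5 and falls off toward the interval's ends, consistent with the fixed setting above.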

5. Experimental Evaluation and Performance Analysis

Ten benchmark datasets were employed: Iris, Wine, Breast Cancer Wisconsin, Haberman's Survival, Ionosphere, Statlog (Heart), Seeds, Palmer Penguins, Pima Indians Diabetes, and Glass Identification. Standard protocol: 80%/20% train/test split, with only 15% of the training set labelled ($L$) and 85% unlabelled ($U$). Macro-F1 score measured generalisation under class imbalance.
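Macro-F1 averages per-class F1 scores with equal weight, so minority classes count as much as majority ones, which is why it is the reported metric under class imbalance. A minimal sketch:

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)
```

For example, `macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"])` averages the per-class F1 values 2/3 (class a) and 0.8 (class b).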

The NL+ST approach consistently produced higher macro-F1 scores than standalone ST, with especially marked improvements in small, nonlinear, and imbalanced datasets. Notable gains included:

| Dataset | Classifier | NL+ST Performance Gain Over ST |
| --- | --- | --- |
| Iris | LR | +188.66% |
| Wine | LR | +158.58% |
| Glass Identification | RF | +110.48% |

Full results by dataset and base classifier can be found in the master tables of (M et al., 3 Jan 2026).

6. Architectural Significance, Limitations, and Prospective Extensions

Chaos-derived firing-rate features substantially enhance the separability of nonlinear clusters, empowering conventional classifiers in low-label and noisy settings. The ST loop incrementally increases labelled coverage while maintaining strict confidence, mitigating the risk of systematic label error amplification.

Limitations include sensitivity to the chaotic-map parameters $(q, b, \varepsilon)$, which necessitates cross-validation and may limit stability. A single confidence threshold $\tau$ may be suboptimal; adaptive or curriculum-based thresholding represents a promising avenue for improved robustness.

Proposed Extensions encompass:

  • Adaptive/curriculum-based pseudo-labeling for dynamic confidence calibration.
  • Integration of chaos-derived feature embeddings into neural or deep learning pipelines.
  • Unsupervised pretraining with chaotic transformations for tasks such as clustering or anomaly detection (M et al., 3 Jan 2026).

7. Relationship to Broader SSL Research and Application Domains

NL+ST exemplifies advances in SSL that blend principled feature engineering rooted in chaos theory with iterative, high-confidence self-labelling. Its impact is principally pronounced in domains where nonlinear relationships are prevalent and labelled data is scarce—including scientific instrumentation, biomedical diagnostics, and rare-event detection. The methodology also underscores the value of interpretable dynamic representations for resilient, data-efficient learning (M et al., 3 Jan 2026).
