CSST: Cross-Subject Self-Training

Updated 4 July 2026
  • Cross-Subject Self-Training (CSST) is a self-adaptation strategy that uses pseudo-labels to bridge distribution gaps between source and target subjects.
  • It integrates techniques like active learning, feature alignment, and contrastive learning to manage label scarcity and subject variability.
  • Empirical results in HAR and SSVEP settings demonstrate that CSST achieves near upper-bound performance with minimal labeled target-domain data.

Cross-Subject Self-Training (CSST) denotes a class of cross-subject adaptation procedures in which a model trained on source subjects, or inherited from a previous iteration, generates pseudo-labels on a target subject and then uses those pseudo-labels to improve target-domain performance. In recent arXiv usage, the term appears in at least two distinct but related forms: as the self-training module inside ActiveSelfHAR for cross-subject human activity recognition (HAR), and as the name of a full cross-subject domain adaptation framework for steady-state visually evoked potential (SSVEP) classification. In both cases, CSST addresses subject-dependent distribution shift and the scarcity or cost of target-domain annotation, but the concrete mechanisms differ substantially (Wei et al., 2023, Wang et al., 29 Jan 2026).

1. Problem setting and terminological scope

The common problem addressed by CSST is cross-subject transfer: a model is trained using data from one set of subjects and then adapted to a new subject whose data distribution differs from the source distribution. In the HAR setting, this is described as the “cross-subject issue when adapting to new users,” which hinders real-world deployment despite strong laboratory performance (Wei et al., 2023). In the SSVEP setting, the source domain is defined as

$$\mathcal D_S = \{(x_i^s, y_i^s)\}_{i=1}^{n_s},$$

with labeled trials $x_i^s\in\mathbb{R}^{N_C\times N_P}$ and labels $y_i^s\in\{1,\dots,M\}$, while the target domain is

$$\mathcal D_T = \{x_j^t\}_{j=1}^{n_t},$$

with unlabeled trials $x_j^t\in\mathbb{R}^{N_C\times N_P}$, under the assumption that $P_S(x)\neq P_T(x)$. The learning objective is to minimize the target error

$$\mathcal{E}_T(f)=\mathbb{E}_{x\sim P_T}\bigl[\mathbf{1}\{f(x)\neq y^\star\}\bigr]$$

using labeled source data and unlabeled target data (Wang et al., 29 Jan 2026).

Within this scope, CSST should not be treated as a single canonical algorithm. In ActiveSelfHAR, it is a module interleaved with active learning and neighbor-based augmentation. In SSVEP classification, it is the central framework, comprising Pre-Training with Adversarial Learning (PTAL), Dual-Ensemble Self-Training (DEST), and Time-Frequency Augmented Contrastive Learning (TFA-CL). This suggests that “CSST” functions more as a methodological label for self-training-centered cross-subject adaptation than as a uniquely standardized architecture.

2. CSST in ActiveSelfHAR for cross-subject HAR

In ActiveSelfHAR, the CSST module begins from a teacher network with logits $f_\theta(x)\in\mathbb{R}^K$ and class probabilities

$$p_\theta(y=k\mid x_i)=\frac{\exp\bigl(f_\theta(x_i)_k\bigr)}{\sum_{l=1}^K\exp\bigl(f_\theta(x_i)_l\bigr)}.$$

The confidence score is defined as

$$\mathrm{conf}(x_i)=\max_{1\le k\le K} p_\theta(y=k\mid x_i).$$

Given a confidence threshold, initialized separately for EMG and IMU data and raised by a fixed increment each iteration, a target-domain sample is included in the self-training set if $\mathrm{conf}(x_i)$ exceeds the threshold, with pseudo-label

$$\hat y_i=\arg\max_{1\le k\le K} p_\theta(y=k\mid x_i).$$

At each iteration, the self-training set is thus formed from those samples in the current unlabeled target pool whose maximum class probability under the current model parameters exceeds the iteration-specific threshold (Wei et al., 2023).
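The confidence-based selection described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the ActiveSelfHAR implementation: the function names `softmax` and `select_pseudo_labels`, the toy logits, and the threshold value of 0.9 are all assumptions chosen for the example.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def select_pseudo_labels(logits, threshold):
    """Return indices, pseudo-labels, and confidences of samples above the threshold."""
    probs = softmax(logits)
    conf = probs.max(axis=1)        # conf(x_i) = max_k p(y = k | x_i)
    labels = probs.argmax(axis=1)   # pseudo-label = most probable class
    keep = np.flatnonzero(conf > threshold)
    return keep, labels[keep], conf[keep]

# Toy logits for four target windows and three classes (illustrative values).
logits = np.array([
    [4.0, 0.0, 0.0],   # confident, class 0
    [0.2, 0.1, 0.0],   # ambiguous
    [0.0, 0.0, 5.0],   # confident, class 2
    [1.0, 1.2, 0.9],   # ambiguous
])
idx, pseudo, conf = select_pseudo_labels(logits, threshold=0.9)  # 0.9 is hypothetical
print(idx, pseudo)
```

Only the first and third windows pass the threshold here; the ambiguous windows stay in the unlabeled pool for later iterations or for active querying.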

The method then extracts penultimate-layer feature vectors and reduces them via PCA to 3D. For each class, it defines a class center as the sample in the pseudo-labeled set whose feature is closest to the classwise mean feature among samples with that pseudo-label. The collection of these centers forms the center set. The remaining unlabeled pool consists of the target samples that were not admitted to the self-training set.

For each sample in the remaining pool, distances to the two nearest class centers are computed, and informativeness is scored using those distances. Samples are grouped according to which pair of centers they lie between. From each boundary group, a fixed number of the most informative points are selected for true-label querying.
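The center construction and boundary-based query selection can be illustrated as follows. This is a sketch under stated assumptions: the informativeness score is taken to be the ratio of the nearest to the second-nearest center distance (values near 1 indicate a sample close to the boundary), which is one plausible choice rather than the score used in the source. The helper names `pca_project`, `class_centers`, and `boundary_queries`, and the `per_group` parameter, are hypothetical.

```python
import numpy as np

def pca_project(features, n_components=3):
    """Project features onto their top principal components via SVD."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def class_centers(z_pseudo, pseudo_labels):
    """For each class, pick the pseudo-labeled sample nearest the class mean."""
    centers = {}
    for k in np.unique(pseudo_labels):
        members = z_pseudo[pseudo_labels == k]
        mean = members.mean(axis=0)
        centers[int(k)] = members[np.linalg.norm(members - mean, axis=1).argmin()]
    return centers

def boundary_queries(z_rest, centers, per_group=1):
    """Group remaining samples by their two nearest centers; pick the most ambiguous."""
    classes = np.array(sorted(centers))
    center_mat = np.stack([centers[k] for k in classes])
    dists = np.linalg.norm(z_rest[:, None, :] - center_mat[None, :, :], axis=2)
    order = dists.argsort(axis=1)
    rows = np.arange(len(z_rest))
    d1, d2 = dists[rows, order[:, 0]], dists[rows, order[:, 1]]
    score = d1 / (d2 + 1e-12)  # assumed score: near 1 means close to the boundary
    groups = {}
    for i in rows:
        pair = tuple(sorted((int(classes[order[i, 0]]), int(classes[order[i, 1]]))))
        groups.setdefault(pair, []).append(int(i))
    return {pair: sorted(idx, key=lambda i: -score[i])[:per_group]
            for pair, idx in groups.items()}

# Toy 3D features (already reduced) for two pseudo-labeled classes.
z_pseudo = np.array([[0.0, 0, 0], [0.2, 0, 0], [0.1, 0, 0],
                     [4.0, 0, 0], [4.2, 0, 0], [4.1, 0, 0]])
pseudo_labels = np.array([0, 0, 0, 1, 1, 1])
z_rest = np.array([[2.0, 0, 0], [0.5, 0, 0], [3.0, 0, 0]])

centers = class_centers(z_pseudo, pseudo_labels)
queries = boundary_queries(z_rest, centers, per_group=1)
reduced = pca_project(np.random.default_rng(0).normal(size=(10, 6)))
print(queries, reduced.shape)
```

In the toy data, the sample at 2.0 lies almost midway between the two centers and is therefore the one selected for querying.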

A queried point is then used to recruit spatio-temporal neighbors. Each candidate sample receives a neighbor score that depends on its window timestamp and on a small time window around the queried point. All candidates whose score satisfies the neighbor criterion inherit the label of the queried point. The resulting augmented core set is the union of queried samples and these neighbors.

The full algorithm alternates pseudo-labeling, center computation, active querying, core-set augmentation, and fine-tuning. “Update” is defined as fine-tuning only the fully-connected layers while freezing shared CNN layers. The student model at each iteration is trained with the task loss on the data assembled in that iteration.

3. CSST in SSVEP classification: FBEA, PTAL, DEST, and TFA-CL

In the SSVEP formulation, CSST is a two-stage cross-subject domain adaptation framework built on self-training. It is preceded by Filter-Bank Euclidean Alignment (FBEA), which exploits SSVEP frequency information. Each trial is decomposed into filter-bank sub-bands, and the sub-band signals are reshaped into a single matrix per trial. A covariance matrix is computed for each reshaped trial, a reference covariance is derived from the trial covariances, and each trial is aligned with respect to that reference.

The stated purpose is to reduce inter-subject distributional shift while preserving cross-band correlations (Wang et al., 29 Jan 2026).
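As a rough illustration of the alignment step, the sketch below applies standard Euclidean alignment to trials whose sub-bands have been stacked along the channel axis. Several points are assumptions rather than details taken from the source: the input layout `(n_trials, n_bands, n_channels, n_samples)`, the stacking order, the division of the covariance by the number of samples, the use of the arithmetic mean as the reference, and the function name `filter_bank_euclidean_alignment`. The band-pass filtering itself is assumed to have been done beforehand.

```python
import numpy as np

def filter_bank_euclidean_alignment(trials):
    """Whiten stacked sub-band trials by the inverse square root of their mean covariance.

    trials: array of shape (n_trials, n_bands, n_channels, n_samples), assumed layout.
    """
    n_trials, n_bands, n_channels, n_samples = trials.shape
    # Stack sub-bands along the channel axis so cross-band covariances are kept.
    stacked = trials.reshape(n_trials, n_bands * n_channels, n_samples)
    covs = stacked @ stacked.transpose(0, 2, 1) / n_samples
    reference = covs.mean(axis=0)
    # Inverse matrix square root of the symmetric reference via eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(reference)
    inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    aligned = inv_sqrt @ stacked  # broadcasts over the trial axis
    return aligned, reference

rng = np.random.default_rng(0)
trials = rng.normal(size=(20, 2, 3, 50))  # synthetic data, illustrative sizes
aligned, reference = filter_bank_euclidean_alignment(trials)
mean_cov = (aligned @ aligned.transpose(0, 2, 1) / aligned.shape[2]).mean(axis=0)
print(np.round(mean_cov, 3))
```

After alignment, the mean covariance of the aligned trials is the identity matrix, which is the property that makes data from different subjects more comparable. In practice such alignment is usually applied to each subject's trials separately.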

The first stage, PTAL, uses a feature extractor, a classifier, and a domain discriminator, with a Gradient-Reversal Layer between the feature extractor and the domain discriminator. A supervised classification loss is computed on the labeled source trials. An adversarial domain loss is minimized with respect to one set of parameters and maximized with respect to the other. The overall pre-training objective combines the supervised source loss with the adversarial loss.

The second stage, DEST, instantiates two copies of the model: a student, updated by gradient descent, and a teacher, whose parameters are updated as an exponential moving average of the student parameters.
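The teacher update can be sketched as a per-parameter exponential moving average. The convention used here (the teacher keeps a fraction `momentum` of its own value and takes the rest from the student) is the common mean-teacher form and is an assumption, as are the function name `ema_update`, the dictionary representation of parameters, and the momentum value of 0.9.

```python
import numpy as np

def ema_update(teacher, student, momentum):
    """Return teacher parameters moved toward the student by a factor (1 - momentum)."""
    return {name: momentum * teacher[name] + (1.0 - momentum) * student[name]
            for name in teacher}

# Toy parameter dictionaries (illustrative values).
teacher = {"w": np.array([1.0, 1.0]), "b": np.array([0.0])}
student = {"w": np.array([0.0, 2.0]), "b": np.array([1.0])}

teacher = ema_update(teacher, student, momentum=0.9)  # 0.9 is hypothetical
print(teacher)
```

A momentum close to 1 makes the teacher change slowly, which tends to smooth out noisy student updates between pseudo-labeling rounds.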

For each target trial, three views are formed: the original trial and two augmentations. A projection head produces an embedding for each view, and a one-hot label is predicted for each view. The per-view predictions are fused with cosine-similarity weights, and only pseudo-labels whose top fused score exceeds the confidence threshold are retained. The target self-training loss is computed on the retained pseudo-labeled trials.
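One possible reading of the multi-view fusion is sketched below. It is not the formulation from the source: the weights are assumed to be the cosine similarity between each view's embedding and the embedding of the original trial, clipped at zero and normalized to sum to one, and the function name `fuse_views` and the threshold of 0.6 are hypothetical.

```python
import numpy as np

def fuse_views(view_probs, view_embeddings, threshold):
    """Fuse per-view one-hot predictions with cosine-similarity weights.

    view_probs: (V, K) class probabilities per view; view 0 is the original trial.
    view_embeddings: (V, D) projection-head embeddings per view.
    Returns (label or None, fused score vector).
    """
    n_classes = view_probs.shape[1]
    one_hot = np.eye(n_classes)[view_probs.argmax(axis=1)]
    anchor = view_embeddings[0]
    norms = np.linalg.norm(view_embeddings, axis=1) * np.linalg.norm(anchor)
    cos = view_embeddings @ anchor / np.maximum(norms, 1e-12)
    weights = np.clip(cos, 0.0, None)   # assumed: ignore negatively aligned views
    weights = weights / weights.sum()
    fused = weights @ one_hot
    label = int(fused.argmax())
    return (label if fused.max() > threshold else None), fused

# Three views of one trial: two agree on class 0, one predicts class 1.
view_probs = np.array([[0.7, 0.2, 0.1],
                       [0.6, 0.3, 0.1],
                       [0.2, 0.5, 0.3]])
view_embeddings = np.array([[1.0, 0.0],
                            [0.8, 0.6],
                            [0.6, 0.8]])
label, fused = fuse_views(view_probs, view_embeddings, threshold=0.6)
print(label, fused)
```

With these toy values the two agreeing views carry 75% of the total weight, so the trial is retained with pseudo-label 0. A trial whose views disagree more evenly would fall below the threshold and be discarded.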

TFA-CL augments each pseudo-labeled target sample along the temporal axis, using jitter and cropping, and along the frequency axis, using additive noise in sub-bands. For a batch of augmented embeddings and a temperature parameter, the set of positives for an anchor consists of the other embeddings in the batch that share the anchor's pseudo-label. A supervised contrastive loss is computed for each anchor over its positives, and the total self-training objective combines the target self-training loss with the contrastive loss.
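For reference, a widely used form of the supervised contrastive loss can be written in NumPy as below. Whether the source uses exactly this form is an assumption; the function name `supervised_contrastive_loss`, the L2 normalization of embeddings, the averaging over anchors that have at least one positive, and the temperature of 0.5 are illustrative choices.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature):
    """Mean over anchors of the negative average log-probability of their positives."""
    labels = np.asarray(labels)
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    not_self = ~np.eye(len(labels), dtype=bool)
    # Log of the softmax denominator over all samples except the anchor itself.
    sim_masked = np.where(not_self, sim, -np.inf)
    row_max = sim_masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(sim_masked - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom
    positives = (labels[:, None] == labels[None, :]) & not_self
    n_pos = positives.sum(axis=1)
    has_pos = n_pos > 0   # anchors without positives are skipped
    summed = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(summed[has_pos] / n_pos[has_pos]).mean())

embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_consistent = supervised_contrastive_loss(embeddings, [0, 0, 1, 1], temperature=0.5)
loss_mismatched = supervised_contrastive_loss(embeddings, [0, 1, 0, 1], temperature=0.5)
print(loss_consistent, loss_mismatched)
```

The loss is lower when pseudo-labels agree with the geometry of the embeddings, which is why noisy pseudo-labels can weaken this regularizer and why it is applied only to retained, high-confidence samples.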

4. Shared principles and major divergences

Both instantiations of CSST are organized around the same central operation: pseudo-labeling of unlabeled target-domain data under a confidence criterion. In ActiveSelfHAR, pseudo-labels are produced by the model trained in the previous iteration or the source domain, and samples are admitted when their confidence exceeds the iteration-specific threshold (Wei et al., 2023). In the SSVEP framework, pseudo-labels are produced from three views and accepted only when the fused prediction exceeds the confidence threshold (Wang et al., 29 Jan 2026). In both cases, the method assumes that high-confidence predictions are sufficiently reliable to seed further target adaptation.

The principal divergence lies in how each method treats the unlabeled remainder and the role of supervision. ActiveSelfHAR is explicitly hybrid: it combines self-training with active learning, queries true labels for ambiguous target samples, and propagates these labels through spatio-temporal grouping. The SSVEP CSST framework is instead built around source-supervised pre-training, unsupervised target pseudo-labeling, teacher-student refinement, and contrastive regularization. It does not include a human-in-the-loop querying stage.

The feature-space machinery also differs. ActiveSelfHAR constructs class centers from pseudo-labeled target features, uses nearest and second-nearest centers to identify boundary regions, and enlarges queried sets through local spatio-temporal structure. The SSVEP framework performs covariance-based alignment before training, adversarial domain confusion during pre-training, multi-view pseudo-label fusion in DEST, and supervised contrastive learning on pseudo-labeled target embeddings. This suggests that CSST is best understood as a self-training core that can be embedded in substantially different adaptation pipelines.

A common misconception would be to interpret CSST as necessarily label-free. The HAR variant contradicts that interpretation because it sparsely acquires actual labels through active learning. A second misconception would be to assume that CSST implies a fixed architectural recipe. The two arXiv usages show that the label encompasses at least one module-level design and one end-to-end framework.

5. Empirical behavior across HAR and SSVEP

The HAR study evaluates on DSADS, PAMAP-2, and an in-house EMG dataset. DSADS contains 8 subjects, 12 daily/sports activities, and 100 Hz IMUs. PAMAP-2 contains 7 subjects, 5 activities, and 100 Hz IMUs. The EMG dataset contains 10 subjects, 5 locomotion classes plus 4 gait-phase classes, and 1,111 Hz EMG. Reported metrics are precision, recall, accuracy on the held-out subject, percent of target samples actually labeled, and total adaptation time (Wei et al., 2023).

The SSVEP study evaluates on Benchmark and BETA. Benchmark contains 35 subjects, 64-channel EEG, 40-class SSVEP at 8–15.8 Hz, and 6 blocks at 5 s. BETA contains 70 subjects, the same 40 classes, and 4 blocks at 2 s or 3 s. Preprocessing selects 9 occipital channels, accounts for a visual latency, varies the window length from 0.4 to 1 s, and decomposes signals into filter-bank sub-bands. The protocol is Leave-One-Subject-Out, with batch size 64, 500 epochs of PTAL and 500 epochs of DEST, and the Adam optimizer with a specified learning rate and weight decay. The remaining settings are a pseudo-label threshold, an EMA momentum, and two contrastive hyperparameters. The reported metrics are accuracy and information transfer rate (ITR).

These settings establish that the two CSST lines are empirically evaluated under very different signal modalities and operational criteria (Wang et al., 29 Jan 2026).
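ITR in SSVEP studies is commonly computed with the Wolpaw formula, which converts the number of classes, the accuracy, and the time per selection into bits per minute. The sketch below assumes that standard form. The function name `itr_bits_per_min`, the clamping to zero at or below chance level, and the example selection time of 1.5 s are assumptions for illustration, not values from the source.

```python
import math

def itr_bits_per_min(n_classes, accuracy, trial_seconds):
    """Wolpaw information transfer rate in bits per minute.

    trial_seconds is the total time per selection (assumed to include any
    gaze-shift interval in addition to the analysis window).
    """
    m, p = n_classes, accuracy
    if p <= 1.0 / m:
        return 0.0  # convention assumed here: no information at or below chance
    bits = math.log2(m)
    if p < 1.0:
        bits += p * math.log2(p) + (1.0 - p) * math.log2((1.0 - p) / (m - 1))
    return bits * 60.0 / trial_seconds

# 40 classes as in Benchmark and BETA; accuracy and timing are hypothetical.
perfect = itr_bits_per_min(40, 1.0, 1.5)
typical = itr_bits_per_min(40, 0.8, 1.5)
print(round(perfect, 1), round(typical, 1))
```

Because the selection time appears in the denominator, shorter windows can raise ITR even when accuracy drops, which is why results are reported across several signal lengths.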

Setting | Reported quantities | Interpretation
DSADS, fully supervised fine-tuning | accuracy; share of target data labeled | Upper bound
DSADS, ActiveSelfHAR (3 iters) | accuracy; share of target data labeled; adaptation time (min) | Near upper bound
PAMAP-2, fine-tuning | accuracy; share of target data labeled | Reference
PAMAP-2, ActiveSelfHAR | accuracy; share of target data labeled; adaptation time (min) | Slightly above fine-tuning
EMG locomotion/phase, fine-tuning | accuracy; share of target data labeled | Reference
EMG locomotion/phase, ActiveSelfHAR | accuracy; share of target data labeled; adaptation time (min) | Near reference
Benchmark, at the reported signal length | CSST ITR vs SFDA ITR | Higher ITR
BETA, at the reported signal length | CSST ITR vs SFDA ITR | Higher ITR

For HAR, the main empirical claim is that the method achieves accuracies similar to the upper bound, defined as fully supervised fine-tuning, while labeling only a small fraction of the target data, and that it improves data efficiency and time cost. It also outperforms purely unsupervised UDA (MCD), pure self-training (SelfHAR), and pure active learning (AL-HAR) in the accuracy-label trade-off, while keeping total adaptation time on the order of 1–15 minutes (Wei et al., 2023).

For SSVEP, the principal comparative result is state-of-the-art performance across varying signal lengths on Benchmark and BETA. The ablation reported for Benchmark at 1 s is especially informative: it compares baseline self-training with variants that successively add PTAL, DEST, FBEA, and FBEA+TFA-CL. This does not support a simplistic assumption that every additional component is individually monotonic in effect; rather, it indicates that the contribution of components is interaction-dependent (Wang et al., 29 Jan 2026).

6. Limitations, interpretive cautions, and prospective directions

The SSVEP framework states several limitations directly. It relies on sufficiently strong pseudo-labels, and extremely low target SNR may still degrade performance. It requires additional hyperparameter tuning for different hardware or new paradigms. Extending the approach to online continuous adaptation and real-time BCI deployment remains future work. The computational profile is also explicit: two-stage training with adversarial min-max optimization, teacher-student updates, and contrastive pairs adds approximately 20–30% overhead, although inference remains a single forward pass through the trained network (Wang et al., 29 Jan 2026).

The HAR study emphasizes a different operational point: the method is intended to enable user-independent HAR in smart healthcare systems and wireless body sensor networks by combining pseudo-label bootstrapping, sparse querying of ambiguous regions, and spatio-temporal propagation of true labels. Its reported adaptation times remain within minutes, which is central to the claim of practical data efficiency (Wei et al., 2023).

Taken together, these works indicate that CSST is not reducible to pseudo-label recycling alone. In one line of work, it is strengthened by active querying and structured neighbor propagation; in the other, by alignment, adversarial pre-training, dual-ensemble refinement, and contrastive learning. A plausible implication is that the viability of CSST depends less on the generic use of pseudo-labels than on the mechanisms used to control pseudo-label noise under cross-subject shift.
