CSST: Cross-Subject Self-Training
- Cross-Subject Self-Training (CSST) is a self-adaptation strategy that uses pseudo-labels to bridge distribution gaps between source and target subjects.
- It integrates techniques like active learning, feature alignment, and contrastive learning to manage label scarcity and subject variability.
- Empirical results in HAR and SSVEP settings indicate that CSST approaches upper-bound performance with minimal labeled target-domain data.
Cross-Subject Self-Training (CSST) denotes a class of cross-subject adaptation procedures in which a model trained on source subjects, or inherited from a previous iteration, generates pseudo-labels on a target subject and then uses those pseudo-labels to improve target-domain performance. In recent arXiv usage, the term appears in at least two distinct but related forms: as the self-training module inside ActiveSelfHAR for cross-subject human activity recognition (HAR), and as the name of a full cross-subject domain adaptation framework for steady-state visually evoked potential (SSVEP) classification. In both cases, CSST addresses subject-dependent distribution shift and the scarcity or cost of target-domain annotation, but the concrete mechanisms differ substantially (Wei et al., 2023, Wang et al., 29 Jan 2026).
1. Problem setting and terminological scope
The common problem addressed by CSST is cross-subject transfer: a model is trained using data from one set of subjects and then adapted to a new subject whose data distribution differs from the source distribution. In the HAR setting, this is described as the “cross-subject issue when adapting to new users,” which hinders real-world deployment despite strong laboratory performance (Wei et al., 2023). In the SSVEP setting, the source domain is defined as
$$\mathcal{D}_s = \{(x_i^s, y_i^s)\}_{i=1}^{N_s}$$
with labeled trials $x_i^s$ and labels $y_i^s$, while the target domain is
$$\mathcal{D}_t = \{x_j^t\}_{j=1}^{N_t}$$
with unlabeled trials $x_j^t$, under the assumption that $P_s(x, y) \neq P_t(x, y)$. The learning objective is to minimize the target error
$$\epsilon_t(h) = \mathbb{E}_{(x, y) \sim P_t}\left[\mathbb{1}\{h(x) \neq y\}\right]$$
using labeled source data and unlabeled target data (Wang et al., 29 Jan 2026).
Within this scope, CSST should not be treated as a single canonical algorithm. In ActiveSelfHAR, it is a module interleaved with active learning and neighbor-based augmentation. In SSVEP classification, it is the central framework, comprising Pre-Training with Adversarial Learning (PTAL), Dual-Ensemble Self-Training (DEST), and Time-Frequency Augmented Contrastive Learning (TFA-CL). This suggests that “CSST” functions more as a methodological label for self-training-centered cross-subject adaptation than as a uniquely standardized architecture.
2. CSST in ActiveSelfHAR for cross-subject HAR
In ActiveSelfHAR, the CSST module begins from a teacher network with logits $z(x)$ and class probabilities
$$p_k(x) = \frac{\exp(z_k(x))}{\sum_j \exp(z_j(x))}.$$
The confidence score is defined as
$$\mathrm{conf}(x) = \max_k p_k(x).$$
Given a threshold $\tau$, set separately for EMG and IMU data and raised by a fixed increment each iteration, a target-domain sample is included in the self-training set if $\mathrm{conf}(x) > \tau$, with pseudo-label
$$\hat{y}(x) = \arg\max_k p_k(x).$$
At iteration $t$, with unlabeled target pool $\mathcal{U}_t$ and model parameters $\theta_{t-1}$, the self-training set $\mathcal{S}_t$ is formed from those $x \in \mathcal{U}_t$ whose maximum class probability under $\theta_{t-1}$ exceeds the iteration-specific threshold $\tau_t$ (Wei et al., 2023).
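The confidence-gated pseudo-labeling step can be illustrated with a minimal NumPy sketch. The function names, the toy logits, and the threshold value of 0.9 are assumptions chosen for illustration; they are not taken from the ActiveSelfHAR implementation.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def select_pseudo_labels(logits, threshold):
    """Return indices and pseudo-labels of samples whose top probability exceeds threshold."""
    probs = softmax(logits)
    confidence = probs.max(axis=1)      # maximum class probability per sample
    labels = probs.argmax(axis=1)       # arg-max class as pseudo-label
    keep = np.flatnonzero(confidence > threshold)
    return keep, labels[keep]

# Toy teacher logits for four target windows and three classes (illustrative values).
logits = np.array([[4.0, 0.0, 0.0],
                   [0.2, 0.1, 0.0],
                   [0.0, 0.0, 5.0],
                   [1.0, 1.1, 0.9]])
idx, pseudo = select_pseudo_labels(logits, threshold=0.9)
```

Only the first and third windows have a dominant class, so only those two enter the self-training set; raising the threshold across iterations, as the text describes, would make admission stricter over time.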
The method then extracts penultimate-layer feature vectors $f(x)$ and reduces them via PCA to 3D. For each class $k$, it defines a class center $c_k$ as the sample in the pseudo-labeled set whose feature is closest to the classwise mean feature among samples with pseudo-label $k$. The collection of these centers forms the center set $\mathcal{C}_t$. The remaining unlabeled pool is the target pool minus the pseudo-labeled set,
$$\mathcal{R}_t = \mathcal{U}_t \setminus \mathcal{S}_t.$$
For each sample in $\mathcal{R}_t$, distances to the two nearest class centers are computed, and informativeness is scored using those distances. Samples are grouped according to which pair of centers they lie between. From each boundary group, the top-$m$ most informative points are selected for true-label querying, where $m$ is a per-group query budget.
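A rough sketch of the center construction and boundary grouping follows. It assumes the features have already been reduced (the PCA step is omitted), and it uses the ratio of nearest to second-nearest center distance as the informativeness score. That ratio is a stand-in chosen for illustration, since the text above only says the score is derived from the two distances; the function names and toy data are likewise hypothetical.

```python
import numpy as np

def class_centers(features, pseudo_labels):
    """For each class, pick the pseudo-labeled sample closest to the class mean feature."""
    centers = {}
    for k in np.unique(pseudo_labels):
        members = features[pseudo_labels == k]
        mean = members.mean(axis=0)
        centers[int(k)] = members[np.argmin(np.linalg.norm(members - mean, axis=1))]
    return centers

def boundary_queries(remaining, centers, top_m):
    """Group remaining samples by their two nearest centers; return top_m indices per group.

    Informativeness is d1 / d2 (nearest over second-nearest distance), an assumed
    score: values near 1 mean the sample lies between two centers.
    """
    classes = sorted(centers)
    center_mat = np.stack([centers[k] for k in classes])
    dists = np.linalg.norm(remaining[:, None, :] - center_mat[None, :, :], axis=2)
    nearest_two = np.argsort(dists, axis=1)[:, :2]
    groups = {}
    for i, (a, b) in enumerate(nearest_two):
        pair = tuple(sorted((classes[a], classes[b])))
        score = dists[i, a] / (dists[i, b] + 1e-12)
        groups.setdefault(pair, []).append((score, i))
    return {pair: [i for _, i in sorted(items, reverse=True)[:top_m]]
            for pair, items in groups.items()}

# Toy 2-D features: two tight pseudo-labeled clusters and three unlabeled samples.
features = np.array([[0.0, 0.0], [0.2, 0.0], [-0.2, 0.0],
                     [4.0, 0.0], [4.2, 0.0], [3.8, 0.0]])
pseudo = np.array([0, 0, 0, 1, 1, 1])
remaining = np.array([[1.9, 0.0], [0.5, 0.0], [3.0, 0.0]])
centers = class_centers(features, pseudo)
queries = boundary_queries(remaining, centers, top_m=2)
```

With a budget of two per group, the sample sitting almost midway between the centers is selected first, and the sample deep inside one cluster is left unqueried.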
A queried point $x_q$ is then used to recruit spatio-temporal neighbors. Each candidate window receives a neighbor score relative to $x_q$ that depends on its window timestamp and a small time window $\Delta$. All candidates whose score satisfies the neighbor criterion inherit the label of $x_q$. The resulting augmented core set is the union of queried samples and these neighbors.
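The label-propagation idea can be sketched as follows. The neighbor criterion used here, a window counting as a neighbor when it is within `time_window` of the queried timestamp and within `radius` of the queried feature vector, is an assumption made for illustration and is not the scoring rule of the original method; `propagate_label`, `time_window`, and `radius` are hypothetical names.

```python
import numpy as np

def propagate_label(query_idx, query_label, timestamps, features, time_window, radius):
    """Return {index: label} for the queried window and its assumed neighbors."""
    close_in_time = np.abs(timestamps - timestamps[query_idx]) <= time_window
    close_in_space = np.linalg.norm(features - features[query_idx], axis=1) <= radius
    neighbors = np.flatnonzero(close_in_time & close_in_space)
    return {int(i): query_label for i in neighbors}

# Five windows: three consecutive similar ones, one dissimilar, one similar but far in time.
timestamps = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
features = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [5.0, 5.0], [0.0, 0.0]])
core = propagate_label(query_idx=1, query_label=3, timestamps=timestamps,
                       features=features, time_window=2.0, radius=1.0)
```

A single queried label thus covers windows 0, 1, and 2, while the dissimilar window and the temporally distant one are excluded, which is how a small query budget can yield a larger core set.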
The full algorithm alternates pseudo-labeling, center computation, active querying, core-set augmentation, and fine-tuning. “Update” is defined as fine-tuning only the fully-connected layers while freezing shared CNN layers. The student model at iteration $t$ is trained with the task loss on the labeled data assembled in that iteration.
3. CSST in SSVEP classification: FBEA, PTAL, DEST, and TFA-CL
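In the SSVEP formulation, CSST is a two-stage cross-subject domain adaptation framework built on self-training. It is preceded by Filter-Bank Euclidean Alignment (FBEA), which exploits SSVEP frequency information. Each trial is decomposed into $N_b$ sub-bands, giving
$$X_i \in \mathbb{R}^{N_b \times N_c \times T},$$
where $N_c$ is the number of channels and $T$ the number of time samples. After reshaping to $\tilde{X}_i \in \mathbb{R}^{(N_b N_c) \times T}$, the covariance is
$$\Sigma_i = \tilde{X}_i \tilde{X}_i^{\top}.$$
The reference covariance is
$$\bar{R} = \frac{1}{N} \sum_{i=1}^{N} \Sigma_i,$$
and alignment is performed by
$$\hat{X}_i = \bar{R}^{-1/2} \tilde{X}_i.$$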
The stated purpose is to reduce inter-subject distributional shift while preserving cross-band correlations (Wang et al., 29 Jan 2026).
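The alignment step can be sketched in NumPy under the standard Euclidean Alignment convention: an unnormalized per-trial covariance and an arithmetic-mean reference computed over one subject's trials. The function name, the array layout, and the random toy data are assumptions, and the filter-bank decomposition itself is taken as already done.

```python
import numpy as np

def filter_bank_euclidean_alignment(trials):
    """Align one subject's filter-bank trials.

    trials: array of shape (n_trials, n_bands, n_channels, n_samples).
    Returns an array of shape (n_trials, n_bands * n_channels, n_samples).
    """
    n_trials, n_bands, n_channels, n_samples = trials.shape
    # Stack sub-bands along the channel axis so cross-band correlations enter the covariance.
    stacked = trials.reshape(n_trials, n_bands * n_channels, n_samples)
    covs = stacked @ stacked.transpose(0, 2, 1)
    reference = covs.mean(axis=0)
    # Inverse matrix square root of the symmetric reference via eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(reference)
    inv_sqrt = (eigvecs / np.sqrt(eigvals)) @ eigvecs.T
    return inv_sqrt @ stacked

rng = np.random.default_rng(0)
trials = rng.standard_normal((20, 3, 4, 50))   # 20 trials, 3 bands, 4 channels, 50 samples
aligned = filter_bank_euclidean_alignment(trials)
mean_cov = (aligned @ aligned.transpose(0, 2, 1)).mean(axis=0)
```

After alignment the mean covariance of the subject's trials is the identity matrix, which is what makes subjects more comparable; the sketch assumes the reference covariance is full rank, so very short windows with many stacked channels may need regularization.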
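The first stage, PTAL, uses a feature extractor $F$, classifier $C$, and domain discriminator $D$, with a Gradient-Reversal Layer between $F$ and $D$. The supervised source loss $\mathcal{L}_{\mathrm{cls}}$ is computed from the classifier output $C(F(x))$ on labeled source trials. The adversarial loss $\mathcal{L}_{\mathrm{adv}}$ is the domain-discrimination loss on source and target features; it is minimized with respect to $D$ and maximized with respect to $F$. The overall pre-training objective combines $\mathcal{L}_{\mathrm{cls}}$ and $\mathcal{L}_{\mathrm{adv}}$.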
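The second stage, DEST, instantiates two copies of the pre-trained network: a student with parameters $\theta_S$, updated by gradient descent, and a teacher with parameters $\theta_T$, updated by exponential moving average,
$$\theta_T \leftarrow \alpha\,\theta_T + (1 - \alpha)\,\theta_S,$$
where $\alpha$ is the EMA momentum.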
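A minimal sketch of the teacher update, assuming parameters are held in a dictionary of NumPy arrays. The momentum value of 0.9 and the parameter name `"w"` are illustrative only and are not the settings reported for DEST.

```python
import numpy as np

def ema_update(teacher, student, momentum):
    """In-place EMA: teacher <- momentum * teacher + (1 - momentum) * student."""
    for name, value in student.items():
        teacher[name] = momentum * teacher[name] + (1.0 - momentum) * value
    return teacher

teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}
for _ in range(2):          # two student steps with a frozen student, for illustration
    ema_update(teacher, student, momentum=0.9)
```

Because the teacher moves only a fraction of the way toward the student at each step, its pseudo-labels change more slowly than the student's predictions, which is the usual motivation for a mean-teacher arrangement.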
For each target trial $x$, three views are formed: the original $x$ and two augmentations $x'$ and $x''$. A projection head $g$ produces an embedding for each view, and a one-hot label is predicted for each view. The three predictions are fused using cosine-similarity weights. Only pseudo-labels whose top fused score exceeds the confidence threshold $\gamma$ are retained. The target self-training loss is computed on the retained pseudo-labeled target trials.
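TFA-CL augments each pseudo-labeled target sample along the temporal axis, using jitter and cropping, and along the frequency axis, using additive noise in sub-bands. For a batch of augmented embeddings $\{z_i\}_{i \in I}$ with pseudo-labels $\hat{y}_i$ and temperature $\tau_c$, the set of positives for anchor $i$ is
$$P(i) = \{p \in I \setminus \{i\} : \hat{y}_p = \hat{y}_i\}.$$
The supervised contrastive loss for anchor $i$ is
$$\mathcal{L}_i = -\frac{1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(z_i^{\top} z_p / \tau_c)}{\sum_{a \in I \setminus \{i\}} \exp(z_i^{\top} z_a / \tau_c)}.$$
The total self-training objective combines the target self-training loss with this contrastive loss.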
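The contrastive term can be sketched with the widely used supervised contrastive formulation, in which positives are the other batch members sharing the anchor's pseudo-label. The unit-length normalization of embeddings, the function name, and the temperature of 0.1 are assumptions for illustration, and the TFA-CL augmentations themselves are not reproduced.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature):
    """Mean supervised contrastive loss over anchors that have at least one positive."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    logits = z @ z.T / temperature
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    # Log-sum-exp over all other samples in the batch (self-similarity excluded).
    masked = np.where(not_self, logits, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(masked - row_max).sum(axis=1, keepdims=True))
    log_prob = logits - log_denom
    positives = (labels[:, None] == labels[None, :]) & not_self
    n_pos = positives.sum(axis=1)
    valid = n_pos > 0                    # anchors without positives are skipped
    per_anchor = -(log_prob * positives).sum(axis=1)[valid] / n_pos[valid]
    return per_anchor.mean()

# Two tight clusters; pseudo-labels that match the clusters should give a low loss.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
loss = supervised_contrastive_loss(emb, labels, temperature=0.1)
```

The loss is small when same-label embeddings are close and grows when pseudo-labels disagree with the embedding geometry, so noisy pseudo-labels directly weaken this term.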
4. Shared principles and major divergences
Both instantiations of CSST are organized around the same central operation: pseudo-labeling of unlabeled target-domain data under a confidence criterion. In ActiveSelfHAR, pseudo-labels are produced by the model trained in the previous iteration or on the source domain, and samples are admitted when their maximum class probability exceeds the iteration-specific threshold (Wei et al., 2023). In the SSVEP framework, pseudo-labels are produced from three views and accepted only when the fused prediction exceeds the confidence threshold (Wang et al., 29 Jan 2026). In both cases, the method assumes that high-confidence predictions are sufficiently reliable to seed further target adaptation.
The principal divergence lies in how each method treats the unlabeled remainder and the role of supervision. ActiveSelfHAR is explicitly hybrid: it combines self-training with active learning, queries true labels for ambiguous target samples, and propagates these labels through spatio-temporal grouping. The SSVEP CSST framework is instead built around source-supervised pre-training, unsupervised target pseudo-labeling, teacher-student refinement, and contrastive regularization. It does not include a human-in-the-loop querying stage.
The feature-space machinery also differs. ActiveSelfHAR constructs class centers from pseudo-labeled target features, uses nearest and second-nearest centers to identify boundary regions, and enlarges queried sets through local spatio-temporal structure. The SSVEP framework performs covariance-based alignment before training, adversarial domain confusion during pre-training, multi-view pseudo-label fusion in DEST, and supervised contrastive learning on pseudo-labeled target embeddings. This suggests that CSST is best understood as a self-training core that can be embedded in substantially different adaptation pipelines.
A common misconception would be to interpret CSST as necessarily label-free. The HAR variant contradicts that interpretation because it sparsely acquires actual labels through active learning. A second misconception would be to assume that CSST implies a fixed architectural recipe. The two arXiv usages show that the label encompasses at least one module-level design and one end-to-end framework.
5. Empirical behavior across HAR and SSVEP
The HAR study evaluates on DSADS, PAMAP-2, and an in-house EMG dataset. DSADS contains 8 subjects, 12 daily/sports activities, and 100 Hz IMUs. PAMAP-2 contains 7 subjects, 5 activities, and 100 Hz IMUs. The EMG dataset contains 10 subjects, 5 locomotion classes plus 4 gait-phase classes, and 1,111 Hz EMG. Reported metrics are precision, recall, accuracy on the held-out subject, percent of target samples actually labeled, and total adaptation time (Wei et al., 2023).
The SSVEP study evaluates on Benchmark and BETA. Benchmark contains 35 subjects, 64-channel EEG, 40-class SSVEP at 8–15.8 Hz, and 6 blocks at 5 s. BETA contains 70 subjects, the same 40 classes, and 4 blocks at 2 s or 3 s. Preprocessing selects 9 occipital channels, applies a latency offset, varies the window length $T_w$ from 0.4 to 1 s, and decomposes signals into $N_b$ filter-bank sub-bands. The protocol is Leave-One-Subject-Out, with batch size 64, 500 epochs of PTAL and 500 epochs of DEST, and the Adam optimizer with weight decay; the remaining hyperparameters are the learning rate, the pseudo-label threshold, the EMA momentum, and two contrastive hyperparameters. The reported metrics are accuracy and information transfer rate (ITR), with
$$\mathrm{ITR} = \frac{60}{T_s}\left[\log_2 M + P \log_2 P + (1 - P)\log_2\frac{1 - P}{M - 1}\right],$$
where $M$ is the number of classes, $P$ the classification accuracy, and $T_s$ the time per selection in seconds.
These settings establish that the two CSST lines are empirically evaluated under very different signal modalities and operational criteria (Wang et al., 29 Jan 2026).
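For reference, the ITR metric can be computed with the standard Wolpaw formula, in bits per minute. The function and parameter names are hypothetical, and what counts as selection time (for example, whether a gaze-shift interval is added to the signal length) is left to the caller as an assumption.

```python
import math

def itr_bits_per_min(accuracy, n_classes, selection_time_s):
    """Wolpaw information transfer rate in bits per minute."""
    p, n = accuracy, n_classes
    if p <= 1.0 / n:
        return 0.0  # convention assumed here: chance level or worse carries no information
    bits = math.log2(n)
    if p < 1.0:
        bits += p * math.log2(p) + (1.0 - p) * math.log2((1.0 - p) / (n - 1))
    return bits * 60.0 / selection_time_s

# Example: 40 targets, 90% accuracy, 1.5 s per selection (illustrative numbers only).
example_itr = itr_bits_per_min(0.9, 40, 1.5)
```

Because the selection time sits in the denominator, shorter windows can raise ITR even when accuracy drops, which is why results are reported across signal lengths.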
| Setting | Reported quantities | Interpretation |
|---|---|---|
| DSADS, fully supervised fine-tuning | Accuracy, fraction of target data labeled | Upper bound |
| DSADS, ActiveSelfHAR (3 iters) | Accuracy, fraction labeled, adaptation time (min) | Near upper bound |
| PAMAP-2, fine-tuning | Accuracy, fraction labeled | Reference |
| PAMAP-2, ActiveSelfHAR | Accuracy, fraction labeled, adaptation time (min) | Slightly above fine-tuning |
| EMG locomotion/phase, fine-tuning | Accuracy, fraction labeled | Reference |
| EMG locomotion/phase, ActiveSelfHAR | Accuracy, fraction labeled, adaptation time (min) | Near reference |
| Benchmark, fixed signal length | ITR of CSST compared with SFDA | Higher ITR |
| BETA, fixed signal length | ITR of CSST compared with SFDA | Higher ITR |
For HAR, the main empirical claim is that the method reaches accuracies similar to the upper bound, defined as fully supervised fine-tuning, while labeling only a small fraction of the target data, and that it improves data efficiency and time cost. It also outperforms purely unsupervised UDA (MCD), pure self-training (SelfHAR), and pure active learning (AL-HAR) in the accuracy-label trade-off, while keeping total adaptation time on the order of 1–15 minutes (Wei et al., 2023).
For SSVEP, the principal comparative result is state-of-the-art performance across varying signal lengths on Benchmark and BETA. The ablation reported for Benchmark at 1 s is especially informative: it compares baseline self-training with variants that add PTAL, DEST, FBEA, and FBEA+TFA-CL. The results do not support a simplistic assumption that every additional component has an individually monotonic effect; rather, they indicate that the contribution of each component depends on its interaction with the others (Wang et al., 29 Jan 2026).
6. Limitations, interpretive cautions, and prospective directions
The SSVEP framework states several limitations directly. It relies on sufficiently strong pseudo-labels, and extremely low target SNR may still degrade performance. It requires additional hyperparameter tuning for different hardware or new paradigms. Extending the approach to online continuous adaptation and real-time BCI deployment remains future work. The computational profile is also explicit: two-stage training with adversarial min-max optimization, teacher-student updates, and contrastive pairs adds approximately 20–30% overhead, although inference remains a single forward pass of the trained network (Wang et al., 29 Jan 2026).
The HAR study emphasizes a different operational point: the method is intended to enable user-independent HAR in smart healthcare systems and wireless body sensor networks by combining pseudo-label bootstrapping, sparse querying of ambiguous regions, and spatio-temporal propagation of true labels. Its reported adaptation times remain within minutes, which is central to the claim of practical data efficiency (Wei et al., 2023).
Taken together, these works indicate that CSST is not reducible to pseudo-label recycling alone. In one line of work, it is strengthened by active querying and structured neighbor propagation; in the other, by alignment, adversarial pre-training, dual-ensemble refinement, and contrastive learning. A plausible implication is that the viability of CSST depends less on the generic use of pseudo-labels than on the mechanisms used to control pseudo-label noise under cross-subject shift.