Physiological Encoders in Biosignal Processing
- Physiological encoders are computational frameworks that convert complex biosignals into lower-dimensional, multiplexed representations for analysis and control.
- Their architectures range from biophysical transduction models and statistical feature pipelines to deep multimodal networks that robustly extract salient signal features.
- Recent advances, including self-supervised pretraining and neuromorphic designs, have improved scalability, generalization, and energy efficiency for real-time applications.
Physiological encoders are computational models and circuit mechanisms for transforming physiological or neurophysiological signals into lower-dimensional, informative, and multiplexed representations suitable for downstream analysis, inference, or control. They span biophysical transduction architectures (e.g., retinal processing), statistical feature extractors, supervised/unsupervised neural networks, and multimodal foundation models. This article critically examines foundational mechanisms, encoder architectures, and representative physiological domains, culminating in modern approaches to generalizable and robust physiological representation learning.
1. Biological Foundations of Physiological Encoding
The canonical example of a physiological encoder is the vertebrate retina, which transforms incident luminance distributions into parallel streams of spiking output. The retina implements a dual-channel system:
- Bright ("ON") channels: ON-bipolar cells invert the photoreceptor signal, depolarizing in response to light increments and projecting to ON-type ganglion cells.
- Dark ("OFF") channels: OFF-bipolar cells preserve polarity for light decrements and similarly project to OFF ganglion cells.
This architecture encodes both uniform luminance (via the ON and OFF channel rates $r_{ON}$ and $r_{OFF}$) and local contrast via center–surround "AND-NOT"–type synapses, $r = c \wedge \neg s$, with $c$ the center and $s$ the surround input. The firing rates of the bright and dark channels scale in push–pull log-linear fashion over seven orders of magnitude, supporting both absolute-intensity and differential-contrast discrimination. These findings motivate dual-channel neuromorphic encoders that achieve log-linear, high-dynamic-range pixelwise encoding with differential lateral processing (Greene, 26 Dec 2024).
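A toy numerical sketch of this push–pull, log-linear scheme (function names, gains, and the rectified contrast term are illustrative assumptions, not the published circuit):

```python
import numpy as np

def dual_channel_encode(center, surround, eps=1e-9):
    """Toy ON/OFF encoder: log-linear rates relative to an assumed
    adaptation level of 1, plus a rectified center-surround ("AND-NOT"
    style) contrast term. Gains and nonlinearities are illustrative,
    not the circuit of (Greene, 26 Dec 2024)."""
    log_c = np.log10(center + eps)            # log-linear over ~7 decades
    r_on = np.clip(log_c, 0.0, None)          # ON: light increments
    r_off = np.clip(-log_c, 0.0, None)        # OFF: light decrements
    # AND-NOT: respond to a bright center only where the surround is dark
    contrast = np.clip(log_c - np.log10(surround + eps), 0.0, None)
    return r_on, r_off, contrast

center = np.logspace(-3, 4, 8)                # seven decades of luminance
r_on, r_off, c = dual_channel_encode(center, np.ones_like(center))
print(np.round(r_on, 1), np.round(r_off, 1))
```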
In the proprioceptive domain, muscle spindles transduce muscle length ($L$) and velocity ($\dot{L}$) into primary afferent signaling via canonical rate models such as $r = k_L L + k_V \dot{L}$. At higher processing stages, cortical populations express a continuum of tuning curves (linear/intensity, sigmoidal, Gaussian) that collectively encode spatial postures and joint-angle configurations (Hoffmann et al., 2016).
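A worked illustration of such a rate model (the rectification, baseline, and coefficients $k_L$, $k_V$ are placeholders, not fitted values from (Hoffmann et al., 2016)):

```python
import numpy as np

def spindle_rate(length, velocity, k_L=2.0, k_V=0.5, baseline=10.0):
    """Toy primary-afferent model: rectified linear combination of
    muscle length L and velocity dL/dt. Coefficients are illustrative."""
    return np.clip(baseline + k_L * length + k_V * velocity, 0.0, None)

t = np.linspace(0, 1, 200)
L = 0.1 * np.sin(2 * np.pi * t)   # sinusoidal stretch
dL = np.gradient(L, t)            # numerical velocity estimate
rates = spindle_rate(L, dL)
print(round(float(rates.max()), 2))
```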
2. Signal Processing Architectures and Feature Extraction
Traditionally, physiological encoders for emotion or task-state inference use staged feature pipelines:
- Preprocessing: Bandpass/lowpass filtering (Butterworth/IIR), smoothing (e.g., envelope/median), and per-modality normalization (e.g., min–max or $z$-scoring) (Perez-Rosero et al., 2016, Hutcheson et al., 8 Oct 2025).
- Feature extraction: Domain-agnostic moments (mean, variance, kurtosis), extrema, spectral power (Welch/periodogram), entropy, and peak counting are typical per-channel features.
- Redundancy reduction: Feature selection frequently employs correlation analysis, removing features whose pairwise correlation $|\rho|$ exceeds a fixed threshold, to avoid overfitting and minimize computational complexity.
Such pipelines enable compact, information-rich encoding suitable for low-power or wearable HCI but lack end-to-end task-optimality and adaptability to new modalities or unlabeled settings (Perez-Rosero et al., 2016).
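A minimal sketch of such a staged pipeline (sampling rate, filter parameters, spectral band, and correlation cutoff are assumptions for illustration):

```python
import numpy as np
from scipy import signal, stats

FS = 256  # assumed sampling rate (Hz)

def preprocess(x, low=0.5, high=40.0):
    """Bandpass (4th-order Butterworth) + z-score normalization."""
    sos = signal.butter(4, [low, high], btype="bandpass", fs=FS, output="sos")
    x = signal.sosfiltfilt(sos, x)
    return (x - x.mean()) / (x.std() + 1e-12)

def extract_features(x):
    """Per-channel moments, extrema, and Welch band power."""
    f, pxx = signal.welch(x, fs=FS, nperseg=min(len(x), 512))
    band = pxx[(f >= 8) & (f <= 13)].sum()   # example spectral band (8-13 Hz)
    return np.array([x.mean(), x.var(), stats.kurtosis(x),
                     x.min(), x.max(), band])

def drop_redundant(F, thresh=0.9):
    """Greedy removal of features whose pairwise |rho| exceeds thresh."""
    corr = np.abs(np.corrcoef(F.T))
    keep = []
    for j in range(F.shape[1]):
        if all(corr[j, k] < thresh for k in keep):
            keep.append(j)
    return F[:, keep]

# 20 trials of synthetic single-channel data
X = np.random.randn(20, 4 * FS)
F = np.stack([extract_features(preprocess(x)) for x in X])
print(drop_redundant(F).shape)
```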
3. Deep and Foundation Model Encoders
Recent advances deploy deep architectures—autoencoders, transformer models, and neural codecs—to learn robust, reusable representations from large-scale physiological data:
- Criss-Cross and Patchwise Transformers: Architectures such as CBraMod employ multi-head transformers with alternating temporal and channelwise self-attention over modality-specific patches (e.g., ECG/EEG) (Ghallab et al., 17 Dec 2025). Self-supervised pretraining with dual masking (temporal-patch and lead/channel masking) yields encoders that generalize across downstream tasks with substantial label efficiency.
- Foundation Models for Multimodal Input: PhysioOmni introduces decoupled patchwise tokenizers for each modality (e.g., EEG, ECG, EOG, EMG), generating discrete codebooks which distinguish modality-specific (“heterogeneous”) versus modality-invariant (“homogeneous”) representations (Jiang et al., 28 Apr 2025). Masked token prediction (private/invariant) and resilient fine-tuning with prototype alignment permit robust inference even under arbitrary missing modalities, with performance surpassing prior fusion pipelines and unimodal models.
- Latent Sensor Fusion and Vector-Quantized Autoencoders: Modality-agnostic pipelines map all biosignals to fixed-size spectral “images” and encode via a common frozen VQ-VAE, ensuring uniformity, speed, and compressive efficiency suitable for deployment on edge hardware (Ahmed et al., 13 Jul 2025). Downstream fusion is achieved via concatenation or simple arithmetic projection in the latent space.
- Discrete Neural Codecs: BioCodec demonstrates VQ-based bottlenecking and neural codebook learning for EEG/EMG, with six residual codebooks, LSTM temporal encoding, and dual-layer transformers for downstream decoding and classification, achieving high accuracy and broad codebook usage across diverse biosignals even in low-resource settings (Avramidis et al., 10 Oct 2025); a minimal vector-quantization sketch follows this list.
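The shared core of the codec-style encoders above is a vector-quantized bottleneck. A minimal nearest-codeword lookup with straight-through gradients, sketched in PyTorch (codebook size, dimension, and commitment weight are illustrative, not the BioCodec configuration):

```python
import torch
import torch.nn.functional as F

class VQBottleneck(torch.nn.Module):
    """Toy vector-quantization layer: nearest-codeword lookup with a
    straight-through estimator. Sizes are placeholders."""
    def __init__(self, num_codes=256, dim=64, beta=0.25):
        super().__init__()
        self.codebook = torch.nn.Embedding(num_codes, dim)
        self.beta = beta  # commitment loss weight (assumed)

    def forward(self, z):                              # z: (batch, time, dim)
        flat = z.reshape(-1, z.shape[-1])
        d = torch.cdist(flat, self.codebook.weight)    # pairwise distances
        idx = d.argmin(dim=-1)                         # nearest codeword
        q = self.codebook(idx).view_as(z)
        # codebook + commitment losses (VQ-VAE style)
        loss = F.mse_loss(q, z.detach()) + self.beta * F.mse_loss(z, q.detach())
        q = z + (q - z).detach()                       # straight-through gradient
        return q, idx.view(z.shape[:-1]), loss

vq = VQBottleneck()
z = torch.randn(8, 100, 64)                            # e.g., patch embeddings
q, codes, loss = vq(z)
print(q.shape, codes.shape, float(loss))
```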
4. Multimodal and Multitask Integration
Multimodal encoders (PHemoNet, MuMTAffect, PhysioOmni) systematically process heterogeneous physiological inputs with architecture innovations:
- Hypercomplex Networks (PHM): PHemoNet defines both encoders and fusion blocks in parameterized hypercomplex algebras (e.g., the quaternions for $n=4$), tailoring the algebra dimension $n$, and hence weight sharing and intra-modality channel coupling, to each signal's "natural" channel dimension (Lopez et al., 13 Sep 2024). This yields state-of-the-art emotion recognition on valence/arousal scales using EEG, ECG, GSR, and eye data, with parameter efficiency and robust cross-modal interaction.
- Transformer Fusion: MuMTAffect utilizes shallow (depth=1) modality-specific transformers (eye-gaze, pupil, AU, GSR), with features downsampled and fused via a secondary transformer, then routed by task-specific attention to emotion and personality heads. Multitask learning (e.g., Big-5 trait regression as auxiliary) regularizes affective embeddings, improving emotion prediction and robustness to missing input (Seikavandi et al., 4 Sep 2025).
- Prototype Alignment and Modality Dropout: Methods such as PhysioOmni employ prototype alignment at the class level, so each modality-specific encoder can independently cluster features around shared class prototypes, enabling graceful degradation and high performance when modalities are missing at test time (Jiang et al., 28 Apr 2025); a sketch of this alignment loss follows the table below.
| Encoder Model | Fusion Method | Supported Modalities | Missing Modality Robustness |
|---|---|---|---|
| PHemoNet (Lopez et al., 13 Sep 2024) | Hypercomplex PHM | EEG, GSR, ECG, Eye | No (requires all modalities) |
| MuMTAffect (Seikavandi et al., 4 Sep 2025) | Transformer Fusion | Eye, Pupil, AU, GSR | Partially, via modularity |
| PhysioOmni (Jiang et al., 28 Apr 2025) | Tokenizer + Fuser | EEG, ECG, EOG, EMG | Yes (Arbitrary subsets) |
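A minimal sketch of class-prototype alignment in the spirit of PhysioOmni (the cosine-similarity cross-entropy form, temperature, and sizes are assumptions; the paper's exact objective may differ):

```python
import torch
import torch.nn.functional as F

def prototype_alignment_loss(feats, labels, prototypes, tau=0.1):
    """Pull one modality's features toward shared class prototypes.
    feats: (batch, dim) from a modality encoder; prototypes: (classes, dim).
    Cross-entropy over cosine similarities -- an assumed, common choice."""
    feats = F.normalize(feats, dim=-1)
    protos = F.normalize(prototypes, dim=-1)
    logits = feats @ protos.T / tau
    return F.cross_entropy(logits, labels)

# Two modality encoders sharing one prototype bank: if a modality is
# missing at test time, the remaining encoders still map into the same
# prototype geometry, giving graceful degradation.
protos = torch.nn.Parameter(torch.randn(4, 128))   # 4 classes (assumed)
eeg_feat = torch.randn(16, 128)
ecg_feat = torch.randn(16, 128)
y = torch.randint(0, 4, (16,))
loss = prototype_alignment_loss(eeg_feat, y, protos) \
     + prototype_alignment_loss(ecg_feat, y, protos)
print(float(loss))
```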
5. Self-Supervised and Disentanglement Strategies
Modern encoders leverage self-supervised and adversarially regularized learning to capture latent physiological factors, enhance cross-subject generalization, and avoid overfitting:
- Soft-Disentangled Rateless Autoencoders: These introduce stochastic dropout schedules in the latent space, allowing asynchronous, dimension-by-dimension adversarial/nuisance partitioning (Han et al., 2020). Rateless models, in which the latent bottleneck size is not fixed, create a continuum of subencoders with user-invariant representations, improving transfer to unknown users/tasks by up to 11.6% absolute accuracy in cross-subject evaluation; a latent-dropout sketch follows this list.
- Time-Series Momentum Contrast (TS-MoCo): Contrastive pretraining with student–teacher models aligning augmented and clean sequences, combined with future prediction (GRU), encourages encoders to capture both local and global time-dependent structure (Hallgarten et al., 2023).
- Cross-verified Feature Disentangling (CVD): Autoencoder architectures imposing reconstruction, code-swap invariance, and predictive objectives (e.g., accurate rPPG and HR regression) with explicit physiological vs. nuisance code partitioning reliably separate true physiological information from artifacts/noise (Niu et al., 2020).
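A minimal sketch of the stochastic latent-dropout idea behind rateless autoencoders (the monotone, linear keep-probability schedule is an assumption; the schedule in (Han et al., 2020) may differ):

```python
import torch

def rateless_dropout(z, p_head=1.0, p_tail=0.2, training=True):
    """Drop latent dimensions with probability increasing along the
    vector, so prefixes of z form a continuum of usable sub-encoders.
    The linear head-to-tail keep-probability schedule is illustrative."""
    if not training:
        return z
    d = z.shape[-1]
    keep_p = torch.linspace(p_head, p_tail, d, device=z.device)
    mask = torch.bernoulli(keep_p.expand_as(z))
    return z * mask / keep_p          # inverted-dropout rescaling

z = torch.randn(32, 64)               # latent codes from an encoder
z_train = rateless_dropout(z)
z_eval = rateless_dropout(z, training=False)
print(z_train.shape, torch.equal(z_eval, z))
```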
6. Neuromorphic and Latency-Optimized Encoders
There is a resurgence in biologically inspired, hardware-efficient physiological encoders that exploit intrinsic analog variability and spiking-time codes:
- Dual-Channel Retinal Models: Push–pull mechanisms as detailed above serve as templates for robust, log-linear, high-dynamic-range event cameras and silicon retinas (Greene, 26 Dec 2024).
- Sparse Spiking Codecs: Feedforward networks of exp-LIF neurons with intrinsic parameter heterogeneity employ single-spike, population-median-referenced codes. The time-to-first-spike vector, linearly decoded, reconstructs continuous input parameters with high temporal precision and low energy, and is robust to spike dropout and jitter (Costa et al., 23 Jan 2025); a toy encode-and-decode sketch follows this list.
- Real-Time 3D Manifold Encoders: Multi-layer MLP autoencoders with latent orthogonality constraints (SVD-based angular regularization) map 24-channel psychophysiologic time series to dynamic 3D coordinates, supporting real-time cognitive state tracking with 85% classification accuracy and stable cross-user manifold alignment (Hutcheson et al., 8 Oct 2025).
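A toy numpy sketch of the single-spike, population-median-referenced code, using a plain LIF time-to-threshold formula as a stand-in for the exp-LIF dynamics (neuron gains, thresholds, and the time constant are random placeholders standing in for intrinsic analog variability; the linear decoder is fit by least squares):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64                                    # population size (assumed)
gain = rng.uniform(0.5, 2.0, N)           # intrinsic analog variability
theta = rng.uniform(0.5, 1.5, N)          # heterogeneous thresholds
tau = 20e-3                               # membrane time constant (assumed)

def ttfs_encode(x):
    """Time to first spike of LIF neurons driven by constant input x:
    t = tau * ln(gx / (gx - theta)), valid when gx > theta. The vector
    is referenced to its population median, removing global latency."""
    t = tau * np.log(gain * x / (gain * x - theta))
    return t - np.median(t)

# Heterogeneity makes the spike-time vector a rich nonlinear basis of x,
# so a linear readout (least squares) can reconstruct the input.
xs = np.linspace(3.2, 10.0, 200)          # inputs where all neurons spike
T = np.stack([ttfs_encode(x) for x in xs])
A = np.c_[T, np.ones(len(xs))]
w, *_ = np.linalg.lstsq(A, xs, rcond=None)
print(float(np.abs(A @ w - xs).mean()))   # mean absolute decode error
```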
7. Impact, Limitations, and Future Directions
Physiological encoders have evolved from hand-crafted feature pipelines and classical linear models to unified foundation architectures and neuromorphic circuits. Advances in multimodal self-supervised pretraining (PhysioOmni, CBraMod), hybrid algebraic encoders (PHemoNet), and resource-constrained pipelines (Latent Sensor Fusion) have sharply increased scalability, universality, and real-world deployment potential (Lopez et al., 13 Sep 2024, Jiang et al., 28 Apr 2025, Ahmed et al., 13 Jul 2025). However, challenges persist: effective temporal–spatial representation fusion, resolving the biological locus of cross-modality mapping, and architectural adaptation for emerging modalities. There is broad consensus that further progress in universal, label-efficient, and explainable physiological encoders will come from advancing codebook/tokenizer designs, dynamically adaptive architectures, and rigorous grounding in both biological computation and cross-domain pretraining (Avramidis et al., 10 Oct 2025, Han et al., 2020, Ghallab et al., 17 Dec 2025).
References
- (Greene, 26 Dec 2024) Greene, "Neuromorphic Dual-channel Encoding of Luminance and Contrast"
- (Perez-Rosero et al., 2016) Perez-Rosero et al., "Decoding Emotional Experience through Physiological Signal Processing"
- (Hoffmann et al., 2016) Hoffmann et al., "The encoding of proprioceptive inputs in the brain: knowns and unknowns from a robotic perspective"
- (Ghallab et al., 17 Dec 2025) Ghallab et al., "Leveraging Foundational Models and Simple Fusion for Multi-modal Physiological Signal Analysis"
- (Jiang et al., 28 Apr 2025) Jiang et al., "Towards Robust Multimodal Physiological Foundation Models: Handling Arbitrary Missing Modalities"
- (Lopez et al., 13 Sep 2024) Lopez et al., "PHemoNet: A Multimodal Network for Physiological Signals"
- (Seikavandi et al., 4 Sep 2025) Seikavandi et al., "MuMTAffect: A Multimodal Multitask Affective Framework for Personality and Emotion Recognition from Physiological Signals"
- (Ahmed et al., 13 Jul 2025) Ahmed et al., "Latent Sensor Fusion: Multimedia Learning of Physiological Signals for Resource-Constrained Devices"
- (Hutcheson et al., 8 Oct 2025) Hutcheson & Raj, "Autoencoding Coordinate Sequences from Psychophysiologic Signals"
- (Costa et al., 23 Jan 2025) Costa & De Luca, "Continuous signal sparse encoding using analog neuromorphic variability"
- (Avramidis et al., 10 Oct 2025) Avramidis et al., "Neural Codecs as Biosignal Tokenizers"
- (Hallgarten et al., 2023) Hallgarten et al., "TS-MoCo: Time-Series Momentum Contrast for Self-Supervised Physiological Representation Learning"
- (Han et al., 2020) Han et al., "Universal Physiological Representation Learning with Soft-Disentangled Rateless Autoencoders"
- (Niu et al., 2020) Niu et al., "Video-based Remote Physiological Measurement via Cross-verified Feature Disentangling"
- (Williams et al., 2021) Williams & Wehbe, "Behavior measures are predicted by how information is encoded in an individual's brain"