- The paper introduces BioMoTouch, integrating touchscreen and IMU data with deep learning for robust behavioral authentication.
- It leverages multimodal fusion with TinyViT and one-class classifiers to achieve a balanced accuracy (BAC) of 99.71% and an equal error rate (EER) of 0.27%, outperforming commercial fingerprint sensors under attack.
- Experiments demonstrate resilience against mimicry, artificial replication, and puppet attacks with consistent accuracy over five weeks.
BioMoTouch: Multi-modal Touch Authentication via Biometric-Motion Interaction Modeling
Motivation and Problem Statement
BioMoTouch addresses critical limitations of conventional biometric authentication on mobile devices, specifically the vulnerabilities inherent in static biometrics (fingerprint and facial features) and the challenges posed by advanced adversarial attacks, such as artificial replication and puppet attacks. The central claim of this work is that touch interaction is an intrinsically multi-dimensional signal: capacitive touchscreens capture physiological traits arising from finger morphology, while inertial sensors record behavioral dynamics. The integration and explicit modeling of these modalities are hypothesized to yield robust authentication resistant to partial-factor manipulation.
System Architecture and Methodology
BioMoTouch operates by implicitly collecting capacitive touchscreen and inertial measurement unit (IMU) data during natural device usage, with no additional hardware requirements, facilitating deployment on commodity hardware. The workflow consists of data acquisition, preprocessing, feature extraction, multimodal fusion, and user-specific one-class classification.
Figure 1: The workflow of BioMoTouch illustrating the modalities, preprocessing, and feature fusion pipelines.
Data Collection and Preprocessing
Experimental data were obtained from 38 participants, with capacitive images sampled at 20 fps and IMU signals at 200 Hz. For capacitive frames, the preprocessing pipeline applies adaptive touch detection via median/MAD thresholding, spatial region tracking, and temporal smoothing. For IMU data, it applies wavelet denoising, quaternion-based orientation estimation, and STFT-based spectral feature extraction. Cross-modal temporal alignment then pairs the physiological and motion features so that they stay synchronized.
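The median/MAD thresholding step can be illustrated with a minimal sketch. The function names, the multiplier `k = 3`, the Gaussian consistency constant 1.4826, and the toy 4x4 frame are all assumptions made for illustration; the summary does not give the exact rule or parameters used by BioMoTouch.

```python
from statistics import median

def mad_threshold(values, k=3.0):
    """Return median + k * scaled MAD (hypothetical rule, k is an assumption)."""
    values = list(values)
    med = median(values)
    mad = median(abs(v - med) for v in values)
    # 1.4826 rescales the MAD so it is comparable to a standard deviation
    # under Gaussian noise; this constant is an assumption, not from the paper.
    return med + k * 1.4826 * mad

def detect_touch(frame, k=3.0):
    """Return a boolean mask of cells whose value exceeds the adaptive threshold."""
    flat = [v for row in frame for v in row]
    thr = mad_threshold(flat, k)
    return [[v > thr for v in row] for row in frame]

# Toy capacitive frame: background noise of 1-2 with a two-cell touch region.
frame = [
    [1, 2, 1, 2],
    [2, 1, 40, 2],
    [1, 2, 45, 1],
    [2, 1, 2, 1],
]
mask = detect_touch(frame)
touched = [(r, c) for r, row in enumerate(mask) for c, hit in enumerate(row) if hit]
print(touched)
```

Because the threshold is derived from the frame's own median and spread, it adapts to the background level of each frame rather than relying on a fixed cutoff.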
Figure 2: Illustration of the data collection process, depicting user-device interaction and sensor streams.
Feature Engineering and Multimodal Representation
Feature extraction applies time-frequency analysis (STFT) to IMU signals, covering accelerometry and quaternion-derived roll/pitch/yaw, and applies temporal warping and amplitude-adaptive noise augmentation to capacitive data to simulate natural interaction variability. Both modalities use TinyViT backbones for deep embedding extraction. The fusion network, a two-layer MLP with LeakyReLU and dropout, produces a 320-dimensional representation emphasizing coordinated physiological-behavioral coupling.
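As an illustration of the quaternion-derived roll/pitch/yaw signals, the sketch below converts a unit quaternion to Euler angles. It assumes the common ZYX (yaw-pitch-roll) convention and a `(w, x, y, z)` component order; the summary does not state which convention BioMoTouch uses, and `quat_to_euler` is a hypothetical helper name.

```python
import math

def quat_to_euler(w, x, y, z):
    """Convert a unit quaternion to (roll, pitch, yaw) in radians.

    Assumes the ZYX convention; this choice is an assumption for illustration.
    """
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    # Clamp to [-1, 1] so rounding error near +/-90 degrees of pitch
    # cannot push asin outside its domain.
    sin_pitch = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return roll, pitch, yaw

# A 90-degree rotation about the z axis should yield yaw = pi/2.
half = math.pi / 4
roll, pitch, yaw = quat_to_euler(math.cos(half), 0.0, 0.0, math.sin(half))
print(roll, pitch, yaw)
```

Applied per IMU sample, this produces three angle time series that can then be passed to the STFT stage alongside the accelerometer axes.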
Figure 3: Two samples of User A, visualizing IMU STFT spectra across axes and angles, showcasing intra-user spectral consistency.
Figure 4: Visualized feature space of raw and augmented touch interaction data under PCA, delineating compact and well-separated user clusters.
Authentication Protocol and Attack Modeling
BioMoTouch frames authentication as a one-class classification task, using a one-class SVM (OC-SVM), local outlier factor (LOF), and isolation forest (IF) to model the legitimate user from that user's own data. The threat model covers three attacks: mimicry (behavior imitation), artificial replication (fabrication of biometric traits), and puppet attacks (forced use of genuine biometrics). Empirical evaluation reports low EER and FAR across all attack scenarios, including attacks that bypass liveness detection.
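The one-class setting can be sketched with a small pure-Python local outlier factor, one of the three profilers named above. This is a simplified illustration rather than the authors' implementation: the 2-D points stand in for the 320-dimensional fused embeddings, and `k = 4` and the acceptance threshold of 1.5 are hypothetical values.

```python
import math

def _neighbors(p, data, k, skip=None):
    """Return the k nearest (distance, index) pairs of p among data."""
    pairs = sorted((math.dist(p, q), i) for i, q in enumerate(data) if i != skip)
    return pairs[:k]

def _lrd(nbrs, kdist):
    """Local reachability density: inverse of the mean reachability distance."""
    reach = [max(d, kdist[j]) for d, j in nbrs]
    return len(reach) / (sum(reach) + 1e-12)  # epsilon guards duplicate points

def fit_lof(train, k=4):
    """Precompute k-distances and densities from the enrolled user's samples only."""
    nbrs = [_neighbors(p, train, k, skip=i) for i, p in enumerate(train)]
    kdist = [n[-1][0] for n in nbrs]
    lrd = [_lrd(n, kdist) for n in nbrs]
    return {"train": train, "k": k, "kdist": kdist, "lrd": lrd}

def lof_score(model, query):
    """LOF of a new sample: about 1 for inliers, well above 1 for outliers."""
    nbrs = _neighbors(query, model["train"], model["k"])
    q_lrd = _lrd(nbrs, model["kdist"])
    return sum(model["lrd"][j] for _, j in nbrs) / (len(nbrs) * q_lrd)

# Enrollment: a 5x5 grid of toy 2-D "embeddings" for the legitimate user.
enrolled = [(float(x), float(y)) for x in range(5) for y in range(5)]
model = fit_lof(enrolled, k=4)

THRESHOLD = 1.5  # hypothetical acceptance threshold
near = lof_score(model, (2.1, 2.0))    # resembles the enrolled samples
far = lof_score(model, (10.0, 10.0))   # far from the enrolled samples
accept_near = near < THRESHOLD
accept_far = far < THRESHOLD
print(accept_near, accept_far)
```

Only the legitimate user's samples are needed at enrollment, which matches the one-class framing: no impostor data is required to build the profile.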
Figure 5: Fabrication procedure of fingerprint spoofs, detailing the physical replication protocol for adversarial testing.

Figure 6: Genuine image, illustrating ground-truth comparison in spoof resistance evaluation.
Experimental Results and Numerical Highlights
On the main authentication dataset, BioMoTouch achieved a BAC of 99.71% and an EER of 0.27% with the TinyViT + OC-SVM configuration. Modality ablation showed that capacitive features alone encode user-specific physiological signatures (EER = 1.00%), but the lowest error is reached only with multimodal fusion (EER = 0.27%). BioMoTouch kept FAR below 0.90% across artificial replication, mimicry, and puppet attacks, outperforming commercial fingerprint sensors (Live20R: FAR up to 100% under puppet attacks).
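For readers unfamiliar with the metrics, the sketch below shows how FAR, FRR, EER, and BAC relate, using a simple threshold sweep over decision scores. The scores are made-up toy values, not data from the paper, and the convention that a sample is accepted when its score is at or above the threshold is an assumption.

```python
def far_frr(genuine, impostor, thr):
    """FAR: fraction of impostors accepted. FRR: fraction of genuine samples rejected."""
    frr = sum(s < thr for s in genuine) / len(genuine)
    far = sum(s >= thr for s in impostor) / len(impostor)
    return far, frr

def equal_error_rate(genuine, impostor):
    """Sweep observed scores as thresholds; return (EER, threshold) where FAR and FRR are closest."""
    best = None
    for thr in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, thr)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, thr)
    return best[1], best[2]

def balanced_accuracy(genuine, impostor, thr):
    """BAC: mean of the true accept rate and the true reject rate."""
    far, frr = far_frr(genuine, impostor, thr)
    return ((1 - frr) + (1 - far)) / 2

# Toy decision scores (higher means more likely genuine).
genuine = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor = [0.5, 0.3, 0.2, 0.1, 0.05]

eer, thr = equal_error_rate(genuine, impostor)
bac = balanced_accuracy(genuine, impostor, thr)
print(eer, thr, bac)
```

In this toy case one genuine sample and one impostor sample are misclassified at the chosen threshold, so FAR and FRR are both 1 in 5. BAC depends on the operating threshold, so it need not equal one minus the EER when it is reported at a different operating point.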
Figure 7: ROC curves of the IMU-based method, showing limited discriminative power in isolation.

Figure 8: Decision score distributions of the IMU-based method, exhibiting overlap between genuine and impostor classes.
Reliability and Deployment Robustness
Longitudinal trials demonstrated temporal stability, with EERs consistently below 1.08% across five weeks. BioMoTouch remained effective across different fingers (EER range 0.40%-0.95%), user postures (EER < 1.48% even when walking), wet fingers (EER = 1.24%), and screen protectors (EER ≤ 0.58% for all types).
Figure 9: Long-term EER comparison over five weeks, showcasing temporal robustness versus single-modality baselines.
Figure 10: EERs of different one-class classifiers across fingers, revealing biometric consistency across thumb, index, and middle fingers.

Figure 11: EERs under different user postures, affirming resilience to physical and behavioral context shifts.
Implications and Future Perspectives
The explicit modeling of physiological-behavioral coupling opens new avenues for seamless, unobtrusive mobile authentication that needs no additional hardware. The results challenge the premise that commodity capacitive screens lack fine-grained physiological discriminability. On the theoretical side, BioMoTouch argues that treating biometric modalities as independent is suboptimal and that coordinated fusion yields measurable security gains against adversarial attacks. From a practical standpoint, the framework can be integrated as an auxiliary behavioral biometric to strengthen PIN and fingerprint unlocking workflows. Future directions include domain adaptation for cross-device generalization, continuous authentication during active sessions, and extension of multimodal fusion to additional sensor types.
Conclusion
BioMoTouch delivers a multi-modal behavioral authentication framework that achieves high accuracy, resilience against advanced spoofing vectors, and robust operation across diverse environmental conditions. The strong numerical results validate the core hypothesis of coordinated biometric-motion modeling and its practical utility in strengthening mobile security protocols. The broader implication is a shift toward implicit, liveness-independent, and multimodal behavioral biometrics for both authentication and continuous security monitoring on commodity devices.