Resolve the alignment–discriminability trade-off for weak modalities

Within the COMPASS framework, resolve the trade-off between enforcing strong cross-modal alignment in the shared latent space and preserving modality-specific discriminability for inherently weak modalities such as RFID. The goal is to improve single-modality performance (e.g., RFID-only human activity recognition accuracy) without degrading the benefits of cross-modal transfer.

Background

COMPASS enforces a fixed, slot-complete fusion interface by generating proxy tokens for missing modalities and aligning all modality representations in a shared latent space. While this improves robustness across missing-modality scenarios, the authors observe that stronger cross-modal alignment can reduce modality-specific discriminability for weak modalities.
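To make the idea of a fixed, slot-complete fusion interface concrete, here is a minimal sketch in which every modality owns one slot and a proxy token fills any slot whose modality is missing. It is an illustration under stated assumptions, not the COMPASS implementation: the slot order in `MODALITIES`, the function `build_slots`, and the use of a precomputed proxy vector per modality are all hypothetical, and how COMPASS actually generates its proxy tokens is not described here.

```python
import numpy as np

# Hypothetical slot order; only RFID is named in the text above.
MODALITIES = ("video", "imu", "rfid")


def build_slots(available, proxies):
    """Return one token per modality slot plus a mask marking proxy slots.

    available: dict mapping modality name -> embedding of shape (dim,)
    proxies:   dict mapping modality name -> stand-in embedding of shape (dim,)
               (assumed to be given; the real proxy generator is not shown)
    """
    tokens, is_proxy = [], []
    for name in MODALITIES:
        if name in available:
            tokens.append(available[name])
            is_proxy.append(False)
        else:
            # Missing modality: substitute its proxy so the slot is never empty.
            tokens.append(proxies[name])
            is_proxy.append(True)
    # The fusion module always receives an array of shape (num_slots, dim).
    return np.stack(tokens), np.array(is_proxy)


# RFID-only input still yields a full set of slots.
dim = 4
proxies = {name: np.zeros(dim) for name in MODALITIES}
slots, mask = build_slots({"rfid": np.ones(dim)}, proxies)
```

Because the fusion input always has the same shape, the downstream model does not need a separate code path for each missing-modality pattern. That convenience is also why the quality of the shared space, and what alignment does to weak modalities inside it, matters so much.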

In their evaluation, RFID-only accuracy remains low at 48.4%, suggesting a tension between the alignment objectives (e.g., VICReg-based shared-space regularization and proxy alignment) and the need to retain modality-unique cues for weak sensors. The authors explicitly acknowledge that this trade-off is not fully resolved, which motivates methods that maintain strong alignment while safeguarding unimodal performance for weak modalities.
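For readers unfamiliar with VICReg, the sketch below shows the three terms of a generic VICReg-style objective applied to two batches of embeddings, for example a strong and a weak modality projected into the shared space. It follows the commonly used formulation of VICReg (invariance, variance, covariance), not necessarily the exact loss used in COMPASS. The function name `vicreg_loss`, the weights 25/25/1, and the `eps` value are assumptions chosen for illustration.

```python
import numpy as np


def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """Generic VICReg-style loss for two (batch, dim) embedding arrays.

    Weights and eps are illustrative assumptions, not values from COMPASS.
    """
    n, d = z_a.shape

    # Invariance: pull paired embeddings together (this is the alignment pressure).
    invariance = np.mean((z_a - z_b) ** 2)

    def variance_term(z):
        # Hinge that keeps the per-dimension standard deviation above 1.
        std = np.sqrt(z.var(axis=0, ddof=1) + eps)
        return np.mean(np.maximum(0.0, 1.0 - std))

    def covariance_term(z):
        # Penalize off-diagonal covariance to decorrelate dimensions.
        zc = z - z.mean(axis=0)
        cov = zc.T @ zc / (n - 1)
        off_diag = cov - np.diag(np.diag(cov))
        return np.sum(off_diag ** 2) / d

    variance = 0.5 * (variance_term(z_a) + variance_term(z_b))
    covariance = covariance_term(z_a) + covariance_term(z_b)

    total = sim_w * invariance + var_w * variance + cov_w * covariance
    terms = {"invariance": invariance, "variance": variance, "covariance": covariance}
    return total, terms


rng = np.random.default_rng(0)
z_strong = rng.normal(size=(32, 8))                    # e.g. a strong modality
z_weak = z_strong + 0.5 * rng.normal(size=(32, 8))     # e.g. a noisier, weak modality
loss, terms = vicreg_loss(z_strong, z_weak)
```

The invariance term is where the tension arises: raising its weight pulls the weak modality's embeddings toward those of stronger modalities, which can overwrite the cues that only the weak sensor carries. The variance and covariance terms prevent collapse of the shared space, but they do not by themselves guarantee that the retained variance is useful for classifying from the weak modality alone.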

References

RFID-only accuracy (48.4%) remains low: stronger cross-modal alignment can erode modality-specific discriminability for inherently weak modalities, and we have not fully resolved this trade-off.

COMPASS: Complete Multimodal Fusion via Proxy Tokens and Shared Spaces for Ubiquitous Sensing (2604.02056 - Wang et al., 2 Apr 2026) in Limitations paragraph, Section: Conclusion