Open questions on generalizability to additional modalities beyond image, text, and audio
Determine whether the results on training-free missing-modality prediction, established for image, text, and audio, generalize to additional modalities, specifically video, tabular data, and sensor streams.
References
Second, our evaluation is limited to three modalities, leaving open questions about generalizability to others such as video, tabular data, or sensor streams—modalities of growing importance in real-world applications.
— How Far Are We from Predicting Missing Modalities with Foundation Models?
(2506.03530 - Ke et al., 4 Jun 2025) in Discussion — Limitations