Trustworthiness Mechanisms for Multimedia QA
Develop reliable trustworthiness mechanisms for multimedia question answering (QA) that attribute each generated answer to the modalities it draws on (modality-level attribution) and cite the specific retrieved evidence segments that support it (segment-level citations).
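The objective above can be made concrete with a small data model. The sketch below is a minimal illustration under stated assumptions, not an established method. The `Segment` and `CitedSentence` schemas, the helper names, and the choice to measure modality-level attribution as each modality's share of all citations are hypothetical. Time-based media are assumed to be cited by start and end offsets in seconds.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Segment:
    """A retrieved evidence segment (hypothetical schema)."""
    segment_id: str
    modality: str                   # e.g. "text", "audio", "video", "image"
    source: str                     # identifier of the originating document or media file
    start_s: Optional[float] = None  # start offset in seconds, for time-based media
    end_s: Optional[float] = None    # end offset in seconds, for time-based media


@dataclass
class CitedSentence:
    """One sentence of a generated answer plus the segment IDs it cites."""
    text: str
    citations: list = field(default_factory=list)


def format_citation(seg: Segment) -> str:
    """Render a segment-level citation; adds a time span when one is known."""
    if seg.start_s is not None and seg.end_s is not None:
        return f"[{seg.modality}:{seg.source}@{seg.start_s:.1f}-{seg.end_s:.1f}s]"
    return f"[{seg.modality}:{seg.source}]"


def modality_attribution(sentences, segments) -> dict:
    """Share of all citations that point to each modality (hypothetical metric).

    Every citation counts once; uncited sentences contribute nothing.
    Raises KeyError if a citation does not match any retrieved segment.
    """
    by_id = {s.segment_id: s for s in segments}
    counts = Counter()
    for sent in sentences:
        for sid in sent.citations:
            if sid not in by_id:
                raise KeyError(f"citation {sid!r} matches no retrieved segment")
            counts[by_id[sid].modality] += 1
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()} if total else {}


def uncited(sentences) -> list:
    """Sentences with no supporting evidence, i.e. candidates for flagging."""
    return [s.text for s in sentences if not s.citations]


if __name__ == "__main__":
    segments = [
        Segment("v1", "video", "lecture.mp4", 12.0, 18.5),
        Segment("a1", "audio", "lecture.mp4", 12.0, 18.5),
        Segment("t1", "text", "slides.pdf"),
    ]
    answer = [
        CitedSentence("The speaker points to the diagram.", ["v1", "a1"]),
        CitedSentence("The diagram summarizes the pipeline.", ["t1", "v1"]),
        CitedSentence("This approach is widely accepted."),
    ]
    print(modality_attribution(answer, segments))
    # {'video': 0.5, 'audio': 0.25, 'text': 0.25}
    print(format_citation(segments[0]))  # [video:lecture.mp4@12.0-18.5s]
    print(uncited(answer))               # ['This approach is widely accepted.']
```

Keeping citations at the sentence level lets the two trustworthiness signals come from one structure. Modality attribution is an aggregate over citations, and sentences that lack any citation can be surfaced to the user as unsupported rather than silently presented alongside grounded claims.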
References
Despite recent progress, several challenges remain unresolved. Key issues include the difficulty of fine-grained multimodal alignment (e.g., synchronizing spoken language with visual scenes), the lack of robust trustworthiness mechanisms such as modality attribution or segment-level citations, and the computational overhead introduced by real-time or large-scale retrieval. Handling multilingual queries and supporting low-resource modalities add further complexity, and evaluating answer quality consistently across modalities remains a persistent challenge.