Adversarial Testing for Visual Grounding via Image-Aware Property Reduction (2403.01118v1)
Abstract: Because it fuses information from multiple modalities, multimodal learning is attracting increasing attention. Visual Grounding (VG), a fundamental multimodal task, aims to locate objects in images described by natural language expressions. Ensuring the quality of VG models is challenging because of the complex nature of the task. In the black-box scenario, existing adversarial testing techniques often fail to exploit the potential of both modalities: they typically perturb only the image or only the text, disregarding the crucial correlation between the two, which leads to broken test oracles or tests that cannot effectively challenge VG models. To this end, we propose PEELING, a text perturbation approach via image-aware property reduction for adversarial testing of VG models. The core idea is to remove property-related information from the original expression while ensuring that the reduced expression still uniquely describes the original object in the image. To achieve this, PEELING first extracts the object and its properties from the expression and recombines them to generate candidate property-reduction expressions. It then selects the expressions that accurately describe the original object while no other object in the image satisfies them, by querying the image with a visual understanding technique. We evaluate PEELING on the state-of-the-art VG model OFA-VG over three commonly used datasets. Results show that the adversarial tests generated by PEELING achieve a 21.4% MultiModal Impact (MMI) score and outperform state-of-the-art image and text baselines by 8.2%--15.1%.
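
The extract-recombine-verify pipeline described in the abstract can be illustrated with a small sketch. This is not the actual PEELING implementation: the property extraction is stubbed out as simple string splitting (a real system would use an information-extraction model or parser), and `vqa_count` is an assumed callback standing in for the visual understanding query that checks whether a reduced expression still matches exactly one object in the image.

```python
from itertools import combinations
from typing import Callable, List, Tuple


def extract_object_and_properties(expression: str) -> Tuple[str, List[str]]:
    """Hypothetical extraction step: split a referring expression into its
    head object and property phrases. Assumes a "property ... object" word
    order for illustration only."""
    *properties, head = expression.split()
    return head, properties


def candidate_reductions(head: str, properties: List[str]) -> List[str]:
    """Recombine the head object with every proper subset of its properties,
    from the fewest properties kept upward, so the sparsest candidate
    expressions are tried first."""
    candidates = []
    for k in range(len(properties)):              # keep 0 .. n-1 properties
        for kept in combinations(properties, k):
            candidates.append(" ".join([*kept, head]))
    return candidates


def select_unique_expressions(
    candidates: List[str],
    image_path: str,
    vqa_count: Callable[[str, str], int],
) -> List[str]:
    """Keep only reduced expressions that still pick out exactly one object
    in the image. `vqa_count` is an assumed interface for a visual
    understanding query such as "How many <expression> are in the image?"."""
    return [c for c in candidates if vqa_count(image_path, c) == 1]


if __name__ == "__main__":
    # Toy example: the original expression carries two properties.
    head, props = extract_object_and_properties("red striped umbrella")

    # Pretend the image contains several umbrellas but only one striped one,
    # so "striped umbrella" stays unique while "umbrella" alone does not.
    def fake_vqa_count(image: str, expr: str) -> int:
        return 1 if "striped" in expr else 3

    reduced = select_unique_expressions(
        candidate_reductions(head, props), "image.jpg", fake_vqa_count
    )
    print(reduced)  # ['striped umbrella']
```

Enumerating the sparsest property subsets first mirrors the stated goal: drop as much property information as possible while the image-aware check guarantees the reduced expression still refers to a single object.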
Authors: Zhiyuan Chang, Mingyang Li, Junjie Wang, Cheng Li, Boyu Wu, Fanjiang Xu, Qing Wang