Diagnosing Human-object Interaction Detectors
Abstract: We have witnessed significant progress in human-object interaction (HOI) detection. However, relying on mAP (mean Average Precision) as a single summary metric gives little insight into the nuances of model performance (e.g., why one model outperforms another), which can hinder further innovation in this field. To address this issue, we introduce a diagnosis toolbox that provides a detailed quantitative breakdown analysis of HOI detection models, inspired by the success of object detection diagnosis toolboxes. We first conduct a holistic investigation of the HOI detection pipeline. By defining a set of errors and an oracle that fixes each of them, we quantify the significance of each error by the mAP improvement obtained from fixing it. We then delve into the two sub-tasks of HOI detection: human-object pair detection and interaction classification. For the detection sub-task, we compute the coverage of ground-truth human-object pairs as well as the noisiness level of the detection results. For the classification sub-task, we measure a model's ability to distinguish positive from negative detection results and to classify the actual interactions when the human-object pairs are correctly detected. We analyze eight state-of-the-art HOI detection models and provide diagnostic insights to foster future research. For instance, our diagnosis shows that the state-of-the-art model RLIPv2 outperforms the others mainly because it significantly improves multi-label interaction classification accuracy. Our toolbox is applicable to different methods across different datasets and is available at https://github.com/neu-vi/Diag-HOI.
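The abstract states only that each error's significance is measured by the mAP gain after an oracle fixes it, in the style of object detection diagnosis toolboxes such as TIDE. The sketch below illustrates that bookkeeping under several assumptions that are not taken from the paper: detections are already matched to ground truth and each false positive carries a hypothetical `error_type` label; the oracle simply discards the false positives of one type; and AP is the simple non-interpolated variant rather than the interpolation used by any particular HOI benchmark. The names `Detection`, `fix_error`, `delta_ap`, and `delta_map` are illustrative and are not the Diag-HOI toolbox's API.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Detection:
    """One detected HOI triplet (human box, object box, interaction) for one HOI category."""
    score: float     # confidence used to rank detections
    is_tp: bool      # True if it matched a not-yet-matched ground-truth triplet
    error_type: str  # hypothetical label for false positives, e.g. "duplicate", "wrong_action"


def average_precision(dets: Sequence[Detection], num_gt: int) -> float:
    """Non-interpolated AP: sum of precision at each true-positive rank, divided by num_gt."""
    if num_gt == 0:
        return 0.0
    ranked = sorted(dets, key=lambda d: d.score, reverse=True)
    tp = 0
    precision_sum = 0.0
    for rank, det in enumerate(ranked, start=1):
        if det.is_tp:
            tp += 1
            precision_sum += tp / rank
    return precision_sum / num_gt


def fix_error(dets: Sequence[Detection], error_type: str) -> List[Detection]:
    """Assumed oracle: drop every false positive attributed to `error_type`."""
    return [d for d in dets if d.is_tp or d.error_type != error_type]


def delta_ap(dets: Sequence[Detection], num_gt: int, error_type: str) -> float:
    """AP gain for one HOI category after the oracle fixes one error type."""
    return average_precision(fix_error(dets, error_type), num_gt) - average_precision(dets, num_gt)


def delta_map(per_class: Sequence[Tuple[Sequence[Detection], int]], error_type: str) -> float:
    """Mean of per-category AP gains, which equals mAP(fixed) - mAP(original)."""
    gains = [delta_ap(dets, num_gt, error_type) for dets, num_gt in per_class]
    return sum(gains) / len(gains)


if __name__ == "__main__":
    # Toy category: two ground-truth triplets and one duplicate ranked between the hits.
    dets = [
        Detection(0.9, True, "tp"),
        Detection(0.8, False, "duplicate"),
        Detection(0.7, True, "tp"),
    ]
    print(f"AP = {average_precision(dets, 2):.3f}")                  # prints AP = 0.833
    print(f"dAP(duplicate) = {delta_ap(dets, 2, 'duplicate'):.3f}")  # prints dAP(duplicate) = 0.167
```

Dropping false positives of one type leaves the ground-truth count unchanged, so the gain isolates how much those errors push true positives down the ranking. An oracle for other error types, such as missed human-object pairs, would presumably have to adjust `num_gt` or inject detections instead.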