Structure-Aware Residual-Center Representation for Self-Supervised Open-Set 3D Cross-Modal Retrieval (2407.15376v1)
Abstract: Existing 3D cross-modal retrieval methods rely heavily on category distribution priors in the training set, which reduces their effectiveness on unseen categories in open-set environments. To tackle this problem, we propose the Structure-Aware Residual-Center Representation (SRCR) framework for self-supervised open-set 3D cross-modal retrieval. To address the center deviation caused by category distribution differences, we use nested auto-encoders to produce a Residual-Center Embedding (RCE) for each object, rather than directly mapping objects to the modality or category centers. In addition, we apply Hierarchical Structure Learning (HSL) to exploit the high-order correlations among objects for generalization, constructing a heterogeneous hypergraph based on hierarchical inter-modality, intra-object, and implicit-category correlations. Extensive experiments and ablation studies on four benchmarks demonstrate the superiority of the proposed framework over state-of-the-art methods.
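The hypergraph construction mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' HSL implementation: the function names `build_incidence` and `smooth`, the choice of k-nearest-neighbour hyperedges as a stand-in for implicit-category correlations, and the unit hyperedge weights are all assumptions made for illustration. The smoothing step uses the standard normalised hypergraph operator Dv^{-1/2} H De^{-1} H^T Dv^{-1/2} rather than anything specified in the abstract.

```python
import numpy as np


def build_incidence(features, k=2):
    """Build an illustrative hypergraph incidence matrix.

    features: array of shape (num_objects, num_modalities, dim).
    Each (object, modality) pair is treated as one vertex (an assumption).
    Returns H of shape (num_vertices, num_objects + num_vertices).
    """
    n_obj, n_mod, dim = features.shape
    n_vert = n_obj * n_mod
    flat = features.reshape(n_vert, dim)

    # Intra-object hyperedges: one per object, joining all of its modalities.
    intra = np.zeros((n_vert, n_obj))
    for obj in range(n_obj):
        intra[obj * n_mod:(obj + 1) * n_mod, obj] = 1.0

    # k-NN hyperedges (hypothetical proxy for implicit-category structure):
    # one per vertex, joining it with its k nearest other vertices.
    dists = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=-1)
    knn = np.zeros((n_vert, n_vert))
    for v in range(n_vert):
        order = np.argsort(dists[v], kind="stable")
        neighbours = [u for u in order if u != v][:k]
        knn[neighbours, v] = 1.0
        knn[v, v] = 1.0

    return np.concatenate([intra, knn], axis=1)


def smooth(flat_features, H):
    """Apply one step of normalised hypergraph smoothing with unit weights."""
    d_v = H.sum(axis=1)  # vertex degrees
    d_e = H.sum(axis=0)  # hyperedge degrees
    dv_inv_sqrt = 1.0 / np.sqrt(d_v)
    theta = (dv_inv_sqrt[:, None] * H / d_e[None, :]) @ (H.T * dv_inv_sqrt[None, :])
    return theta @ flat_features


# Toy data: 4 objects, 3 modalities, 8-dimensional features (all made up).
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 3, 8))
H = build_incidence(feats, k=2)
smoothed = smooth(feats.reshape(-1, 8), H)
print(H.shape, smoothed.shape)
```

Each vertex belongs to its own object's hyperedge and to at least its own neighbourhood hyperedge, so no vertex or hyperedge degree is zero and the normalisation is well defined.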