
Structure-Aware Residual-Center Representation for Self-Supervised Open-Set 3D Cross-Modal Retrieval (2407.15376v1)

Published 22 Jul 2024 in cs.MM

Abstract: Existing 3D cross-modal retrieval methods rely heavily on category distribution priors in the training set, which reduces their effectiveness on unseen categories in open-set environments. To tackle this problem, we propose the Structure-Aware Residual-Center Representation (SRCR) framework for self-supervised open-set 3D cross-modal retrieval. To address the center deviation caused by category distribution differences, we use a Residual-Center Embedding (RCE) for each object, produced by nested auto-encoders, rather than directly mapping objects to modality or category centers. In addition, we apply Hierarchical Structure Learning (HSL) to exploit the high-order correlations among objects for generalization, constructing a heterogeneous hypergraph based on hierarchical inter-modality, intra-object, and implicit-category correlations. Extensive experiments and ablation studies on four benchmarks demonstrate the superiority of the proposed framework over state-of-the-art methods.
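The abstract does not say how the hypergraph is built. As a rough illustration of what "high-order correlations among objects" can look like, the sketch below uses a common generic construction: each object forms one hyperedge with its k nearest neighbours in feature space, and the result is stored as a vertex-by-hyperedge incidence matrix. The function names `knn_hyperedges` and `incidence_matrix`, the Euclidean distance, and the toy features are all assumptions made for illustration; this is not the paper's HSL procedure.

```python
import math


def knn_hyperedges(features, k):
    """Build one hyperedge per vertex: the vertex plus its k nearest neighbours.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(features)
    edges = []
    for i in range(n):
        # Rank all other vertices by Euclidean distance to vertex i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(features[i], features[j]),
        )
        edges.append(sorted([i] + others[:k]))
    return edges


def incidence_matrix(n_vertices, edges):
    """Return H where H[v][e] is 1 if vertex v belongs to hyperedge e, else 0."""
    H = [[0] * len(edges) for _ in range(n_vertices)]
    for e, members in enumerate(edges):
        for v in members:
            H[v][e] = 1
    return H


# Toy features (assumed): two tight clusters of two objects each.
features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
edges = knn_hyperedges(features, k=1)
H = incidence_matrix(len(features), edges)
```

Unlike an ordinary graph edge, a hyperedge can join any number of vertices, which is what lets a single edge group several objects (or several modalities of one object) at once.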
