
Clustering-based Image-Text Graph Matching for Domain Generalization (2310.02692v3)

Published 4 Oct 2023 in cs.CV and cs.AI

Abstract: Learning domain-invariant visual representations is important to train a model that can generalize well to unseen target task domains. Recent works demonstrate that text descriptions contain high-level class-discriminative information and such auxiliary semantic cues can be used as effective pivot embedding for domain generalization problems. However, they use pivot embedding in a global manner (i.e., aligning an image embedding with sentence-level text embedding), which does not fully utilize the semantic cues of given text description. In this work, we advocate for the use of local alignment between image regions and corresponding textual descriptions to get domain-invariant features. To this end, we first represent image and text inputs as graphs. We then cluster nodes within these graphs and match the graph-based image node features to the nodes of textual graphs. This matching process is conducted both globally and locally, tightly aligning visual and textual semantic sub-structures. We experiment with large-scale public datasets, such as CUB-DG and DomainBed, and our model achieves matched or better state-of-the-art performance on these datasets. The code is available at: https://github.com/noparkee/Graph-Clustering-based-DG
