Interpretable Measures of Conceptual Similarity by Complexity-Constrained Descriptive Auto-Encoding (2402.08919v1)

Published 14 Feb 2024 in cs.CV and cs.LG

Abstract: Quantifying the degree of similarity between images is a key copyright issue for image-based machine learning. In legal doctrine, however, determining the degree of similarity between works requires subjective analysis, and fact-finders (judges and juries) can demonstrate considerable variability in these subjective judgement calls. Images that are structurally similar can be deemed dissimilar, whereas images of completely different scenes can be deemed similar enough to support a claim of copying. We seek to define and compute a notion of "conceptual similarity" among images that captures high-level relations even among images that do not share repeated elements or visually similar components. The idea is to use a base multi-modal model to generate "explanations" (captions) of visual data at increasing levels of complexity. Similarity can then be measured by the length of the caption needed to discriminate between the two images: two highly dissimilar images can be discriminated early in their description, whereas conceptually similar ones will need more detail to be distinguished. We operationalize this definition and show that it correlates with subjective (averaged human evaluation) assessment, and beats existing baselines on both image-to-image and text-to-text similarity benchmarks. Beyond just providing a number, our method also offers interpretability by pointing to the specific level of granularity of the description at which the source data are differentiated.
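
The mechanism described in the abstract can be illustrated with a minimal sketch. The Python below is a hypothetical illustration of measuring similarity by the description length needed to tell two inputs apart; it is not the authors' implementation. The `caption_at_budget` and `caption_discriminates` callables are assumed stand-ins for, respectively, a multi-modal model prompted to describe an image within a token budget and a check that a caption matches one image but not the other (e.g., via an image-text matching score).

```python
# Minimal sketch of complexity-constrained descriptive similarity, assuming
# user-supplied captioning and discrimination callables (hypothetical names).
from typing import Callable


def conceptual_similarity(
    image_a,
    image_b,
    caption_at_budget: Callable[[object, int], str],            # (image, token budget) -> caption
    caption_discriminates: Callable[[str, object, object], bool],  # (caption, target, other) -> bool
    budgets=(4, 8, 16, 32, 64, 128),
) -> int:
    """Return the smallest caption-length budget at which either image's
    description distinguishes it from the other. A larger value means the
    inputs are harder to tell apart, i.e., more conceptually similar."""
    for budget in budgets:
        cap_a = caption_at_budget(image_a, budget)
        cap_b = caption_at_budget(image_b, budget)
        # If a short description already separates the two images, they are
        # conceptually dissimilar and we can stop early.
        if caption_discriminates(cap_a, image_a, image_b) or \
           caption_discriminates(cap_b, image_b, image_a):
            return budget
    # Never discriminated within the largest budget: maximally similar
    # at the granularities considered here.
    return budgets[-1]
```

Because the returned score is the budget itself, the measure is interpretable in the sense the abstract describes: it points to the level of descriptive detail at which the two inputs first become distinguishable.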
