How Well Do Deep Learning Models Capture Human Concepts? The Case of the Typicality Effect (2405.16128v1)

Published 25 May 2024 in cs.AI and cs.CL

Abstract: How well do representations learned by ML models align with those of humans? Here, we consider concept representations learned by deep learning models and evaluate whether they show a fundamental behavioral signature of human concepts, the typicality effect. This is the finding that people judge some instances (e.g., robin) of a category (e.g., Bird) to be more typical than others (e.g., penguin). Recent research looking for human-like typicality effects in language and vision models has focused on models of a single modality, tested only a small number of concepts, and found only modest correlations with human typicality ratings. The current study expands this behavioral evaluation of models by considering a broader range of language (N = 8) and vision (N = 10) model architectures. It also evaluates whether the combined typicality predictions of vision + LLM pairs, as well as a multimodal CLIP-based model, are better aligned with human typicality judgments than those of models of either modality alone. Finally, it evaluates the models across a broader range of concepts (N = 27) than prior studies. There were three important findings. First, LLMs better align with human typicality judgments than vision models. Second, combined language and vision models (e.g., AlexNet + MiniLM) better predict the human typicality data than the best-performing LLM (i.e., MiniLM) or vision model (i.e., ViT-Huge) alone. Third, multimodal models (i.e., CLIP ViT) show promise for explaining human typicality judgments. These results advance the state-of-the-art in aligning the conceptual representations of ML models and humans. A methodological contribution is the creation of a new image set for testing the conceptual alignment of vision models.

Authors (3)
  1. Siddhartha K. Vemuri
  2. Raj Sanjay Shah
  3. Sashank Varma