Do language-only models learn the same universal visual dimensions?
Determine whether neural networks trained exclusively on language data learn the same universal dimensions of natural scene representation as those learned by image-trained vision networks; here, universal dimensions refer to latent principal-component dimensions of image representations that are convergently learned across diverse vision architectures and tasks and align with human visual cortex responses.
References
An open question is whether networks trained on language data alone learn the same universal dimensions of natural scene representation as image-trained networks.
                — Universal dimensions of visual representation
                
                (2408.12804 - Chen et al., 23 Aug 2024) in Discussion, future work paragraph