Image Reconstruction as a Tool for Feature Analysis (2506.07803v1)

Published 9 Jun 2025 in cs.CV

Abstract: Vision encoders are increasingly used in modern applications, from vision-only models to multimodal systems such as vision-LLMs. Despite their remarkable success, it remains unclear how these architectures represent features internally. Here, we propose a novel approach for interpreting vision features via image reconstruction. We compare two related model families, SigLIP and SigLIP2, which differ only in their training objective, and show that encoders pre-trained on image-based tasks retain significantly more image information than those trained on non-image tasks such as contrastive learning. We further apply our method to a range of vision encoders, ranking them by the informativeness of their feature representations. Finally, we demonstrate that manipulating the feature space yields predictable changes in reconstructed images, revealing that orthogonal rotations (rather than spatial transformations) control color encoding. Our approach can be applied to any vision encoder, shedding light on the inner structure of its feature space. The code and model weights to reproduce the experiments are available on GitHub.
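
To make the phrase "orthogonal rotations" of the feature space concrete, here is a minimal NumPy sketch of applying an orthogonal matrix to a set of token features. It is not the authors' procedure: the feature array is random stand-in data, the dimensions are arbitrary, and `random_orthogonal` is a hypothetical helper introduced only for illustration. The sketch shows just the defining property of such a transform, namely that it changes feature directions while preserving norms and pairwise inner products.

```python
import numpy as np

def random_orthogonal(dim, seed=0):
    """Hypothetical helper: draw a random orthogonal matrix via QR decomposition."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    # Flip column signs so the diagonal of R is positive; q stays orthogonal.
    return q * np.sign(np.diag(r))

# Stand-in for encoder output: 16 patch tokens with 8-dimensional features (assumed sizes).
features = np.random.default_rng(1).standard_normal((16, 8))

Q = random_orthogonal(8)
rotated = features @ Q.T  # apply the same orthogonal map to every token

# Orthogonal maps preserve per-token norms and all pairwise inner products.
norms_preserved = np.allclose(
    np.linalg.norm(features, axis=1), np.linalg.norm(rotated, axis=1)
)
gram_preserved = np.allclose(features @ features.T, rotated @ rotated.T)
print(norms_preserved, gram_preserved)
```

In the setting the abstract describes, the rotated features would be passed to a trained reconstructor to see how the decoded image changes; that step depends on the released model weights and is omitted here.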

Authors (4)
  1. Eduard Allakhverdov (2 papers)
  2. Dmitrii Tarasov (2 papers)
  3. Elizaveta Goncharova (10 papers)
  4. Andrey Kuznetsov (36 papers)