Faithful recovery from true latent features in text-guided reconstruction
Establish whether text-guided visual image reconstruction pipelines that combine CLIP features with text-to-image diffusion models (e.g., Stable Diffusion or Versatile Diffusion) can recover the original target images with high perceptual similarity when given the true latent features of those images. Passing this test is a prerequisite for accurate reconstruction from brain activity: if a pipeline cannot recover an image from its true features, it cannot do so from features decoded from brain activity.
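One way to make "high perceptual similarity" checkable is a pairwise identification score: a recovered image counts as a success against a distractor when it is more similar to its own target than to the distractor target. The sketch below is illustrative only and rests on several assumptions. The function name is hypothetical, pixel-wise Pearson correlation is a crude stand-in for a perceptual metric, and the "recovered" arrays are simulated with random noise. In a real test they would be the outputs of the reconstruction pipeline when it is fed the true latent features of each target.

```python
import numpy as np


def pairwise_identification_accuracy(recon, targets):
    """Fraction of (recovered image, distractor target) pairs in which the
    recovered image correlates more with its own target than with the
    distractor. Chance level is about 0.5; perfect recovery gives 1.0.

    Hypothetical helper: pixel correlation stands in for a perceptual metric.
    """
    n = recon.shape[0]
    r = recon.reshape(n, -1).astype(np.float64)
    t = targets.reshape(n, -1).astype(np.float64)
    # Z-score each flattened image so that a scaled dot product
    # equals the Pearson correlation.
    r = (r - r.mean(axis=1, keepdims=True)) / r.std(axis=1, keepdims=True)
    t = (t - t.mean(axis=1, keepdims=True)) / t.std(axis=1, keepdims=True)
    corr = r @ t.T / r.shape[1]  # corr[i, j]: recovered i vs. target j
    own = np.diag(corr)[:, None]
    # The diagonal never counts as a win (own > own is False),
    # so only the n * (n - 1) distractor comparisons remain.
    wins = (own > corr).sum()
    return wins / (n * (n - 1))


# Simulated data (assumption): 50 small random "images".
rng = np.random.default_rng(0)
targets = rng.standard_normal((50, 8, 8, 3))

# Stand-in for a pipeline that recovers targets almost exactly.
faithful = targets + 0.1 * rng.standard_normal(targets.shape)
# Stand-in for a pipeline whose outputs ignore the targets entirely.
unrelated = rng.standard_normal(targets.shape)

faithful_acc = pairwise_identification_accuracy(faithful, targets)
unrelated_acc = pairwise_identification_accuracy(unrelated, targets)
print(f"faithful: {faithful_acc:.3f}, unrelated: {unrelated_acc:.3f}")
```

A pipeline that meets the requirement should score near 1.0 under this kind of check when given true features, while a score near chance would indicate that the generator itself, and not the brain-to-feature translation, limits reconstruction quality.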
References
To ensure that a visual image reconstruction method has the potential to faithfully reproduce an individual's perceived visual experiences, it is crucial that the method can recover the original images with a high degree of perceptual similarity when the neural translation from brain activity to latent features is perfect. However, it has been unclear whether recent text-guided reconstruction methods meet this fundamental requirement.