Controllable Mind Visual Diffusion Model (2305.10135v3)
Abstract: Brain signal visualization has emerged as an active research area, serving as a critical interface between the human visual system and computer vision models. Diffusion models have shown promise in analyzing functional magnetic resonance imaging (fMRI) data, including reconstructing high-quality images consistent with the original visual stimuli, but their accuracy in extracting semantic and silhouette information from brain signals remains limited. To address this limitation, we propose the Controllable Mind Visual Diffusion Model (CMVDM). CMVDM extracts semantic and silhouette information from fMRI data using attribute alignment and assistant networks. A residual block is also incorporated to capture information beyond semantic and silhouette features. We then use a control model to fully exploit the extracted information for image synthesis, producing images that closely resemble the visual stimuli in both semantics and silhouette. Through extensive experiments, we demonstrate that CMVDM outperforms existing state-of-the-art methods both qualitatively and quantitatively.