
Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding (2402.19009v2)

Published 29 Feb 2024 in cs.LG and cs.AI

Abstract: The vast applications of deep generative models are anchored in three core capabilities -- generating new instances, reconstructing inputs, and learning compact representations -- across various data types, such as discrete text/protein sequences and continuous images. Existing model families, like variational autoencoders (VAEs), generative adversarial networks (GANs), autoregressive models, and (latent) diffusion models, generally excel in specific capabilities and data types but fall short in others. We introduce Generalized Encoding-Decoding Diffusion Probabilistic Models (EDDPMs), which integrate the core capabilities for broad applicability and enhanced performance. EDDPMs generalize the Gaussian noising-denoising in standard diffusion by introducing parameterized encoding-decoding. Crucially, EDDPMs are compatible with the well-established diffusion model objective and training recipes, allowing the encoder-decoder parameters to be learned effectively and jointly with diffusion. By choosing an appropriate encoder/decoder (e.g., LLMs), EDDPMs naturally apply to different data types. Extensive experiments on text, proteins, and images demonstrate the model's flexibility to handle diverse data and tasks, and its strong improvements over various existing models.
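To make the generalization concrete, below is a minimal PyTorch sketch of the kind of joint training signal the abstract describes: a parameterized encoder maps data into a latent, standard Gaussian noising-denoising runs on that latent, and a decoder is trained jointly to map latents back to data. This is an illustrative assumption, not the paper's exact formulation; the module names (encoder, decoder, denoiser), the denoiser's (z_t, t) signature, and the simple MSE reconstruction term are all hypothetical.

```python
import torch
import torch.nn.functional as F

# Hedged sketch of joint encoder/decoder + diffusion training, in the
# spirit of the EDDPM objective in the abstract. `encoder`, `decoder`,
# and `denoiser` are hypothetical nn.Modules; the paper's actual
# parameterization and losses may differ.

def eddpm_step_loss(encoder, decoder, denoiser, x, alphas_cumprod):
    """One training step: encode x, apply standard Gaussian noising to
    the latent, predict the noise, and reconstruct the input.
    `alphas_cumprod` is a 1-D tensor of cumulative noise-schedule
    products, assumed to live on the same device as x."""
    z0 = encoder(x)                                   # parameterized encoding
    B = z0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (B,), device=z0.device)
    a_bar = alphas_cumprod[t].view(B, *([1] * (z0.dim() - 1)))
    eps = torch.randn_like(z0)
    z_t = a_bar.sqrt() * z0 + (1 - a_bar).sqrt() * eps  # forward noising
    eps_hat = denoiser(z_t, t)                        # predict the noise
    diffusion_loss = F.mse_loss(eps_hat, eps)         # standard DDPM objective
    recon_loss = F.mse_loss(decoder(z0), x)           # decode latent back to data
    return diffusion_loss + recon_loss                # joint training signal
```

In this sketch the diffusion term is the standard DDPM noise-prediction loss, which is the point the abstract emphasizes: because the generalized model stays compatible with the usual diffusion objective, the encoder-decoder parameters can be learned with the same well-established training recipe.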
