A Unified and Interpretable Emotion Representation and Expression Generation (2404.01243v1)

Published 1 Apr 2024 in cs.CV

Abstract: Canonical emotions, such as happy, sad, and fearful, are easy to understand and annotate. However, emotions are often compound, e.g. happily surprised, and can be mapped to the action units (AUs) used for expressing emotions, and trivially to the canonical ones. Intuitively, emotions are continuous, as represented by the arousal-valence (AV) model. An interpretable unification of these four modalities - namely, Canonical, Compound, AUs, and AV - is highly desirable for a better representation and understanding of emotions. However, such a unification remains unknown in the current literature. In this work, we propose an interpretable and unified emotion model, referred to as C2A2. We also develop a method that leverages the labels of the non-unified models to annotate the novel unified one. Finally, we modify text-conditional diffusion models to understand continuous numbers, which are then used to generate continuous expressions using our unified emotion model. Through quantitative and qualitative experiments, we show that our generated images are rich and capture subtle expressions. Our work allows fine-grained generation of expressions in conjunction with other textual inputs and, at the same time, offers a new label space for emotions.
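
The abstract's central technical claim is that text-conditional diffusion models can be modified to understand continuous numbers, so that expressions can be generated along the continuous arousal-valence (AV) space rather than from a fixed emotion vocabulary. This page does not include the paper's implementation details, so the following PyTorch sketch only illustrates one plausible reading of that idea; the class name ContinuousEmotionEncoder, the pseudo-token count, and the embedding width are assumptions for illustration, not the authors' design.

```python
import torch
import torch.nn as nn

class ContinuousEmotionEncoder(nn.Module):
    """Hypothetical module: lifts a continuous 2-D arousal-valence point
    into pseudo-token embeddings that can be appended to a text encoder's
    output, letting a text-conditional diffusion model be steered by
    continuous values instead of discrete emotion words."""

    def __init__(self, embed_dim: int = 768, n_tokens: int = 4):
        super().__init__()
        self.n_tokens = n_tokens
        self.embed_dim = embed_dim
        # Small MLP mapping (arousal, valence) to n_tokens embedding vectors.
        self.mlp = nn.Sequential(
            nn.Linear(2, 256),
            nn.SiLU(),
            nn.Linear(256, n_tokens * embed_dim),
        )

    def forward(self, av: torch.Tensor) -> torch.Tensor:
        # av: (batch, 2), arousal and valence each assumed in [-1, 1].
        return self.mlp(av).view(-1, self.n_tokens, self.embed_dim)

encoder = ContinuousEmotionEncoder()
av = torch.tensor([[0.6, -0.3]])           # e.g. aroused, slightly negative
text_emb = torch.randn(1, 77, 768)         # stand-in for CLIP prompt embeddings
cond = torch.cat([text_emb, encoder(av)], dim=1)  # (1, 81, 768) conditioning
```

In this sketch the extended conditioning sequence would replace the plain prompt embedding at the denoiser's cross-attention; the actual C2A2 conditioning interface may differ, but the point stands that continuous AV coordinates can be injected alongside text without quantizing them into canonical labels.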

