
Cinematographic Camera Diffusion Model (2402.16143v1)

Published 25 Feb 2024 in cs.GR

Abstract: Designing effective camera trajectories in virtual 3D environments is a challenging task even for experienced animators. Despite an elaborate film grammar, forged through years of experience, that enables the specification of camera motions through cinematographic properties (framing, shot sizes, angles, motions), there are endless possibilities for deciding how to place and move cameras relative to characters. Dealing with these possibilities is part of the complexity of the problem. While numerous techniques have been proposed in the literature (optimization-based solving, encoding of empirical rules, learning from real examples, ...), the results lack either variety or ease of control. In this paper, we propose a cinematographic camera diffusion model that uses a transformer-based architecture to handle temporality and exploits the stochasticity of diffusion models to generate diverse, high-quality trajectories conditioned on high-level textual descriptions. We extend this work by integrating keyframing constraints and the ability to blend naturally between motions using latent interpolation, so as to increase the degree of control available to designers. We demonstrate the strengths of this text-to-camera-motion approach through qualitative and quantitative experiments, and gather feedback from professional artists. The code and data are available at https://github.com/jianghd1996/Camera-control.
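The abstract describes a denoising-diffusion model with a transformer backbone that generates camera trajectories conditioned on text, with classifier-free-style conditioning common to such models. As a rough, hedged illustration of the sampling side only, the sketch below shows a minimal DDPM ancestral-sampling loop with classifier-free guidance in plain NumPy. The trajectory layout, feature count, beta schedule, and the stand-in zero-noise denoiser are all assumptions made for illustration; they are not the paper's actual architecture or parameterization.

```python
import numpy as np

# Hypothetical layout: a camera trajectory is a (T, 5) array of per-frame
# cinematographic features (e.g. distance, pan, tilt, screen-x, screen-y).
# The paper uses a transformer denoiser conditioned on a text embedding;
# here a toy stand-in denoiser illustrates only the DDPM sampling loop.

N_STEPS = 50
betas = np.linspace(1e-4, 0.02, N_STEPS)   # assumed linear beta schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def denoiser(x_t, t, cond):
    # Stand-in for the transformer: predicts the noise eps present in x_t.
    # A real model would attend over time steps and the text condition;
    # this toy version simply predicts zero noise.
    return np.zeros_like(x_t) + 0.0 * cond.sum()

def sample(shape, cond, guidance=2.0, rng=None):
    """Ancestral DDPM sampling with classifier-free guidance (toy sketch)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    x = rng.standard_normal(shape)  # start from pure Gaussian noise
    for t in reversed(range(N_STEPS)):
        eps_cond = denoiser(x, t, cond)
        eps_uncond = denoiser(x, t, np.zeros_like(cond))
        # Classifier-free guidance: push the prediction toward the condition.
        eps = eps_uncond + guidance * (eps_cond - eps_uncond)
        a, ab = alphas[t], alpha_bars[t]
        mean = (x - (1.0 - a) / np.sqrt(1.0 - ab) * eps) / np.sqrt(a)
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

# Generate a 60-frame trajectory for a hypothetical 16-dim text embedding.
traj = sample((60, 5), cond=np.ones(16))
```

The same loop structure also accommodates the keyframing constraints mentioned in the abstract: at each reverse step, constrained frames can be overwritten with the appropriately noised keyframe values, in the spirit of inpainting-style conditioning.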

