Cinematographic Camera Diffusion Model (2402.16143v1)
Abstract: Designing effective camera trajectories in virtual 3D environments is a challenging task even for experienced animators. Despite an elaborate film grammar, forged through years of experience, that enables the specification of camera motions through cinematographic properties (framing, shot sizes, angles, motions), there are endless possibilities in deciding how to place and move cameras with characters. Dealing with these possibilities is part of the complexity of the problem. While numerous techniques have been proposed in the literature (optimization-based solving, encoding of empirical rules, learning from real examples, ...), the results either lack variety or ease of control. In this paper, we propose a cinematographic camera diffusion model that uses a transformer-based architecture to handle temporality and exploits the stochasticity of diffusion models to generate diverse, high-quality trajectories conditioned on high-level textual descriptions. We extend this model by integrating keyframing constraints and the ability to blend naturally between motions using latent interpolation, so as to augment the degree of control available to designers. We demonstrate the strengths of this text-to-camera-motion approach through qualitative and quantitative experiments, and gather feedback from professional artists. The code and data are available at https://github.com/jianghd1996/Camera-control.
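The abstract mentions blending naturally between camera motions via latent interpolation. A common way to interpolate diffusion latents is spherical linear interpolation (slerp) of the initial noise vectors, which stays on (approximately) the same noise shell and tends to produce more plausible intermediates than straight linear interpolation. The sketch below is illustrative only, not the paper's implementation; the `slerp` helper and the flattened-latent representation are assumptions for the example:

```python
import math

def slerp(a, b, t):
    """Spherical linear interpolation between two latent vectors a and b.

    For diffusion models, interpolating the initial noise latents with slerp
    (rather than lerp) keeps the interpolant at a comparable norm, so the
    denoiser sees inputs from roughly the training distribution.
    """
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    # Clamp for numerical safety before acos.
    cos_omega = max(-1.0, min(1.0, dot / (na * nb)))
    omega = math.acos(cos_omega)
    if omega < 1e-8:
        # Nearly parallel latents: fall back to linear interpolation.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    so = math.sin(omega)
    return [
        (math.sin((1 - t) * omega) / so) * x + (math.sin(t * omega) / so) * y
        for x, y in zip(a, b)
    ]

# Example: blend two hypothetical flattened camera-trajectory noise latents.
latent_a = [1.0, 0.0, 0.0]
latent_b = [0.0, 1.0, 0.0]
midpoint = slerp(latent_a, latent_b, 0.5)
# The midpoint would then be denoised by the trajectory diffusion model
# to obtain a camera motion "in between" the two source motions.
```

At `t=0` and `t=1` the interpolation returns the original latents, so the blend degrades gracefully to the two endpoint motions.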