CineMPC: A Fully Autonomous Drone Cinematography System Incorporating Zoom, Focus, Pose, and Scene Composition (2401.05272v1)
Abstract: We present CineMPC, a complete cinematographic system that autonomously controls a drone to film multiple targets while meeting user-specified aesthetic objectives. Existing solutions in autonomous cinematography control only the camera extrinsics, namely its position and orientation. In contrast, CineMPC is the first solution that also includes the camera intrinsic parameters in the control loop; these parameters are essential for controlling cinematographic effects such as focus, depth of field, and zoom. The system estimates the relative poses between the targets and the camera from an RGB-D image and optimizes a trajectory over the extrinsic and intrinsic camera parameters to satisfy the artistic and technical requirements specified by the user. The drone and the camera are controlled in a nonlinear Model Predictive Control (MPC) loop that re-optimizes the trajectory at each time step in response to the current conditions in the scene. The perception system of CineMPC tracks the targets' position and orientation despite the changing camera effects. Experiments in a photorealistic simulation and on a real platform demonstrate the system's ability to achieve a full range of cinematographic effects that are not possible without control of the camera intrinsics. CineMPC is implemented with a modular architecture in ROS, and the code is released to the community.
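
To make concrete how intrinsics enter the control loop, the following Python sketch jointly optimizes the camera velocity, focal length, aperture (f-number), and focus distance over a short horizon so that a target stays at a desired framing distance and inside the thin-lens depth of field. This is a minimal illustration under assumed dynamics, cost weights, and set-points; it is not the authors' implementation, which formulates a full nonlinear MPC over pose, composition, and intrinsics and is released as a ROS package.

```python
"""Minimal receding-horizon sketch: optimize extrinsics (velocity) and
intrinsics (focal length, f-number, focus distance) together. All constants
below are illustrative assumptions, not values from the paper."""
import numpy as np
from scipy.optimize import minimize

HORIZON = 5           # lookahead steps per re-optimization (assumed)
DT = 0.2              # [s] step length (assumed)
COC = 0.03e-3         # [m] acceptable circle of confusion, full-frame sensor (assumed)

def hyperfocal(f, N):
    """Hyperfocal distance from the thin-lens model: H = f^2 / (N * c) + f."""
    return f**2 / (N * COC) + f

def dof_limits(f, N, s):
    """Near/far limits of acceptable sharpness for focal length f [m],
    f-number N, and focus distance s [m]."""
    H = hyperfocal(f, N)
    near = s * (H - f) / (H + s - 2.0 * f)
    far = np.inf if s >= H else s * (H - f) / (H - s)
    return near, far

def cost(u_flat, cam_pos, target_pos):
    """Horizon cost: hold a desired framing distance, keep the target inside
    the depth of field, and keep the focal length near a nominal value."""
    u = u_flat.reshape(HORIZON, 6)                   # per step: [vx, vy, vz, f, N, s]
    pos = cam_pos.copy()
    total = 0.0
    for vx, vy, vz, f, N, s in u:
        pos = pos + DT * np.array([vx, vy, vz])      # simple integrator dynamics (assumed)
        dist = np.linalg.norm(target_pos - pos)
        near, far = dof_limits(f, N, s)
        total += (dist - 8.0) ** 2                   # desired 8 m framing distance (assumed)
        total += 10.0 * max(0.0, near - dist) ** 2   # target in front of DoF near limit
        total += 10.0 * max(0.0, dist - far) ** 2    # target beyond DoF far limit
        total += 0.1 * ((f - 0.05) / 0.05) ** 2      # stay near a 50 mm lens (assumed)
    return total

# One iteration of the receding-horizon loop: solve, apply the first control,
# then repeat at the next time step with fresh pose estimates from perception.
cam_pos = np.array([0.0, 0.0, 2.0])
target_pos = np.array([6.0, 1.0, 1.7])               # stand-in for the RGB-D pose estimate
u0 = np.tile([0.0, 0.0, 0.0, 0.05, 4.0, 8.0], HORIZON)
bounds = [(-2, 2), (-2, 2), (-1, 1),                 # velocity limits [m/s]
          (0.02, 0.3), (1.4, 16.0), (1.0, 50.0)] * HORIZON  # f [m], N, s [m]
res = minimize(cost, u0, args=(cam_pos, target_pos), method="L-BFGS-B", bounds=bounds)
v_cmd, f_cmd, N_cmd, s_cmd = res.x[:3], res.x[3], res.x[4], res.x[5]
print("velocity:", v_cmd, "focal length:", f_cmd, "f-number:", N_cmd, "focus:", s_cmd)
```

The receding-horizon structure mirrors the loop described in the abstract: only the first optimized control is applied, and the problem is re-solved at the next time step with updated target pose estimates from the perception module.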