Breathing Life Into Sketches Using Text-to-Video Priors (2311.13608v1)

Published 21 Nov 2023 in cs.CV, cs.GR, and cs.LG

Abstract: A sketch is one of the most intuitive and versatile tools humans use to convey their ideas visually. An animated sketch opens another dimension to the expression of ideas and is widely used by designers for a variety of purposes. Animating sketches is a laborious process, requiring extensive experience and professional design skills. In this work, we present a method that automatically adds motion to a single-subject sketch (hence, "breathing life into it"), merely by providing a text prompt indicating the desired motion. The output is a short animation provided in vector representation, which can be easily edited. Our method does not require extensive training, but instead leverages the motion prior of a large pretrained text-to-video diffusion model using a score-distillation loss to guide the placement of strokes. To promote natural and smooth motion and to better preserve the sketch's appearance, we model the learned motion through two components. The first governs small local deformations and the second controls global affine transformations. Surprisingly, we find that even models that struggle to generate sketch videos on their own can still serve as a useful backbone for animating abstract representations.
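The two-component motion model described in the abstract — small local deformations of stroke control points plus a global affine transformation — can be illustrated with a minimal sketch. The function below is a hypothetical illustration of that decomposition only (names and signatures are assumptions, not the authors' implementation); in the actual method these per-frame parameters are optimized via a score-distillation loss against a pretrained text-to-video diffusion model.

```python
import math

def animate_frame(points, local_deltas, theta, scale, tx, ty):
    """Compose the two motion components from the paper's abstract:
    a small per-point local deformation, followed by a global affine
    transform (here: rotation by theta, uniform scale, translation).

    points       -- list of (x, y) stroke control points
    local_deltas -- list of (dx, dy) local displacements, same length
    """
    c, s = math.cos(theta), math.sin(theta)
    frame = []
    for (x, y), (dx, dy) in zip(points, local_deltas):
        lx, ly = x + dx, y + dy                 # local deformation
        gx = scale * (c * lx - s * ly) + tx     # global affine, x
        gy = scale * (s * lx + c * ly) + ty     # global affine, y
        frame.append((gx, gy))
    return frame

# Identity parameters leave the sketch unchanged; varying them per
# frame produces the animation.
pts = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
zeros = [(0.0, 0.0)] * len(pts)
identity = animate_frame(pts, zeros, theta=0.0, scale=1.0, tx=0.0, ty=0.0)
```

Separating the global affine component from the local one is what lets the method keep the sketch's overall appearance while still permitting expressive, coordinated motion.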

