APISR: Anime Production Inspired Real-World Anime Super-Resolution (2403.01598v2)

Published 3 Mar 2024 in eess.IV, cs.AI, and cs.CV

Abstract: While real-world anime super-resolution (SR) has gained increasing attention in the SR community, existing methods still adopt techniques from the photorealistic domain. In this paper, we analyze the anime production workflow and rethink how to use its characteristics for real-world anime SR. First, we argue that video networks and datasets are not necessary for anime SR, due to the repeated use of hand-drawn frames. Instead, we propose an anime image collection pipeline that chooses the least compressed and most informative frames from the video sources. Based on this pipeline, we introduce the Anime Production-oriented Image (API) dataset. In addition, we identify two anime-specific challenges: distorted and faint hand-drawn lines, and unwanted color artifacts. We address the first issue by introducing a prediction-oriented compression module in the image degradation model and preparing pseudo-ground truths with enhanced hand-drawn lines. We also introduce a balanced twin perceptual loss that combines anime and photorealistic high-level features to mitigate unwanted color artifacts and increase visual clarity. Extensive experiments on the public benchmark show that our method outperforms state-of-the-art approaches trained on anime datasets.


Summary

  • The paper introduces a novel framework that enhances low-resolution anime images using a custom dataset and compression-aware degradation modeling.
  • It integrates hand-drawn line enhancement via XDoG-based edge detection to preserve critical visual details unique to anime.
  • Experimental results show superior performance over existing models with improved NIQE, MANIQA, and CLIPIQA scores while reducing training data needs.

An Analysis of "APISR: Anime Production Inspired Real-World Anime Super-Resolution"

The paper "APISR: Anime Production Inspired Real-World Anime Super-Resolution," authored by Boyang Wang et al., presents a comprehensive framework for upscaling low-resolution anime images to high resolution, leveraging insights from the anime production workflow. The authors critically examine existing super-resolution (SR) methods, which often adapt techniques from the photorealistic domain, and explore domain-specific challenges unique to anime. The key contributions of the paper are a novel dataset curation pipeline, an enhanced image degradation model, a hand-drawn line enhancement strategy, and a balanced twin perceptual loss tailored for anime.

Data Curation Approach

One of the focal points of the paper is the introduction of the Anime Production-oriented Image (API) dataset. The authors diverge from the conventional approach of using video datasets, pointing to the redundancy inherent in sequential anime frames. Instead, they build an image-based pipeline that selects the least compressed and most informative frames, exploiting the structure of video compression standards such as H.264. The pipeline applies an Image Complexity Assessment (ICA) model to identify high-quality, information-rich frames, improving the efficacy and robustness of the dataset. Rescaling the collected images to 720p matches the original production resolution, preserving detail and visual quality.
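The selection step described above can be sketched as a ranking problem. The snippet below is a minimal illustration, not the paper's implementation: it uses mean gradient magnitude as a crude stand-in for the learned ICA model, and assumes the candidate pool consists of already-decoded frames (e.g., I-frames, which carry the least compression loss).

```python
import numpy as np

def complexity_score(frame):
    """Proxy for image complexity: mean gradient magnitude.
    (The paper uses a learned Image Complexity Assessment model;
    this gradient heuristic is only an illustrative stand-in.)"""
    gray = frame.mean(axis=2) if frame.ndim == 3 else frame
    gy, gx = np.gradient(gray.astype(np.float64))
    return float(np.hypot(gx, gy).mean())

def select_informative_frames(frames, k=2):
    """Rank candidate frames (e.g., decoded I-frames, the least
    compressed in a GOP) and keep the k most complex ones."""
    return sorted(frames, key=complexity_score, reverse=True)[:k]
```

In practice the ranking model matters more than the ranking loop; the point here is only that dataset curation reduces to scoring and keeping top-k frames per source.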

Enhanced Degradation Model

The authors introduce a prediction-oriented compression module as part of the degradation model to simulate real-world video compression artifacts. This module improves the resilience of SR networks by simulating complex compressive degradations using single-image inputs rather than sequential frames. Additionally, a shuffled resize module is integrated into the degradation pipeline, providing a more robust representation of resizing artifacts common in real-world scenarios. These methods aim to create a more accurate and versatile degradation model tailored to the quirks of anime content.
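To make the pipeline ordering concrete, here is a toy single-image degradation sketch under stated assumptions: the noise and blur ops are deliberately crude placeholders, the resize kernel is nearest-neighbour only, and the prediction-oriented compression module is omitted because it requires a real video codec. Only the "shuffled" placement of the resize step mirrors the paper's idea.

```python
import numpy as np

def resize_nearest(img, scale):
    """Minimal nearest-neighbour resize (a stand-in for the mix of
    resize kernels a real degradation pipeline would sample from)."""
    h, w = img.shape[:2]
    nh, nw = max(1, int(h * scale)), max(1, int(w * scale))
    ys = np.minimum((np.arange(nh) / scale).astype(int), h - 1)
    xs = np.minimum((np.arange(nw) / scale).astype(int), w - 1)
    return img[ys][:, xs]

def degrade(hr, scale=0.25, seed=0):
    """One pass of a simplified degradation: noise, a crude blur,
    and a downscale inserted at a random slot in the op order,
    loosely mirroring a shuffled resize module."""
    rng = np.random.default_rng(seed)
    img = hr.astype(np.float64)
    ops = [
        lambda x: x + rng.normal(0.0, 0.01, x.shape),  # additive Gaussian noise
        lambda x: (x + np.roll(x, 1, axis=0)) / 2.0,   # crude two-tap blur
    ]
    # "Shuffled resize": the downscale's position is randomized.
    ops.insert(int(rng.integers(0, len(ops) + 1)),
               lambda x: resize_nearest(x, scale))
    for op in ops:
        img = op(img)
    return np.clip(img, 0.0, 1.0)
```

Randomizing where the resize lands changes whether noise and blur are applied at high or low resolution, which is what makes the synthesized low-resolution distribution broader.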

Hand-Drawn Line Enhancement

A significant aspect of the proposed framework is the attentive enhancement of hand-drawn lines, which are pivotal to the visual integrity of anime. Traditional global sharpening methods are inadequate, often failing to differentiate between significant lines and noise. The authors propose an innovative approach using XDoG-based edge detection to extract and enhance faint hand-drawn lines. By merging these edges with the ground-truth images, they form a pseudo-ground truth that significantly enriches the network training process.
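XDoG itself is a published operator (Winnemöller et al.), so the extraction step can be shown directly; the merge rule at the end is an assumption on my part (one plausible way to darken the ground truth along detected lines), not necessarily the paper's exact formulation.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge padding (pure NumPy)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, "valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, "valid"), 0, rows)

def xdog(img, sigma=1.0, k=1.6, gamma=0.98, eps=0.02, phi=15.0):
    """Extended difference-of-Gaussians: a sharpened DoG followed by
    a soft tanh threshold. Pixels whose DoG response exceeds eps map
    to white (1.0); the rest are softly darkened."""
    d = gaussian_blur(img, sigma) - gamma * gaussian_blur(img, k * sigma)
    return np.where(d >= eps, 1.0, 1.0 + np.tanh(phi * (d - eps)))

def make_pseudo_gt(gt, line_map):
    """Hypothetical merge: darken the GT wherever the extracted line
    map is dark, yielding a line-enhanced pseudo-ground truth."""
    return np.minimum(gt, line_map)
```

The hyperparameters (`sigma`, `k`, `gamma`, `eps`, `phi`) are typical XDoG defaults, not values taken from the paper.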

Balanced Twin Perceptual Loss

Addressing the unwanted color artifacts frequently introduced by Generative Adversarial Network (GAN)-based SR methods, the paper presents a balanced twin perceptual loss. Unlike conventional perceptual losses, which rely on networks trained only on photorealistic images, this loss combines features from both photorealistic and anime-specific networks, with a reweighted emphasis on the early layers of a ResNet adapted for anime classification. This approach mitigates color artifacts while preserving detail and overall visual fidelity.
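Structurally, a twin perceptual loss is just two weighted feature-distance sums. The sketch below makes that shape explicit; the `pool` "layers" are hypothetical stand-ins for real VGG / anime-ResNet activations, and the specific weights are illustrative, not the paper's values.

```python
import numpy as np

def pool(s):
    """Toy 'feature layer': average-pool by factor s. A hypothetical
    stand-in for activations of a pretrained network."""
    return lambda x: x.reshape(x.shape[0] // s, s,
                               x.shape[1] // s, s).mean(axis=(1, 3))

def twin_perceptual_loss(sr, gt, photo_layers, anime_layers,
                         photo_w, anime_w):
    """Weighted L1 distance over two feature stacks: one from a
    photorealistic network (VGG in the paper) and one from an
    anime-classification network. Biasing `anime_w` toward early
    layers is how the 'balanced' reweighting would be expressed."""
    loss = 0.0
    for feat, w in zip(photo_layers, photo_w):
        loss += w * np.abs(feat(sr) - feat(gt)).mean()
    for feat, w in zip(anime_layers, anime_w):
        loss += w * np.abs(feat(sr) - feat(gt)).mean()
    return loss
```

With real networks the `feat` callables would return intermediate activations; the balancing between the two stacks is entirely carried by the per-layer weight lists.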

Experimental Validation

The effectiveness of the proposed methods is demonstrated through rigorous quantitative and qualitative evaluations. The results show a significant performance improvement over state-of-the-art methods like AnimeSR and VQD-SR, with superior scores in no-reference metrics such as NIQE, MANIQA, and CLIPIQA. The authors report that APISR outperforms existing models while using only a fraction of the training data, underscoring the efficiency of their dataset curation technique and the robustness of their enhancements.

Practical and Theoretical Implications

Practically, APISR holds substantial potential for improving the quality of anime content in entertainment and commercial applications, offering high-quality viewing experiences and preserving cultural content. Theoretically, the paper's contributions build a bridge between domain-specific knowledge and advanced SR techniques, fostering a more nuanced understanding of how domain characteristics can be leveraged to refine machine learning models. The proposed improvements in dataset curation, degradation modeling, and perceptual loss formulation present a robust framework that can be adapted to other domains characterized by unique visual styles.

Future Directions

Future research can extend these methods by exploring adaptive learning techniques that dynamically adjust to varying levels of image complexity and degradation. Additionally, integrating multi-frame dependencies while preserving the efficiency of single-frame operations could further enhance video-based SR applications. Expanding the perceptual loss framework to include multi-modal features could also provide more holistic improvements in visual quality across diverse content types.
