
Anime Production Image (API) Dataset

Updated 5 March 2026
  • The Anime Production-oriented Image (API) Dataset is a curated corpus of keyframes optimized for super-resolution tasks with minimal artifacts.
  • It employs a multi-stage pipeline, including I-Frame extraction, image complexity assessment, and resolution standardization aligned with anime production workflows.
  • The dataset integrates a novel degradation model and pseudo-ground truth enhancement, achieving superior performance on metrics like NIQE, MANIQA, and CLIP-IQA.

The Anime Production-oriented Image (API) Dataset is a curated corpus of high-information, artifact-minimized frames specifically constructed to address real-world anime super-resolution (SR) tasks. API distinguishes itself by aligning dataset construction tightly with anime production workflows, focusing on the peculiarities of hand-drawn frame repetition, unique degradation artifacts, and the visual semantics central to animation. Unlike photorealistic datasets or generic anime video frame dumps, API's design reflects the structural and visual constraints of digital anime, supporting state-of-the-art model training for SR tasks and benchmarking performance in operationally realistic scenarios (Wang et al., 2024).

1. Collection Pipeline Informed by Anime Production

The API dataset is compiled through a multi-stage process that leverages production conventions in anime:

  • I-Frame Extraction: Exploiting the prevalence of repeated hand-drawn frames across video sequences, only independently encoded I-Frames (“intra” frames) are extracted from source materials using ffmpeg. I-Frames typically retain substantially higher fidelity (estimated 2–3× the file size and fewer temporal or compression artifacts) compared to P- and B-frames subjected to inter-frame prediction and heavier compression.
  • Image Complexity Assessment (ICA): Rather than relying on traditional image quality assessment (IQA) metrics such as NIQE or HyperIQA—which tend to favor low-detail, smooth frames—the IC9600 neural network is employed to quantify the density and diversity of lines, textures, and computer-generated (CGI) effects. For each pool of I-Frames per video, the top 10 frames with the highest ICA scores are selected, after discarding content with nudity, excessive violence, or mixed photorealistic overlays. This step concentrates the dataset on visually informative, production-critical images.
  • Resolution Standardization: A key observation is that most anime is produced natively at 720p; end-user releases are often upscaled to 1080p or higher, introducing artificial interpolation artifacts. All selected frames are rescaled to 1280×720 (W×H) using a nearest-neighbor and convolution sequence that preserves sharp hand-drawn lines. This step removes scale-induced degradations while retaining the original stylistic intent.

The end-to-end pipeline thus yields a dataset with a strong production logic, minimizing redundant or low-detail frames and targeting the optimal balance between quantity and visual utility (Wang et al., 2024).
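The extraction and selection steps above can be sketched as follows. This is an illustrative outline, not the released pipeline code: `iframe_extract_cmd` and `select_top_frames` are hypothetical helper names, and the ICA scores are assumed to come from a separate IC9600 inference step.

```python
def iframe_extract_cmd(video_path, out_pattern):
    """Build an ffmpeg command that keeps only intra-coded (I) frames.

    The select filter drops P/B frames; -vsync vfr renumbers the output
    so frame files are consecutive.
    """
    return [
        "ffmpeg", "-i", video_path,
        "-vf", "select='eq(pict_type,I)'",
        "-vsync", "vfr",
        out_pattern,
    ]

def select_top_frames(ica_scores, k=10):
    """Keep the k frames with the highest image-complexity (ICA) scores,
    mirroring the per-video top-10 selection described above."""
    return sorted(ica_scores, key=ica_scores.get, reverse=True)[:k]
```

Content filtering (nudity, violence, photorealistic overlays) would happen between scoring and selection and is omitted here.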

2. Dataset Statistics and Composition

API comprises 3,740 images, extracted from 562 high-quality anime episodes or films spanning a wide range of studios, directors, genres, and contemporary CGI integrations. For each video, approximately 10 highly informative images are retained after filtering. The image set reflects diversity across:

  • Original aspect ratios and resolutions (rescaled to a standard 1280×720)
  • Scene content: includes characters, backgrounds, advanced CGI effects (lighting, explosion, machinery), as well as modality extremes (dark/night settings, flat-shaded compositions, high-texture action scenes)
  • Production sources: Various stylizations and technical approaches endemic to anime are present, maximizing both visual and stylistic heterogeneity

API is distributed as a unified training corpus (no fixed train/val/test splits), with public evaluation conducted via external real-world anime benchmarks such as AVC-RealLQ. The collection protocol and codebase are openly available for academic research under an MIT-style license (Wang et al., 2024).

3. Degradation Model and Pseudo-Ground Truth Generation

Standard real-world anime suffers characteristic degradations: faint/distorted hand-drawn lines and unwanted color artifacts, amplified through lossy upload/download cycles. API addresses these by integrating a two-pronged approach:

  • Prediction-Oriented Compression Module: A novel degradation pipeline simulates real-world video compression artifacts. Key elements include: application of blur (randomized Gaussian kernel), additive noise (Gaussian, Poisson), sequential lossy compression (JPEG/WebP, MPEG2/4, H.264/H.265, AVIF), and multiple randomized resizings. By randomizing the sequence and parameters, the module generates robust, transmission- and codec-realistic LR images from clean HR frames.
  • Pseudo-Ground Truth Hand-Drawn Line Enhancement: The pseudo-GT generation pipeline accentuates weakened lines via repeated unsharp masking and edge extraction (XDOG operator), combined with outlier filtering and passive dilation to preserve relevant strokes. The binary mask of enhanced lines is merged with the original ground truth to create I_pseudo, which serves as the target for SR model training.

Mathematically,

I_{sharp} = f^{(n)}(I_{GT}), \quad M = \mathrm{Dilate}_{passive}\left( \mathrm{Filter}_{outlier}\left( \mathrm{XDOG}(I_{sharp}) \right) \right), \quad I_{pseudo} = I_{sharp} \, M + I_{GT} \, (1 - M)

where f^{(n)} denotes the n-fold application of the unsharp mask.
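A minimal sketch of the pseudo-GT construction, assuming grayscale images in [0, 1]. The 3×3 box blur below is a stand-in for the sharpening blur, and the mask M (which the actual pipeline builds via XDOG extraction, outlier filtering, and passive dilation) is taken as a given input.

```python
import numpy as np

def unsharp(img, n=2, amount=1.0):
    """n-fold unsharp masking f^(n): repeatedly add back the difference
    between the image and a 3x3 box blur (stand-in for a Gaussian)."""
    out = img.astype(np.float64)
    h, w = out.shape
    for _ in range(n):
        p = np.pad(out, 1, mode="edge")
        blur = sum(p[i:i + h, j:j + w]
                   for i in range(3) for j in range(3)) / 9.0
        out = np.clip(out + amount * (out - blur), 0.0, 1.0)
    return out

def merge_pseudo_gt(i_sharp, i_gt, mask):
    """I_pseudo = I_sharp * M + I_GT * (1 - M), per the equation above."""
    return i_sharp * mask + i_gt * (1.0 - mask)
```

With a mask of all ones this reduces to the sharpened image; with all zeros it returns the original ground truth, so only strokes selected by M are replaced.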

This framework produces supervised training data explicitly optimized to restore and clarify artist-intended linework, as opposed to generic edge enhancement (Wang et al., 2024).
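The randomized-degradation idea can be illustrated with a toy pipeline. The stage implementations below are stand-ins, and `degrade` is a hypothetical name: real codec stages (JPEG/WebP, MPEG, H.264/H.265, AVIF) require an encoder and are omitted here.

```python
import numpy as np

def degrade(hr, rng, scale=4):
    """Toy randomized degradation: blur / noise / rescale stages applied
    in a shuffled order, then a final downsample to the LR size."""
    def blur(x):
        sigma = rng.uniform(0.1, 2.0)          # randomized Gaussian width
        k = np.exp(-0.5 * (np.arange(-2, 3) / sigma) ** 2)
        k /= k.sum()
        x = np.apply_along_axis(lambda r: np.convolve(r, k, "same"), 0, x)
        return np.apply_along_axis(lambda r: np.convolve(r, k, "same"), 1, x)

    def noise(x):                               # additive Gaussian noise
        return np.clip(x + rng.normal(0.0, 0.02, x.shape), 0.0, 1.0)

    def resize(x):                              # nearest-neighbour down/up
        small = x[::2, ::2]
        return np.repeat(np.repeat(small, 2, 0), 2, 1)

    stages = [blur, noise, resize]
    rng.shuffle(stages)                         # randomized stage order
    for stage in stages:
        hr = stage(hr)
    return hr[::scale, ::scale]                 # final LR downsample
```

Randomizing both the order and the parameters of the stages is what makes the synthetic LR distribution broad enough to cover transmission- and codec-induced artifacts.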

4. Supervision and Balanced Perceptual Loss

API-based SR model training employs a composite loss function designed to offset the domain mismatch between photorealistic feature extractors and the stylistic attributes of anime:

  • Pixel Loss: L_1 distance between pseudo-GT and generated SR image.
  • Twin Perceptual Losses:
    • VGG-19 (ImageNet pre-trained) extracts photorealistic high-level features (L_VGG).
    • Danbooru-trained ResNet-50 captures anime-specific semantics (face structure, line style, costume, etc.), yielding L_Res, with layer-wise weighting to equalize influence.
  • Adversarial Loss: Three-scale PatchGAN model ensures sharp and artistically plausible outputs.

The total loss is

L = \alpha L_{L1} + \beta (L_{Res} + \lambda L_{VGG}) + \gamma L_{adv}

with empirically tuned weights (α = 1, β = 0.5, λ = 8, γ = 0.2). Layer importance is controlled via w^(R) and w^(V) for the ResNet and VGG layers respectively.

This balanced approach addresses previous GAN-driven color speckling and reinforces both line sharpness and global color/texture fidelity, substantiated by quantitative and qualitative gains (Wang et al., 2024).
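A sketch of how the components combine, using the weights reported above. `weighted_perceptual` and `total_loss` are illustrative helpers operating on precomputed feature maps and scalar loss terms, not the released training code.

```python
import numpy as np

def weighted_perceptual(feats_sr, feats_gt, layer_weights):
    """Layer-wise weighted L1 over feature maps: the w^(R) / w^(V)
    scheme used to balance each layer's influence."""
    return sum(w * np.abs(a - b).mean()
               for w, a, b in zip(layer_weights, feats_sr, feats_gt))

def total_loss(l_pix, l_res, l_vgg, l_adv,
               alpha=1.0, beta=0.5, lam=8.0, gamma=0.2):
    """L = alpha*L_L1 + beta*(L_Res + lambda*L_VGG) + gamma*L_adv,
    with the empirically tuned weights as defaults."""
    return alpha * l_pix + beta * (l_res + lam * l_vgg) + gamma * l_adv
```

Note that λ scales the VGG term inside the perceptual bracket, so β controls the combined perceptual contribution while λ sets the ResNet/VGG balance within it.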

5. Training Regimen and Evaluation Protocol

Typical API-based SR training employs:

  • Generator: GRL-Tiny (1.03M params; nearest-neighbor+conv upsampling).
  • Discriminator: Three-scale PatchGAN.
  • Two-stage schedule: Initial L_1 pretraining (300K iterations, LR = 2e-4), followed by full-loss adversarial fine-tuning (300K iterations, LR = 1e-4). HR patch size: 256×256; batch size: 32.
  • Degradation configuration: randomized blur (σ ∈ [0.1, 2]), additive noise, JPEG/WebP/AVIF quality ∈ [20, 95], MPEG-2/4 qscale ∈ [8, 31], H.264 CRF ∈ [23, 38], H.265 CRF ∈ [28, 42].
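The schedule and degradation ranges above can be collected into a configuration sketch; the dictionary layout and key names here are hypothetical, not taken from the released code.

```python
# Hypothetical config mirroring the two-stage schedule described above
TRAIN_STAGES = [
    {"name": "l1_pretrain", "iters": 300_000, "lr": 2e-4,
     "losses": ["L1"]},
    {"name": "gan_finetune", "iters": 300_000, "lr": 1e-4,
     "losses": ["L1", "perceptual", "adversarial"]},
]
HR_PATCH_SIZE = 256
BATCH_SIZE = 32

# Parameter ranges for the randomized degradation pipeline
DEGRADATION = {
    "blur_sigma": (0.1, 2.0),
    "jpeg_webp_avif_quality": (20, 95),
    "mpeg_qscale": (8, 31),
    "h264_crf": (23, 38),
    "h265_crf": (28, 42),
}
```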

Evaluation is conducted on AVC-RealLQ using the no-reference IQA metrics NIQE, MANIQA, and CLIP-IQA, computed via the pyiqa toolbox (Wang et al., 2024).

| Method | Params | NIQE ↓ | MANIQA ↑ | CLIP-IQA ↑ |
|---|---|---|---|---|
| Real-ESRGAN* | 16.7M | 8.281 | 0.381 | — |
| BSRGAN* | 16.7M | 8.632 | 0.376 | — |
| RealBasicVSR* | 6.3M | 8.621 | 0.362 | — |
| AnimeSR | 1.50M | 8.109 | 0.462 | 0.539 |
| VQD-SR | 1.47M | 8.202 | 0.464 | 0.567 |
| APISR (API) | 1.03M | 6.719 | 0.514 | 0.711 |

*Fine-tuned on animation video frames (Wang et al., 2024).

API achieves the lowest NIQE (6.719), the highest MANIQA (0.514), and the highest CLIP-IQA (0.711), outperforming baselines at substantially reduced dataset and model sizes.

6. Qualitative Outcomes and Significance

API-trained models produce outputs that display:

  • Enhanced hand-drawn stroke clarity (especially on outlines, hair, and mecha details)
  • Absence of spurious color artifacts typically associated with GAN-based upscaling
  • Improved rendering of fine shadow, texture, and CGI features
  • Robust performance in low-light and CGI-intensive sequences
  • Restoration of production intent in challenging, real-world anime artifacts (figures in (Wang et al., 2024))

The dataset and its training/pipeline code are publicly available at https://github.com/Kiteretsu77/APISR (MIT-style academic license), enabling replication and extension.

7. Role Within the Anime Research Dataset Ecosystem

API is specifically tailored for anime super-resolution, contrasting with broader datasets such as AnimeShooter (multi-shot, reference-guided video generation (Qiu et al., 3 Jun 2025)) and large-scale pose-annotated resources like the “Anime Character Sheet” dataset for pose-conditioned rendering (Lin et al., 2022). API addresses challenges unique to single-frame, artifact-laden, and repetitive production frames, unifying domain-adapted collection, degradation modeling, and supervision innovations.

A plausible implication is that API’s targeted design—focused on keyframe informativeness and authentic degradation simulation—offers a more suitable foundation for SR research than general-purpose anime datasets or those constructed for rendering, pose transfer, or generative video modeling. This specialization reduces dataset size and complexity while still delivering state-of-the-art quantitative and qualitative performance for its intended application (Wang et al., 2024).
