- The paper introduces ViStoryBench, a comprehensive benchmark suite with a multi-dimensional dataset and specific metrics for evaluating story visualization models.
- ViStoryBench features a diverse dataset covering various story genres and artistic styles, including stories with single and multiple protagonists, to test character consistency under varying narrative complexity.
- The benchmark includes 12 automated metrics, evaluates over twenty methods, and is publicly released to foster innovation and improve real-world applications.
ViStoryBench: Comprehensive Benchmark Suite for Story Visualization
In the paper, the authors introduce ViStoryBench, a comprehensive benchmark suite for evaluating story visualization models across diverse dimensions. Story visualization is the task of generating a sequence of images that depicts a narrative while remaining consistent with given character reference images, enabling immersive visual storytelling. While generative models have advanced this domain significantly, a benchmark that evaluates them robustly and comprehensively has been lacking, and this is the gap ViStoryBench aims to fill.
Multi-Dimensional Dataset Creation
The dataset curated for ViStoryBench is deliberately diverse, spanning a range of story genres and artistic styles. This multi-dimensional design ensures that models are assessed from multiple perspectives, covering distinct plots and visual aesthetics from anime to 3D renderings. The benchmark comprises 80 story segments featuring 344 characters, balancing narrative structure and visual variety. Importantly, ViStoryBench includes stories with both single and multiple protagonists to test consistency in character portrayal, alongside complex plots and intricate world-building that challenge the fidelity of visual generation models.
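To make the dataset's dimensions concrete, a story entry in a benchmark like this could be organized as in the sketch below. The field names and structure are assumptions for exposition, not ViStoryBench's actual file format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Character:
    """A recurring character with reference images used for consistency checks."""
    name: str
    description: str                 # appearance / persona text
    reference_images: List[str]      # paths to reference portraits


@dataclass
class Shot:
    """One shot of the storyboard: what happens and who is on screen."""
    scene_description: str
    on_stage_characters: List[str]   # names, must match Character.name


@dataclass
class StoryEntry:
    """A single benchmark story: genre, art style, cast, and ordered shots."""
    story_id: str
    genre: str                       # e.g. fantasy, sci-fi, everyday life
    art_style: str                   # e.g. anime, 3D rendering, watercolor
    characters: Dict[str, Character] = field(default_factory=dict)
    shots: List[Shot] = field(default_factory=list)
```

Structuring each story this way makes the two consistency axes explicit: character identity is anchored by `reference_images`, while the sequence of `shots` carries the plot that generated frames must follow.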
Comprehensive Evaluation Metrics
Beyond traditional evaluation metrics like image quality and diversity, ViStoryBench incorporates measures specific to story visualization. The benchmark assesses stylistic consistency within generated sequences, alignment of character interactions with the textual descriptions, and the liveliness and variety of generated characters, moving beyond mere replication of reference images. Twelve automated metrics cover these critical aspects, enabling researchers to pinpoint the strengths and weaknesses of individual models and drive targeted improvements.
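As one illustration of what an automated consistency metric can look like, stylistic uniformity is often approximated by embedding every generated frame with a pretrained image encoder and measuring how tightly the embeddings cluster. The sketch below uses the open_clip package for this; it is a rough proxy for the idea, not ViStoryBench's exact metric.

```python
import torch
import open_clip
from PIL import Image

# Load a pretrained CLIP image encoder (open_clip_torch package).
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
model.eval()


@torch.no_grad()
def style_consistency(image_paths: list[str]) -> float:
    """Score how stylistically uniform a generated image sequence is.

    Embeds every frame with CLIP, L2-normalizes the embeddings, and returns
    the mean cosine similarity of each frame to the sequence centroid
    (1.0 = identical embedding signature, lower = more drift between frames).
    """
    images = torch.stack(
        [preprocess(Image.open(p).convert("RGB")) for p in image_paths]
    )
    feats = model.encode_image(images)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    centroid = feats.mean(dim=0)
    centroid = centroid / centroid.norm()
    return (feats @ centroid).mean().item()
```

Note that CLIP embeddings mix content and style, so a dedicated style encoder or character-level matching would be needed for the finer-grained distinctions the benchmark targets.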
Extensive Model Evaluation
The authors conducted thorough evaluations of over twenty methods, comprising eighteen principal methods and their variants. They also analyzed the agreement between user studies and the automated metrics, offering insight into how closely the automated scores track human judgment. The benchmark, including the prompts from the data construction pipeline, the automatic and manual evaluation results, and reproduction code, is publicly released to advance story visualization research.
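Checking how well automated metrics agree with human judgment typically reduces to a rank-correlation computation over per-method scores. The snippet below shows the idea with SciPy's Spearman correlation; the score lists are made-up placeholders, not results from the paper.

```python
from scipy.stats import spearmanr

# Hypothetical per-method scores: one automated metric vs. averaged
# user-study ratings, listed in the same method order (illustrative values).
automated_scores = [0.71, 0.64, 0.82, 0.58, 0.77]
human_ratings    = [3.9,  3.4,  4.3,  3.1,  4.0]

rho, p_value = spearmanr(automated_scores, human_ratings)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
# A rho close to 1 means the automated metric ranks methods the same way
# the human raters do, which is the kind of agreement the authors analyze.
```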
Future Implications and Speculations
ViStoryBench lays the groundwork for significant advances in AI-driven story visualization. Its diverse dataset and comprehensive metrics promise not only more rigorous model evaluation but also improved real-world applications in entertainment, education, and multimedia storytelling. Given AI's rapid evolution, future models may exceed current benchmarks, and the suite can adapt to encompass more nuanced narratives and more sophisticated visual storytelling capabilities.
On a theoretical level, the benchmark points toward deeper integration of narrative understanding within visual generation models, strengthening both comprehension and generation. ViStoryBench may thus inspire development of more general frameworks that harmonize multiple narrative forms and visual styles, including emerging areas such as virtual reality storytelling and interactive media. The benchmark suite is a valuable resource for the AI community, fostering innovation in story visualization technologies.