
Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-to-Image Translation (1811.10666v3)

Published 26 Nov 2018 in cs.CV

Abstract: The applicability of computer vision to real paintings and artworks has been rarely investigated, even though a vast heritage would greatly benefit from techniques which can understand and process data from the artistic domain. This is partially due to the small amount of annotated artistic data, which is not even comparable to that of natural images captured by cameras. In this paper, we propose a semantic-aware architecture which can translate artworks to photo-realistic visualizations, thus reducing the gap between visual features of artistic and realistic data. Our architecture can generate natural images by retrieving and learning details from real photos through a similarity matching strategy which leverages a weakly-supervised semantic understanding of the scene. Experimental results show that the proposed technique leads to increased realism and to a reduction in domain shift, which improves the performance of pre-trained architectures for classification, detection, and segmentation. Code is publicly available at: https://github.com/aimagelab/art2real.

Semantic Image-to-Image Translation for Artistic and Realistic Domains

In "Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-to-Image Translation," Matteo Tomei and colleagues address the visual gap between artworks and photo-realistic images. The paper presents a semantic-aware framework that translates artworks into realistic images, addressing the limitations of computer vision models trained on natural images when they are applied to artistic data. The method relies on a weakly-supervised semantic understanding of the scene, which brings the feature distributions of the two domains closer together and improves the performance of pre-trained architectures on classification, detection, and segmentation.

Methodology

The main contribution is a semantic-aware translation architecture that generates realistic images from artworks by retrieving details from real photos. This is not plain style transfer: the image-to-image translation is designed to preserve the semantic content of the original artwork. The method comprises four key components:

  1. Semantic Understanding and Patch-Based Approach: The translation process utilizes memory banks of image patches extracted from real images, each categorized by semantic class. These patches serve as a reference to enhance realism in the generated images.
  2. Unpaired Image-to-Image Translation: The architecture operates under an unpaired setting, employing a cycle-consistent framework that facilitates domain transformation without requiring paired datasets.
  3. Semantic Affinity Matching: By computing semantic segmentation masks on both the artworks and generated images, the method ensures semantically coherent patch substitution. This matching is achieved through an affinity matrix driven by cosine similarity measurements, which guide the generator network in selecting the most suitable real-image patches.
  4. Multi-Scale Contextual Loss: The approach includes a multi-scale variant of the contextual loss to ensure that generated images maintain high fidelity to realistic image features across various scales.
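The affinity matching and contextual-style loss described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes patches are already flattened into feature vectors, handles a single semantic class and a single scale, and follows the general form of the contextual loss (centred cosine distances, per-row normalisation, softmax-like affinities). The names `cosine_distances`, `affinity_matrix`, `patch_matching_loss`, and `sky_bank`, as well as the bandwidth `h=0.5`, are hypothetical choices made for illustration.

```python
import numpy as np


def cosine_distances(gen, bank, eps=1e-8):
    """Pairwise cosine distances between generated patches (N, D) and real patches (M, D)."""
    mu = bank.mean(axis=0, keepdims=True)  # centre both sets on the real-patch mean
    g = gen - mu
    b = bank - mu
    g = g / (np.linalg.norm(g, axis=1, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return 1.0 - g @ b.T


def affinity_matrix(gen, bank, h=0.5, eps=1e-5):
    """Row-normalised affinities: entry (i, j) scores real patch j for generated patch i."""
    d = cosine_distances(gen, bank)
    d_norm = d / (d.min(axis=1, keepdims=True) + eps)  # distance relative to the best match
    w = np.exp((1.0 - d_norm) / h)  # h is a bandwidth; 0.5 is an assumed value
    return w / w.sum(axis=1, keepdims=True)


def patch_matching_loss(gen, bank, h=0.5):
    """Low when every generated patch has a close counterpart in the memory bank."""
    a = affinity_matrix(gen, bank, h)
    return float(-np.log(a.max(axis=1).mean()))


rng = np.random.default_rng(0)
sky_bank = rng.normal(size=(6, 8))     # hypothetical memory bank for one semantic class
gen_patches = sky_bank[[2, 4]].copy()  # generated patches that exactly match two real ones
nearest = affinity_matrix(gen_patches, sky_bank).argmax(axis=1)
print(nearest, patch_matching_loss(gen_patches, sky_bank))
```

In a full pipeline one would presumably keep one memory bank per semantic class, route each generated patch to the bank of the class assigned to the corresponding artwork region, and sum the loss over several patch scales.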

Experimental Validation

The paper evaluates the method on multiple datasets, including paintings by artists such as Monet and Cézanne, alongside landscape and portrait datasets. Results are assessed quantitatively, using metrics such as the Fréchet Inception Distance (FID), and qualitatively, through human judgment studies.

  • Fréchet Inception Distance: The proposed method achieves lower (better) FID values than existing approaches such as CycleGAN and UNIT, indicating that its outputs more closely match the statistical distribution of real images.
  • User Studies: Assessments reveal that images generated by this method were consistently perceived as more realistic and semantically coherent with the original artworks compared to other state-of-the-art techniques.
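For readers unfamiliar with the metric, the Fréchet distance behind FID compares the mean and covariance of two feature sets: ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 (C_a C_b)^(1/2)). The sketch below is a minimal NumPy version under stated assumptions: in practice the features come from an Inception network, whereas here random arrays stand in for them, and the function name `frechet_distance` is hypothetical.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets of shape (N, D)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) is the sum of the square roots of the eigenvalues of
    # cov_a @ cov_b, which are real and non-negative for PSD covariance matrices.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


rng = np.random.default_rng(0)
real_feats = rng.normal(size=(200, 4))                     # stand-in for real-photo features
fake_feats = real_feats + np.array([1.0, 2.0, 0.0, 0.0])   # same spread, shifted mean
print(frechet_distance(real_feats, real_feats), frechet_distance(real_feats, fake_feats))
```

Because the two stand-in sets differ only by a constant shift, their covariances agree and the distance reduces to the squared length of the shift. Lower values mean the two distributions are closer.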

Implications and Future Directions

The contribution of this work lies in reducing the domain gap between artistic and real-world images, so that pre-trained models perform more accurately when applied to translated artworks. Narrowing this gap has practical implications for digital art preservation and enhancement, giving art historians and computer vision practitioners tools that work on artistic data.

On the theoretical side, this research opens avenues for future work in domain adaptation, potentially extending to other unstructured data domains. Open challenges that might be explored include the need for small annotated datasets covering specialized styles and the integration of textual descriptions of artworks. Further improvements in semantic understanding could also broaden the applicability of computer vision tools in creative domains.

Authors (4)
  1. Matteo Tomei (5 papers)
  2. Marcella Cornia (61 papers)
  3. Lorenzo Baraldi (68 papers)
  4. Rita Cucchiara (142 papers)
Citations (75)