Italian Figurative Archive
- Italian Figurative Archive is a comprehensive, multimodal collection that curates annotated datasets of Italian metaphors, pejorative tweets, and high-resolution art images.
- It employs rigorous norming, dual-level annotation, and advanced digital tools including R/Shiny applications and DIP restoration to ensure reproducible research.
- The archive supports interdisciplinary studies across psycholinguistics, NLP, and digital humanities, offering actionable insights for metaphor processing, misogyny detection, and art restoration.
The Italian Figurative Archive encompasses a suite of curated, multimodal resources for the systematic study and digital analysis of figurative language, metaphor, and visual art in the Italian context. It consists of large-scale, openly accessible datasets and application platforms designed to facilitate empirical, computational, and digital humanities research in domains ranging from psycholinguistics and NLP to art history and cultural heritage restoration. The conception, implementation, and technical validation of these archives directly respond to the increasing demand for reproducible stimulus materials, annotated corpora, and advanced reconstruction tools relevant to figurative phenomena in both language and material culture (Bressler et al., 1 Mar 2025, Muti et al., 3 Apr 2024, Merizzi et al., 2023).
1. Scope and Components
The Italian Figurative Archive is a distributed set of resources for investigating figurative phenomena through both textual and visual modalities. Key components include:
- Figurative Language Archive: An open database of 997 Italian metaphors, normed for psycholinguistic and corpus-derived properties. It includes both everyday and literary metaphors, all annotated and accessible via a web-based application (Bressler et al., 1 Mar 2025).
- PejorativITy Corpus: A manually annotated corpus of 1,200 Italian tweets targeting the disambiguation of pejorative epithets and misogyny, with granular labels at both word and sentence levels (Muti et al., 3 Apr 2024).
- Digital Art/Fresco Archive: A digital archive comprising high-resolution imagery of medieval Italian frescoes and their algorithmic reconstructions, with raw and enhanced multimodal data guided by deep image prior (DIP) restoration techniques (Merizzi et al., 2023).
These intersecting resources support research in metaphor processing, computational semantics, language inclusiveness, and digital restoration of cultural heritage.
2. Corpus and Dataset Construction
The archive's composition reflects rigorous corpus construction methodologies to ensure experimental validity and broad coverage:
Textual Metaphor Archive
Over eleven years and eleven empirical investigations, the metaphor archive collected 997 unique metaphors, partitioned into two modules: 464 "Everyday" metaphors (including nominal predicatives, predicate metaphors, noun-noun pairs) and 533 "Literary" metaphors, primarily in genitive forms harvested from poetry and prose. Approximately 87% of everyday items are paired with literal controls. Semantic content spans from abstract constructs to body-related imagery (Bressler et al., 1 Mar 2025).
Pejorative Epithets Corpus
The PejorativITy corpus is constructed through a combination of automated and manual procedures:
- Lexicon selection combines crowd-sourced slang, shared-task keywords, and profanity lists, filtered for Italian polysemous words with both neutral and pejorative usages.
- For each of 24 lexicon items, 50 tweets from Dec 2022–Feb 2023 (via Twarc and Twitter API v2) are sampled, striving for balance across pejorative and neutral usages.
- Dual-level annotation schema labels each lexicon token in context for pejorative/neutral usage, and the tweet as a whole for misogyny (Muti et al., 3 Apr 2024).
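The dual-level schema above can be pictured as one record per tweet carrying both a word-level and a tweet-level label. The following is a minimal sketch only: the class name, field names, and placeholder rows are hypothetical and do not reflect the released file format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedTweet:
    """One corpus entry; all field names are illustrative assumptions."""
    tweet_id: str
    text: str
    target_word: str   # one of the 24 lexicon items
    word_label: str    # word-level label: "pejorative" or "neutral"
    misogynous: bool   # tweet-level label


def label_balance(tweets):
    """Count word-level labels separately for each lexicon item."""
    counts = {}
    for tweet in tweets:
        counts.setdefault(tweet.target_word, Counter())[tweet.word_label] += 1
    return counts


# Corpus size implied by the sampling design: 24 items x 50 tweets each.
EXPECTED_SIZE = 24 * 50

# Placeholder rows, not real corpus data.
sample = [
    AnnotatedTweet("1", "placeholder text A", "word_a", "pejorative", True),
    AnnotatedTweet("2", "placeholder text B", "word_a", "neutral", False),
    AnnotatedTweet("3", "placeholder text C", "word_b", "neutral", False),
]
balance = label_balance(sample)
```

Keeping the two labels in separate fields makes it possible to check per-item balance between pejorative and neutral usages independently of the tweet-level misogyny label.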
Digital Visual Archive
Fresco restoration employs:
- A dataset of 28 medieval fresco fragments (ca. 2048×2048 px) from Northern Italy, with modalities comprising both visible and infrared digital captures.
- Manual annotation of binary masks for deteriorated regions; radiometric calibration and histogram matching to enable multimodal integration (Merizzi et al., 2023).
3. Annotation, Norming, and Measurement
Annotation frameworks are distinguished by explicit guidelines and extensive norming for objective measurement:
Normed Metaphor Ratings
Each metaphor in the Figurative Archive is rated along up to sixteen experimental and corpus-derived dimensions, with ratings rescaled to a 1–7 Likert scale.
Key annotated dimensions include: Familiarity, Meaningfulness, Difficulty, Imageability, Aptness, Metaphoricity, Physicality/Mentality, Cloze Probability and Entropy, Number/Strength of Interpretations, Body Relatedness, and Inclusiveness (Bressler et al., 1 Mar 2025).
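The exact rescaling rule is not given in this text. One common choice, assumed here purely for illustration, is a linear min-max mapping from the original rating range onto 1–7. The function name and its parameters are hypothetical.

```python
def rescale_to_likert(value, src_min, src_max, dst_min=1.0, dst_max=7.0):
    """Linearly map a rating from [src_min, src_max] onto [dst_min, dst_max].

    Assumption: the rescaling is linear; the source does not state the rule.
    """
    if src_max <= src_min:
        raise ValueError("src_max must be greater than src_min")
    if not src_min <= value <= src_max:
        raise ValueError("value lies outside the source scale")
    fraction = (value - src_min) / (src_max - src_min)
    return dst_min + fraction * (dst_max - dst_min)


# The midpoint of a 1-5 scale maps to the midpoint of the 1-7 scale.
midpoint = rescale_to_likert(3.0, 1.0, 5.0)
```

A mapping of this kind preserves the rank order and relative spacing of ratings, so measures collected on different scales across studies can be compared on a shared range.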
Inclusiveness Index
Everyday metaphors are rated for inclusiveness (stereotype-respectfulness) on a 9-point scale.
This dimension supports the selection of non-stereotyped, respectful research stimuli.
Pejorative Epithets and Misogyny Annotation
Three-phase annotation (pilot descriptive, pilot prescriptive, final expert labeling) yields Krippendorff's $\alpha$ of 0.33–0.50 initially, with improved guideline cohesion in later rounds. Criteria include assignment of pejorative labels for offensive usage toward women (and, for WSD, toward men), neutrality for non-gendered or object referents, and explicit handling of reported speech and objectifying compliments (Muti et al., 3 Apr 2024).
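For readers unfamiliar with the agreement statistic, the sketch below computes Krippendorff's alpha for nominal labels from a coincidence matrix. It follows the standard textbook definition and is not the authors' script. The function name and the toy labels are illustrative.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of label lists, one per annotated item, holding the
    labels given by the annotators who rated that item.
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # an item with a single label cannot be paired
        for a, b in permutations(range(m), 2):
            coincidences[(labels[a], labels[b])] += 1.0 / (m - 1)

    totals = Counter()
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("not enough pairable labels")

    # Observed and expected disagreement (shared factor 1/n cancels out).
    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(
        totals[c] * totals[k] for c in totals for k in totals if c != k
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label occurs")
    return 1.0 - observed / expected


# Toy example: two annotators, four items, one disagreement.
toy_units = [["pej", "pej"], ["pej", "neu"], ["neu", "neu"], ["neu", "neu"]]
alpha = krippendorff_alpha_nominal(toy_units)
```

Alpha equals 1 under perfect agreement and approaches 0 when agreement is no better than chance, which is why values in the 0.33–0.50 range signal that guidelines needed refinement.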
Art Restoration Annotations
Visual archives store detailed metadata: binary damage masks, optimization logs, confidence maps from DIP residuals, and cross-modal alignment information (Merizzi et al., 2023).
4. Computational Methods and Technical Infrastructure
Metaphor Archive Application
- Hosted in R/Shiny, the web interface enables module selection (everyday vs literary), keyword and measure-based search, interactive plotting (1D/2D), data export (.csv/.xlsx), and access to bibliographic resources (Bressler et al., 1 Mar 2025).
- Programmatic access is supported through an open Zenodo repository and RStudio integration.
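Outside the web interface, an exported table can be filtered with a few lines of code. The sketch below assumes a CSV export with hypothetical column names (`item`, `module`, `familiarity`, `aptness`) and uses placeholder rows. The real column names should be taken from the dataset documentation.

```python
import csv
import io

# Placeholder export; column names and values are assumptions, not real data.
SAMPLE_EXPORT = """item,module,familiarity,aptness
item_001,everyday,5.8,5.1
item_002,everyday,2.9,3.4
item_003,literary,4.2,4.9
"""


def select_stimuli(csv_file, module, measure, lower, upper):
    """Return item identifiers from one module whose rating on `measure`
    falls within [lower, upper]."""
    reader = csv.DictReader(csv_file)
    return [
        row["item"]
        for row in reader
        if row["module"] == module and lower <= float(row[measure]) <= upper
    ]


# Select highly familiar everyday metaphors from the placeholder table.
selected = select_stimuli(
    io.StringIO(SAMPLE_EXPORT), "everyday", "familiarity", 5.0, 7.0
)
```

In practice the `io.StringIO` object would be replaced by an open file handle on the downloaded export.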
Pejorative Disambiguation in NLP
- Experiments include two strategies for improving misogyny detection: (i) concatenation of pejorative information to input representations, and (ii) direct substitution of ambiguous lexis with univocal forms in model pipelines.
- Both strategies yield improved classification accuracy over standard baselines on the PejorativITy corpus and existing Italian Twitter benchmarks, highlighting the utility of lexically-driven WSD as a preprocessing step (Muti et al., 3 Apr 2024).
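The two strategies can be illustrated as simple text preprocessing steps applied before a tweet reaches the misogyny classifier. This is a sketch under stated assumptions: the function names, the `[SEP]` separator, and the placeholder tokens are hypothetical, and the authors' actual pipeline may differ.

```python
import re


def concatenate_label(tweet, target, label, sep=" [SEP] "):
    """Strategy (i): append the word-level pejorativity label to the input."""
    return f"{tweet}{sep}{target} {label}"


def substitute_term(tweet, target, univocal):
    """Strategy (ii): replace an ambiguous word with an unambiguous form.

    Matching is on whole words and ignores case.
    """
    pattern = re.compile(rf"\b{re.escape(target)}\b", flags=re.IGNORECASE)
    return pattern.sub(univocal, tweet)


def preprocess(tweet, target, label, univocal):
    """Assumption: substitution is applied only to pejorative usages."""
    if label == "pejorative":
        return substitute_term(tweet, target, univocal)
    return tweet


# Placeholder tokens stand in for real lexicon items.
with_label = concatenate_label("some AMBIG here", "ambig", "pejorative")
substituted = preprocess("some AMBIG here", "ambig", "pejorative", "CLEAR")
unchanged = preprocess("some AMBIG here", "ambig", "neutral", "CLEAR")
```

Both variants leave the downstream classifier unchanged, so the word-sense information is injected purely at the input level.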
Digital Restoration and DIP Inpainting
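- The DIP method replaces explicit regularization with the implicit bias of an untrained convnet generator $f_\theta$ fed a fixed random input $z$, solving
$$\hat{\theta} = \arg\min_{\theta} \left\| m \odot \left( f_\theta(z) - y \right) \right\|_2^2, \qquad \hat{x} = f_{\hat{\theta}}(z),$$
where $y$ is the damaged image and $m$ is a binary mask selecting the intact pixels (the complement of the annotated damage mask), with variants integrating infrared data in a dual-modality loss (weighted by a scalar coefficient).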
- The hourglass, U-Net-inspired architecture is optimized with Adam under a cosine-annealing learning-rate schedule, which enables restoration without large training sets. Early stopping prevents overfitting to mask boundaries or high-frequency noise.
- When fusing visible and infrared input, DIP+IR inpainting achieves mean PSNR of 26.1 dB and SSIM of 0.88 in reconstructed regions, quantitatively surpassing TV-inpainting and patch-based techniques (Merizzi et al., 2023).
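The dual-modality loss and the PSNR metric mentioned above can be sketched in a few lines. The sketch makes several assumptions: images are nested lists of floats in [0, 1], the loss is a mean (rather than a sum) of squared errors over intact pixels, and the infrared term is simply added with a scalar weight. The function names and the `ir_weight` parameter are hypothetical, and no network is optimized here.

```python
import math


def masked_mse(pred, target, region):
    """Mean squared error over pixels where `region` is 1 (or True)."""
    total, count = 0.0, 0
    for pred_row, target_row, region_row in zip(pred, target, region):
        for p, t, keep in zip(pred_row, target_row, region_row):
            if keep:
                total += (p - t) ** 2
                count += 1
    if count == 0:
        raise ValueError("region selects no pixels")
    return total / count


def dual_modality_loss(pred_vis, vis, pred_ir, ir, intact, ir_weight):
    """Visible-light data term plus a weighted infrared data term,
    both evaluated only on intact (undamaged) pixels."""
    return masked_mse(pred_vis, vis, intact) + ir_weight * masked_mse(
        pred_ir, ir, intact
    )


def psnr(pred, reference, region, peak=1.0):
    """Peak signal-to-noise ratio in dB, restricted to `region`."""
    mse = masked_mse(pred, reference, region)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


# Tiny 1x2 example: the second pixel is damaged and ignored by the loss.
intact = [[1, 0]]
loss = dual_modality_loss(
    [[1.0, 0.0]], [[0.0, 5.0]], [[2.0, 0.0]], [[0.0, 0.0]], intact, 0.5
)
```

Evaluating the loss only on intact pixels is what lets the generator's implicit prior fill the damaged regions, while PSNR on held-out regions measures how closely the fill matches reference content.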
5. Validation and Resource Statistics
Validation of Metaphor Norms
- Zero-order Pearson correlations ($r$) are computed across all rating pairs, with FDR-corrected $p$-values. Key patterns involve four pairs: Familiarity–Aptness, Familiarity–Meaningfulness, Difficulty–Imageability, and Semantic Distance–Metaphoricity.
- These replicate classical findings and confirm the resource's experimental integrity (Bressler et al., 1 Mar 2025).
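The validation procedure combines a correlation coefficient with a multiple-comparison correction. The sketch below shows Pearson's $r$ and the Benjamini-Hochberg adjustment. Benjamini-Hochberg is an assumption here, since the text only says "FDR-corrected". The input values are toy numbers, not archive data.

```python
import math


def pearson_r(x, y):
    """Zero-order Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


r_toy = pearson_r([1, 2, 3], [2, 4, 6])
adjusted_toy = benjamini_hochberg([0.01, 0.04, 0.03])
```

With many rating dimensions the number of pairwise tests grows quickly, so controlling the false discovery rate keeps spurious associations from being read as replicated findings.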
Corpus Distribution and Exemplars
- PejorativITy lexicon is partitioned into gendered animal-insult metaphors (6), slurs/marked labels (4), and metaphoric epithets (14).
- Representative tweet-level annotations demonstrate fine-grained discrimination between pejorative/non-pejorative in context, supporting both human and machine analysis (Muti et al., 3 Apr 2024).
Art Archive Metrics
- On held-out test regions, DIP-based restoration outperforms variational and exemplar methods in both quantitative metrics and qualitative continuity, as judged by art historians and digital imaging specialists (Merizzi et al., 2023).
6. Research Applications and Use Cases
The unified Italian Figurative Archive enables advanced research across modalities:
- Psycholinguistic Experimentation: Selection of stimulus material for ERP/MEG studies on metaphor comprehension, with control for familiarity, concreteness, and inclusiveness (Bressler et al., 1 Mar 2025).
- NLP and Model Benchmarking: Evaluation of Italian LLMs for graded metaphor identification and contextual pejorative detection, utilizing annotated human ratings and error-resilient test sets (Bressler et al., 1 Mar 2025, Muti et al., 3 Apr 2024).
- Clinical Language Research: Comparative analyses of metaphor comprehension in atypical populations (schizophrenia, autism), leveraging the normed stimuli (Bressler et al., 1 Mar 2025).
- Digital Heritage and Art Restoration: Technique comparison, pigment analysis, and iconographic studies, underpinned by reconstructed visual data with multimodal annotation (Merizzi et al., 2023).
- Cross-Linguistic and Sociolinguistic Studies: Systematic comparison enabled by English translations and inclusiveness/pejorativity indices.
A plausible implication is that the Italian Figurative Archive’s explicit multimodal design will facilitate novel cross-domain analyses at the intersection of computational linguistics, cognitive neuroscience, and digital humanities.
7. Access and Technical Standards
All datasets and application code are openly available:
- Figurative Archive Web Platform: https://neplab.shinyapps.io/FigurativeArchive/ (with CC-BY 4.0 licensing and direct dataset download) (Bressler et al., 1 Mar 2025).
- Data Sharing and Reproducibility: Zenodo hosts the full exportable database and R/Shiny application code (DOI:10.5281/zenodo.14924803).
- Restoration Corpus: Art imagery and DIP reconstructions, with associated metadata, are maintained for further archiving and computational studies (Merizzi et al., 2023).
Documentation for each resource provides study metadata, scale definitions, and full bibliographies, supporting transparent experimental design and downstream corpus integration.
References
- (Bressler et al., 1 Mar 2025) "Figurative Archive: an open dataset and web-based application for the study of metaphor"
- (Muti et al., 3 Apr 2024) "PejorativITy: Disambiguating Pejorative Epithets to Improve Misogyny Detection in Italian Tweets"
- (Merizzi et al., 2023) "Deep image prior inpainting of ancient frescoes in the Mediterranean Alpine arc"