ModePoem Dataset for Computational Poetry Research

Updated 3 November 2025

ModePoem Dataset is a multilingual resource comprising over 100,000 annotated poems with detailed meter, language, and metadata information.
It supports diverse research applications including automatic versification, cross-modal poem generation, and benchmarking LLM-generated poetry detection.
The dataset is rigorously cleaned and annotated using both automated and human validation techniques, ensuring high-quality data for computational poetics.

ModePoem Dataset refers to several large-scale resources released in recent years for the computational analysis and synthesis of poetry, with an emphasis on poetic meter, cross-modal inspiration, and generative benchmarking. Collections under the "ModePoem" name support applications including automatic versification, image-inspired poem generation, and detection of LLM-generated poetry.

1. Dataset Definition and Structure

ModePoem datasets are designed to facilitate empirical research in poetry analysis and generation. The primary release, described by Yousef et al. (2019), comprises over 100,000 annotated poems, each supplied with:

Raw poem text
Meter annotation (central feature)
Language indicator
Metadata (such as author, title when available)
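
As a rough illustration of the per-poem fields listed above, a record could be modeled as follows. The class name `PoemRecord`, the helper `has_metadata`, and all field names are assumptions made for this sketch; they are not the published schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PoemRecord:
    """Hypothetical record layout; field names are illustrative, not the released schema."""
    text: str                      # raw poem text
    meter: str                     # meter label, e.g. "iambic" or an Arabic bahr name
    language: str                  # language indicator, e.g. "en" or "ar"
    author: Optional[str] = None   # metadata, present only when available
    title: Optional[str] = None    # metadata, present only when available


def has_metadata(record: PoemRecord) -> bool:
    """Return True when at least one optional metadata field is filled in."""
    return record.author is not None or record.title is not None


example = PoemRecord(
    text="Shall I compare thee to a summer's day?",
    meter="iambic",
    language="en",
    author="William Shakespeare",
)
```

Keeping the metadata fields optional mirrors the "when available" wording of the list.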

A formal abstraction for meter classification appears as:

$M: P \rightarrow \mathcal{C}$

with $P$ the set of poems and $\mathcal{C}$ the set of language-specific meter classes.

Dataset statistics aggregate over multiple languages:

$N = \sum_{l \in L} n_l$

where $N$ is the total number of poems, $L$ is the language set, and $n_l$ is the count for language $l$.
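
The aggregation can be sketched in a few lines. The language codes and counts below are made-up toy values, and `count_by_language` is a hypothetical helper, not part of any released tooling.

```python
from collections import Counter
from typing import Iterable


def count_by_language(languages: Iterable[str]) -> Counter:
    """Return n_l, the number of poems per language code."""
    return Counter(languages)


# Toy language tags, one per poem; codes and counts are invented for illustration.
toy_tags = ["en", "ar", "en", "es", "ar", "en"]
n = count_by_language(toy_tags)
total = sum(n.values())  # N = sum over all languages l of n_l
```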

ModePoem spans five languages (English, Arabic, Spanish, Russian, and Hindi), enabling comparative studies in versification. The English and Arabic subcorpora feature detailed meter annotations: English uses foot-based classes such as iambic and trochaic, and Arabic uses the classical bahr system.

Liu et al. (2018) use a broader sense of "ModePoem," in which the name sometimes refers to "MultiM-Poem," a multimodal dataset pairing images with English poems. That release comprises three subsets:

MultiM-Poem: 8,292 human-curated image-poem pairs, each line-linked by experienced literature majors for cross-modal relevance
MultiM-Poem (Ex): 26,161 pairs generated by a deep coupled visual-poetic embedding (topological similarity in embedding space)
UniM-Poem: 93,265 standalone English poems filtered for length and language consistency
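
Embedding-based pairing of the kind used for MultiM-Poem (Ex) can be pictured as nearest-neighbor retrieval in a shared image-poem vector space. The sketch below assumes cosine similarity as the scoring function and uses toy 3-dimensional vectors; the actual embedding model and similarity measure of Liu et al. (2018) are not reproduced here, and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_poem_for_image(image_vec: Sequence[float],
                        poem_vecs: Sequence[Sequence[float]]) -> int:
    """Return the index of the poem embedding most similar to the image embedding."""
    return max(range(len(poem_vecs)), key=lambda i: cosine(image_vec, poem_vecs[i]))


# Toy embeddings standing in for a learned visual-poetic space (assumed values).
image = [0.9, 0.1, 0.0]
poems = [[0.0, 1.0, 0.0], [1.0, 0.2, 0.0], [0.0, 0.0, 1.0]]
match = best_poem_for_image(image, poems)  # index of the closest poem
```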

All variant datasets emphasize heterogeneous sources and text normalization.

3. Collection, Cleaning, and Annotation Methodology

ModePoem datasets aggregate poems from public literary archives, digitized anthologies, and crawling reputable literary sites. The cleaning pipeline includes:

Automated removal of duplicates and prose using heuristics and length-based filtering
Exclusion of poetic forms not amenable to metrical annotation (e.g., haiku, limericks)
Language normalization for character encoding and structure
Human verification of a sample for annotation quality, especially in meter assignment
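
The automated steps above could look roughly like the following. This is a minimal sketch under stated assumptions: the thresholds (`min_lines`, `max_line_chars`), the use of NFC normalization, and exact-match deduplication via SHA-256 are illustrative choices, not the pipeline documented by the dataset authors.

```python
import hashlib
import unicodedata
from typing import Iterable, List


def normalize(text: str) -> str:
    """Apply Unicode NFC normalization, strip each line, and drop empty lines."""
    text = unicodedata.normalize("NFC", text)
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


def looks_like_verse(text: str, min_lines: int = 2, max_line_chars: int = 80) -> bool:
    """Length-based heuristic: several short lines suggest verse rather than prose."""
    lines = text.split("\n")
    return len(lines) >= min_lines and all(len(line) <= max_line_chars for line in lines)


def clean(poems: Iterable[str]) -> List[str]:
    """Normalize, drop prose-like texts, and remove exact duplicates (first copy wins)."""
    seen = set()
    kept = []
    for raw in poems:
        text = normalize(raw)
        if not looks_like_verse(text):
            continue
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(text)
    return kept


sample = [
    "Roses are red\nViolets are blue",
    "  Roses are red \n Violets are blue\n",   # duplicate after normalization
    "A single long prose sentence.",           # one line only, treated as prose
]
cleaned = clean(sample)
```

Normalizing before hashing matters here: the second toy entry differs from the first only in whitespace, so it is recognized as a duplicate only after normalization.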

Meter annotations combine automatic metrical analysis with manual review, which is intended to keep labels consistent and accurate across highly heterogeneous poetic forms.

For the multimodal variants (Liu et al., 2018), image-poem relevance is decided by human voting, with a preference for free verse and a clear link of inspiration between image and poem (objects, emotions, scenes).

4. Benchmarking and Technical Use Cases

ModePoem acts as a foundational benchmark for several computational poetry tasks:

Meter Classification: Models, notably RNNs operating on raw character sequences, achieve 96.38% accuracy for 16 Arabic meters and 82.31% for 4 English meters (Yousef et al., 2019).
Cross-modal Generation: Datasets enable adversarial training architectures to ensure poeticness and relevance in image-inspired poem generation; paired data supports discriminators for style/relevance and fine-tuning CNNs for poetic object detection (Liu et al., 2018).
LLM-Generated Poetry Detection: For modern Chinese poetry, ModePoem (also known as AIGenPoetry) contains both professional human-written and LLM-generated texts (42,400 poems in total: 800 human-written and 41,600 LLM-generated) (Wang et al., 1 Sep 2025). The benchmark task is binary classification, $Y: P \rightarrow \{0,1\}$ (human or AI source), and the results show how difficult it is to detect poems in which advanced LLMs imitate human style.
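
For the meter classification item above, "operating on raw character sequences" means each verse is turned into a sequence of integer character ids before it reaches the RNN. The sketch below shows one simple way to do that; the function names, the padding id 0, and the choice to map unseen characters to 0 are assumptions for illustration, not the encoding used by Yousef et al. (2019).

```python
from typing import Dict, List


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Map each character to an integer id; id 0 is reserved for padding."""
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Encode a verse as character ids, truncated or right-padded to max_len.

    Characters missing from the vocabulary are mapped to 0 (a simplification).
    """
    ids = [vocab.get(ch, 0) for ch in text[:max_len]]
    return ids + [0] * (max_len - len(ids))


verses = ["abba", "cab"]                     # toy strings standing in for verses
vocab = build_vocab(verses)                  # {'a': 1, 'b': 2, 'c': 3}
encoded = [encode(v, vocab, max_len=5) for v in verses]
```

Fixed-length id sequences like these can then be batched and fed to an embedding layer followed by a recurrent classifier.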

5. Public Availability and Documentation

All ModePoem corpora are publicly released, hosted at designated repositories under open academic licenses:

[Meter annotation datasets: repository link as indicated in (Yousef et al., 2019)]
[Image-poem and English corpus: https://github.com/bei21/img2poem]
[LLM poetry detection: https://github.com/NLP2CT/AIGenPoetry-Detection]

Supporting documentation comprises the dataset schema, annotation and usage guidelines, sample code for parsing and classification, and the meter classification theory for each language. Distribution and frequency analyses are also included.

6. Impact and Novel Research Enablement

ModePoem marks several research firsts:

Unified Multilingual Meter Data: First consistently annotated large-scale meter corpus across five languages, bridging significant resource gaps in computational verse analysis (Yousef et al., 2019).
Human-Annotated Multimodal Inspiration: First large-scale image-poem pair resource for benchmarking cross-modal poetic generation and evaluation of deep coupled visual-poetic embedding models (Liu et al., 2018).
LLM Benchmarking: The benchmark for distinguishing LLM-generated modern Chinese poetry poses a new challenge for stylometric detectors and shows that existing algorithms are unreliable on poems generated to imitate human style. Style-based imitation is the hardest case to detect, even for a fine-tuned RoBERTa detector, which reaches up to 91.17% F1 in the baseline setting but drops sharply in out-of-domain and style-matched settings (Wang et al., 1 Sep 2025).
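
Since detection results are reported as F1, a short sketch of the metric may help. It computes F1 for one positive class from toy labels; the label convention (1 = LLM-generated, 0 = human-written) and the choice of positive-class F1 rather than a macro average are assumptions, as the averaging used in the cited benchmark is not stated here.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels: 1 = LLM-generated, 0 = human-written (assumed convention).
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
score = binary_f1(y_true, y_pred)  # 3 TP, 1 FP, 1 FN
```

With only 800 human-written poems against 41,600 LLM-generated ones, plain accuracy would be dominated by the majority class, which is one reason a precision- and recall-based score is the more informative figure.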

ModePoem resources enable:

Statistical and neural modeling of meter and style
Large-scale benchmarking for generative poetry models and multimodal inspiration tasks
Linguistic and literary study of versification systems, historical trends, and cross-lingual stylistic differences
Data-driven research into distinguishing human vs. AI poetic creativity

7. Summary Table of ModePoem Variants

Dataset Variant	Size (poems or pairs)	Unique Features
ModePoem (meter corpus)	>100,000	Multilingual, annotated meters, cleaned format
MultiM-Poem	8,292	Human-judged image-poem pairs
MultiM-Poem (Ex)	26,161	Embedding-based image-poem semantic pairs
UniM-Poem	93,265	Standalone English poems, poeticness-filtered
AIGenPoetry/ModePoem	42,400	Human+LLM Chinese poetry for origin detection

ModePoem datasets underpin work in computational poetics, meter classification, transfer learning across verse genres, and the evaluation of AI-generated poetry and stylistic imitation. Their cross-lingual and multimodal structure, open availability, and documentation position ModePoem as a central resource for empirical and applied research in the digital humanities and natural language processing.