
VerseCrafter: Interactive Poetry System

Updated 13 January 2026
  • VerseCrafter is a computational framework for interactive poetry and lyric generation that integrates explicit rhyme control and user guidance.
  • It combines transformer-based encoder–decoder generation with dual-encoder retrieval to produce verses with targeted style and fluency.
  • The system uses detailed metadata and phonetic constraints to achieve high rhyme precision and real-time creative assistance.

VerseCrafter systems constitute a class of computational frameworks for interactive poetry and lyric generation, with an emphasis on controllable rhyme structure, poet style, and user-guided composition. These systems serve both as autonomous generators and as real-time creative assistants, operationalizing recent advances in neural encoder–decoder models and retrieval-based suggestion engines to augment traditional poetic workflows. Two prominent exemplars are the Last Word First (LWF) encoder–decoder approach for free-verse generation with explicit rhyme control (Pasini et al., 2024), and the "Verse by Verse" dual-encoder retrieval platform designed for interactive, style-specific line suggestion (Uthus et al., 2021).

1. System Architectures and Input Representations

VerseCrafter platforms employ distinct yet complementary architectural paradigms for verse generation:

  • Encoder–Decoder Generation: Pasini et al. (Pasini et al., 2024) utilize a standard Transformer-style encoder–decoder (T5, mT5). Inputs to the encoder are flat sequences of special‐token–delimited metadata fields, e.g., <title>, <artist>, <genre>, <emotion>, <topics>, and a <rhyming_schema>. All tokens, including markers, are embedded through the pretrained layer of the PLM. At the decoder side, the LWF mechanism prepends each target line's rhyming word (the intended line-ending) followed by a <sep> token, ensuring that the critical rhyming choice is committed before semantic content is generated. Decoding remains strictly left-to-right, preserving compatibility with pretrained parameterization.
  • Retrieval-Based Suggestion: "Verse by Verse" (Uthus et al., 2021) separates generation and suggestion into offline and online modules. Offline, per-poet decoder-only Transformers are fine-tuned to enumerate millions of plausible verse lines, each tagged with metadata (syllable count, rhyme phoneme, POS fingerprint, etc.) and encoded into fixed-size vectors via a Transformer-based dual-encoder. In real time, the latest user-provided line is encoded ("parent"), and a nearest-neighbor search returns candidate lines ("children") optimized for contextual fit and user-specified constraints. Embedding indices employ hierarchical quantization for sub-millisecond retrieval latency.
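The LWF input format above can be illustrated with a short sketch. This is not the authors' code: the serialization helpers below are hypothetical, but the special tokens (`<title>`, `<sep>`, etc.) and the prepend-the-rhyming-word convention follow the description of Pasini et al. (2024).

```python
# Illustrative sketch (not the authors' implementation) of the LWF
# encoder input and decoder target formats described above.

def build_encoder_input(meta: dict) -> str:
    """Serialize metadata fields into one flat, special-token-delimited sequence."""
    fields = ["title", "artist", "genre", "emotion", "topics", "rhyming_schema"]
    return " ".join(f"<{f}> {meta[f]}" for f in fields if f in meta)

def build_lwf_target(lines: list[str]) -> str:
    """Prepend each line's ending (rhyming) word and a <sep> token before the
    line itself, so the rhyme choice is committed before the semantic content."""
    parts = []
    for line in lines:
        last_word = line.split()[-1]
        parts.append(f"{last_word} <sep> {line}")
    return " ".join(parts)

meta = {"title": "Night Drive", "genre": "pop", "rhyming_schema": "AABB"}
print(build_encoder_input(meta))
print(build_lwf_target(["the city hums below", "we watch the headlights glow"]))
```

Because decoding remains left-to-right, this target format lets a pretrained encoder–decoder commit to each line's rhyme before generating the rest of the line, with no architectural changes.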

2. Training Objectives and Optimization

The principal training objectives are determined by the generative or retrieval nature of the framework:

  • LWF Encoder–Decoder Objective: For sequence generation, the objective is standard token-level cross-entropy loss, with logits computed as

$$\mathbf{z}^{(i)} = M\bigl(X,\; t_{<i}\bigr) \;\in\; \mathbb{R}^{|V|}$$

and loss

$$\mathcal{L}_{\rm XE} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{|V|}\mathbb{1}[j=t_i]\,\log p_j^{(i)}$$

where $p_j^{(i)} = \mathrm{softmax}(\mathbf{z}^{(i)})_j$. In the LWF+EPR variant, an auxiliary head is trained to predict the Ending Phonetic Representation, aggregating $\mathcal{L}_{\rm XE}^{\rm verse} + \mathcal{L}_{\rm XE}^{\rm phoneme}$, with EPRs extracted from the CMU Pronouncing Dictionary (Pasini et al., 2024).
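The token-level objective above can be sketched numerically. This is a toy NumPy illustration, not the training code; shapes and data are assumptions.

```python
# Minimal sketch of the token-level cross-entropy objective L_XE,
# with toy logits and targets; NumPy used for clarity.
import numpy as np

def cross_entropy(logits: np.ndarray, targets: np.ndarray) -> float:
    """logits: (N, |V|) per-position scores z^(i); targets: (N,) token ids t_i.
    Returns -(1/N) sum_i log softmax(z^(i))_{t_i}."""
    z = logits - logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())

# LWF+EPR aggregates the verse-token loss with an auxiliary phoneme-head loss.
rng = np.random.default_rng(0)
verse_loss = cross_entropy(rng.normal(size=(5, 100)), rng.integers(0, 100, 5))
phoneme_loss = cross_entropy(rng.normal(size=(5, 40)), rng.integers(0, 40, 5))
total_loss = verse_loss + phoneme_loss
```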

  • Retrieval System Training: The dual-encoder model is trained via contrastive loss, maximizing dot-product similarity for true (parent, child) pairs and penalizing incorrect pairings:

$$L_i = -\log \frac{\exp(u_i^\top v_i)}{\sum_{j=1}^{B} \exp(u_i^\top v_j)}$$

with weights shared across the encoders except for final FC layers. Pretraining occurs on noisy user-forum data, followed by fine-tuning on poetic line-pair corpora (Uthus et al., 2021).
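The in-batch contrastive objective above can be sketched as follows; embeddings here are toy arrays, not dual-encoder outputs.

```python
# Sketch of the in-batch contrastive loss: each parent embedding u_i should
# score highest against its own child v_i among the B children in the batch.
import numpy as np

def in_batch_contrastive_loss(U: np.ndarray, V: np.ndarray) -> float:
    """U, V: (B, d) parent/child embeddings. Returns the mean of
    L_i = -log exp(u_i.v_i) / sum_j exp(u_i.v_j)."""
    scores = U @ V.T                                    # (B, B) dot products
    scores -= scores.max(axis=1, keepdims=True)         # numerical stability
    log_softmax = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.diag(log_softmax).mean())
```

With all scores equal the loss reduces to $\log B$, the uniform-guessing baseline, which is a useful sanity check during training.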

  • Line-Generation Pruning: The generative Transformer's output space is pruned via normalized next-token probabilities:

$$\text{normalized\_score}(w \mid s) = \frac{p(w \mid s)}{\max_{z \in V} p(z \mid s)}$$

Tokens are retained when $\text{normalized\_score} \geq 0.925$, and search-state growth is limited by ranking partial lines by summed log-probability, keeping the top $10^8$ partials per iteration.
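The pruning rule above is simple to state in code. This is an illustrative sketch with toy probabilities; the 0.925 threshold follows the text.

```python
# Sketch of the normalized-score pruning rule: a next token w survives only
# if p(w|s) is within a fixed factor of the most probable token's probability.
# The candidate probabilities below are toy values.

def prune_next_tokens(probs: dict[str, float], threshold: float = 0.925) -> list[str]:
    best = max(probs.values())
    return [w for w, p in probs.items() if p / best >= threshold]

candidates = {"night": 0.40, "light": 0.38, "fight": 0.30, "the": 0.05}
print(prune_next_tokens(candidates))   # only "night" and "light" survive
```

Because the score is normalized by the best token rather than an absolute cutoff, the rule adapts per context: a flat distribution keeps many continuations, while a peaked one keeps few.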

3. Datasets, Corpora, and Multilingual Considerations

VerseCrafter systems rely on large, carefully annotated verse corpora:

| Dataset | Language Scope | Size (train/dev/test) | Annotation Details |
| --- | --- | --- | --- |
| Genius.com | English | 570K / 3.5K / 3.5K | Title, artist, genre, emotion, topics, phonetic rhyme schemas |
| Wasabi | 13 European languages | 2.6M / 10K / 10K | Sentence blocks, phonetic rhyme schemas; no explicit section/genre |
| Project Gutenberg & curated poets | English | ~1.1M lines, 22 poets | Per-poet fine-tuning; line-level metadata (syllable count, rhyme) |

In the Wasabi dataset, rhyme schemas are induced by applying the Ghazvininejad algorithm with language-specific vowel sets. The challenge of porting rhyme detection across prosodically diverse languages is significant, as each possesses unique poetic conventions, e.g., mora-timing in Finnish (Pasini et al., 2024).

4. Algorithmic Control of Rhyme and Structure

VerseCrafter systems operationalize rhyme-controllability through distinct mechanisms:

  • Explicit Rhyme Seeding (LWF): Rhyming targets (schema labels) and, optionally, explicit rhyme words are provided as input. At decoding time, upon emission of a sentence boundary, the next rhyme label and word are forced, enabling structured rhyme patterns to be imposed with minimal risk to global fluency. Sampling + rerank strategies (top-p ≈ 0.9, k ≈ 20) balance rhyme fidelity with generation diversity.
  • Phonetic Constrained Retrieval: In "Verse by Verse," the system filters retrieval candidates by rhyme requirements derived from phonetic extraction (Kestrel normalization). Both perfect (identical vowel + coda) and imperfect rhymes (log-odds consonant similarity ≥ 0) are supported, and constraints degrade gracefully to non-rhyming lines if no candidates match (Uthus et al., 2021).
  • Human-in-the-Loop Interaction: Users may iteratively select, edit, or reject system suggestions, and can supply partial stanzas, target rhyme words, or stylistic metadata, promoting a co-creative dynamic.
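The phonetic-constrained retrieval mechanism can be sketched with a perfect-rhyme filter. The tiny ARPABET-style lexicon below is a hand-written assumption, not the Kestrel or CMU pipeline itself; only the final-stressed-vowel-plus-coda criterion and graceful degradation follow the text.

```python
# Hedged sketch of a perfect-rhyme candidate filter: two words rhyme
# perfectly if they share the final stressed vowel and everything after it.
# The lexicon is a toy, hand-written assumption.

LEXICON = {
    "glow":  ["G", "L", "OW1"],
    "below": ["B", "IH0", "L", "OW1"],
    "light": ["L", "AY1", "T"],
}

def rhyme_tail(phones: list[str]) -> tuple[str, ...]:
    """Return phonemes from the last stressed vowel (stress digit 1/2) onward."""
    for i in range(len(phones) - 1, -1, -1):
        if phones[i][-1] in "12":
            return tuple(phones[i:])
    return tuple(phones)

def perfect_rhyme(w1: str, w2: str) -> bool:
    return rhyme_tail(LEXICON[w1]) == rhyme_tail(LEXICON[w2])

def filter_candidates(target: str, candidates: list[str]) -> list[str]:
    """Keep rhyming candidates; degrade gracefully to all if none rhyme."""
    rhyming = [c for c in candidates if perfect_rhyme(target, c)]
    return rhyming or candidates
```

A production system would additionally score imperfect rhymes (e.g., by consonant similarity) rather than applying a strict equality test.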

5. Quantitative Evaluation and Best Practices

VerseCrafter performance is assessed via both automated metrics and human evaluation:

| Metric | Definition/Equation | Application |
| --- | --- | --- |
| Perplexity (PPL) | $\exp\bigl[-\frac{1}{N}\sum_{i=1}^N \log p(y_i \mid y_{<i}, x)\bigr]$ | Coherence, fluency |
| Mauve | Distributional divergence between human and system outputs | Fluency, diversity |
| Rhyming Precision (RP) | $\frac{1}{\lvert R\rvert}\sum_{(t_i,t_j)\in R} \mathbb{1}[\mathrm{rhyme}(t_i,t_j)]$ | Rhyme control accuracy |
| False-Positive Rate (FPR) | $\frac{1}{\lvert NR\rvert}\sum_{(t_i,t_j)\in NR} \mathbb{1}[\mathrm{rhyme}(t_i,t_j)]$ | Spurious rhyme rate |
| distinct-n | (# unique n-grams) / (# total n-grams), for $n = 2, 3, 4$ | Lexical diversity |
| Copyright Risk | Longest common subsequence (LCS) > 20 tokens with the training set | Plagiarism tendency |
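Two of these metrics are straightforward to compute; the sketch below uses toy data and a stand-in `rhyme` predicate supplied by the caller.

```python
# Sketch of the distinct-n and rhyming-precision metrics from the table.
# Pure Python; the sample lines and rhyme predicate are toy assumptions.
from typing import Callable

def distinct_n(lines: list[str], n: int) -> float:
    """Ratio of unique to total n-grams across all generated lines."""
    ngrams = []
    for line in lines:
        toks = line.split()
        ngrams.extend(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

def rhyming_precision(pairs: list[tuple[str, str]],
                      rhyme: Callable[[str, str], bool]) -> float:
    """Fraction of schema-required pairs whose line endings actually rhyme."""
    return sum(rhyme(a, b) for a, b in pairs) / len(pairs)
```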

Key findings (Pasini et al., 2024):

  • LWF-Pretrained attains RP ≈ 89.8% (English), FPR ≈ 11.7%, Mauve ≈ 0.0240 (sampling + rerank), versus vanilla left-to-right finetuning at RP ≈ 35%.
  • Human annotation (Correctness, Meaningfulness, "Is-Human" judgment) shows LWF-Pretrained approaches human reference quality.
  • Multilingual LWF (mT5, 13 languages) achieves ≈41% rhyme precision, underscoring limitations in cross-lingual rhyming control.

Best practices include:

  • Sampling + rerank for increased fluency/rhyme fidelity
  • Clear, distinct phonetic rhyme seeds
  • Avoidance of ultra-low-frequency seeds to mitigate copying risk

6. Interaction Modalities and Creative Applications

VerseCrafter architectures support both fully automatic and highly interactive modalities. LWF-based systems permit explicit rhyme and style steering by injection of metadata and seed words for each rhyme label. Users may request completion of partial stanzas, generation from scratch under fixed schema, or targeted regeneration of specific lines with replaced rhyme seeds. In retrieval-based systems, every new line—typed or accepted—is fed back as the new parent for subsequent search, enabling responsive, iterative composition cycles with real-time (<1s) suggestion latency. This interactive infrastructure is intended to augment, rather than replace, human creative agency (Uthus et al., 2021).
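The retrieval-side interaction loop described above can be sketched in a few lines. The `encode` function and in-memory index are stand-in assumptions; the real system uses dual-encoder embeddings with a quantized nearest-neighbor index.

```python
# Sketch of the iterative parent-child suggestion loop: every accepted or
# typed line becomes the new "parent" query for the next retrieval step.
# encode() and the toy index are illustrative stand-ins, not the real system.
import numpy as np

rng = np.random.default_rng(0)
INDEX_LINES = [f"candidate line {i}" for i in range(100)]
INDEX_VECS = rng.normal(size=(100, 16))          # pretend pre-computed embeddings

def encode(line: str) -> np.ndarray:
    """Stand-in for the dual-encoder; a real system embeds the text itself."""
    return INDEX_VECS[hash(line) % len(INDEX_VECS)]

def suggest(parent: str, k: int = 3) -> list[str]:
    scores = INDEX_VECS @ encode(parent)         # dot-product similarity
    top = np.argsort(scores)[::-1][:k]
    return [INDEX_LINES[i] for i in top]

poem = ["the harbor sleeps beneath the rain"]
for _ in range(2):                               # two suggestion rounds
    children = suggest(poem[-1])
    poem.append(children[0])                     # user accepts the top suggestion
```

The key design point is that the loop is stateless beyond the poem itself: each round is a single embedding plus nearest-neighbor lookup, which is what keeps end-to-end suggestion latency under a second.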

7. Limitations and Research Directions

Identified limitations include:

  • Bias/Toxicity: Without systematic debiasing, genre-tuned outputs (e.g., rap) can inherit offensive language from training data.
  • Language Adaptation: Phonetic rhyme detection, developed originally for English, generalizes imperfectly to other languages' prosody and poetic traditions. Multilingual rhyme precision (≈41%) lags monolingual English (≈90%). Phonetic interlinguas or language-specific adapters are suggested remedies.
  • Meter and Rhythm Control: Current systems control only end-rhyme, not metrical features such as stress patterns or syllable counts.
  • Scaling and Few-Shot Capabilities: Open questions remain regarding direct prompting strategies, retrieval-augmented generation, or larger PLMs to further reduce data requirements and enable better stylistic adaptation.
  • Plagiarism Risk: Although flagged via LCS comparison, avoidance of long subsequence copying is not explicitly optimized.

A plausible implication is that future systems may focus on joint control of rhyme, meter, and stylistic emulation, potentially integrating hybrid retrieval–generation architectures and universal phonetic representations (Pasini et al., 2024).
