
VerseCrafter: Interactive Poetry System

Updated 13 January 2026
  • VerseCrafter is a computational framework for interactive poetry and lyric generation that integrates explicit rhyme control and user guidance.
  • It combines transformer-based encoder–decoder generation with dual-encoder retrieval to produce verses with targeted style and fluency.
  • The system uses detailed metadata and phonetic constraints to achieve high rhyme precision and real-time creative assistance.

VerseCrafter systems constitute a class of computational frameworks for interactive poetry and lyric generation, with an emphasis on controllable rhyme structure, poet style, and user-guided composition. These systems serve both as autonomous generators and as real-time creative assistants, operationalizing recent advances in neural encoder–decoder models and retrieval-based suggestion engines to augment traditional poetic workflows. Two prominent exemplars are the Last Word First (LWF) encoder–decoder approach for free-verse generation with explicit rhyme control (Pasini et al., 2024), and the "Verse by Verse" dual-encoder retrieval platform designed for interactive, style-specific line suggestion (Uthus et al., 2021).

1. System Architectures and Input Representations

VerseCrafter platforms employ distinct yet complementary architectural paradigms for verse generation:

  • Encoder–Decoder Generation: Pasini et al. (Pasini et al., 2024) utilize a standard Transformer-style encoder–decoder (T5, mT5). Inputs to the encoder are flat sequences of special‐token–delimited metadata fields, e.g., <title>, <artist>, <genre>, <emotion>, <topics>, and a <rhyming_schema>. All tokens, including markers, are embedded through the pretrained layer of the PLM. At the decoder side, the LWF mechanism prepends each target line's rhyming word (the intended line-ending) followed by a <sep> token, ensuring that the critical rhyming choice is committed before semantic content is generated. Decoding remains strictly left-to-right, preserving compatibility with pretrained parameterization.
  • Retrieval-Based Suggestion: "Verse by Verse" (Uthus et al., 2021) separates generation and suggestion into offline and online modules. Offline, per-poet decoder-only Transformers are fine-tuned to enumerate millions of plausible verse lines, each tagged with metadata (syllable count, rhyme phoneme, POS fingerprint, etc.) and encoded into fixed-size vectors via a Transformer-based dual-encoder. In real time, the latest user-provided line is encoded ("parent"), and a nearest-neighbor search returns candidate lines ("children") optimized for contextual fit and user-specified constraints. Embedding indices employ hierarchical quantization for sub-millisecond retrieval latency.
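The LWF input format above can be illustrated with a short sketch. This is not the authors' code: the serialization helpers below are hypothetical, but the special tokens (`<title>`, `<sep>`, etc.) and the prepend-the-rhyming-word convention follow the description of Pasini et al. (2024).

```python
# Illustrative sketch (not the authors' implementation) of the LWF
# encoder input and decoder target formats described above.

def build_encoder_input(meta: dict) -> str:
    """Serialize metadata fields into one flat, special-token-delimited sequence."""
    fields = ["title", "artist", "genre", "emotion", "topics", "rhyming_schema"]
    return " ".join(f"<{f}> {meta[f]}" for f in fields if f in meta)

def build_lwf_target(lines: list[str]) -> str:
    """Prepend each line's ending (rhyming) word and a <sep> token before the
    line itself, so the rhyme choice is committed before the semantic content."""
    parts = []
    for line in lines:
        last_word = line.split()[-1]
        parts.append(f"{last_word} <sep> {line}")
    return " ".join(parts)

meta = {"title": "Night Drive", "genre": "pop", "rhyming_schema": "AABB"}
print(build_encoder_input(meta))
print(build_lwf_target(["the city hums below", "we watch the headlights glow"]))
```

Because decoding remains left-to-right, this target format lets a pretrained encoder–decoder commit to each line's rhyme before generating the rest of the line, with no architectural changes.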

2. Training Objectives and Optimization

The principal training objectives are determined by the generative or retrieval nature of the framework:

  • LWF Encoder–Decoder Objective: For sequence generation, the objective is standard token-level cross-entropy loss, with logits computed as

$$\mathbf{z}^{(i)} = M\bigl(X,\; t_{<i}\bigr) \;\in\; \mathbb{R}^{|V|}$$

and loss

$$\mathcal{L}_{\rm XE} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{|V|}\mathbb{1}[j=t_i]\,\log p_j^{(i)}$$

where $p_j^{(i)} = \mathrm{softmax}(\mathbf{z}^{(i)})_j$. In the LWF+EPR variant, an auxiliary head is trained to predict the Ending Phonetic Representation, aggregating $\mathcal{L}_{\rm XE}^{\rm verse} + \mathcal{L}_{\rm XE}^{\rm phoneme}$, with EPRs extracted from the CMU Pronouncing Dictionary (Pasini et al., 2024).
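The token-level objective above can be sketched numerically. This is a toy NumPy illustration, not the training code; shapes and data are assumptions.

```python
# Minimal sketch of the token-level cross-entropy objective L_XE,
# with toy logits and targets; NumPy used for clarity.
import numpy as np

def cross_entropy(logits: np.ndarray, targets: np.ndarray) -> float:
    """logits: (N, |V|) per-position scores z^(i); targets: (N,) token ids t_i.
    Returns -(1/N) sum_i log softmax(z^(i))_{t_i}."""
    z = logits - logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())

# LWF+EPR aggregates the verse-token loss with an auxiliary phoneme-head loss.
rng = np.random.default_rng(0)
verse_loss = cross_entropy(rng.normal(size=(5, 100)), rng.integers(0, 100, 5))
phoneme_loss = cross_entropy(rng.normal(size=(5, 40)), rng.integers(0, 40, 5))
total_loss = verse_loss + phoneme_loss
```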

  • Retrieval System Training: The dual-encoder model is trained via contrastive loss, maximizing dot-product similarity for true (parent, child) pairs and penalizing incorrect pairings:

$$L_i = -\log \frac{\exp(u_i^\top v_i)}{\sum_{j=1}^{B} \exp(u_i^\top v_j)}$$

with weights shared across the encoders except for final FC layers. Pretraining occurs on noisy user-forum data, followed by fine-tuning on poetic line-pair corpora (Uthus et al., 2021).
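The in-batch contrastive objective above can be sketched as follows; embeddings here are toy arrays, not dual-encoder outputs.

```python
# Sketch of the in-batch contrastive loss: each parent embedding u_i should
# score highest against its own child v_i among the B children in the batch.
import numpy as np

def in_batch_contrastive_loss(U: np.ndarray, V: np.ndarray) -> float:
    """U, V: (B, d) parent/child embeddings. Returns the mean of
    L_i = -log exp(u_i.v_i) / sum_j exp(u_i.v_j)."""
    scores = U @ V.T                                    # (B, B) dot products
    scores -= scores.max(axis=1, keepdims=True)         # numerical stability
    log_softmax = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.diag(log_softmax).mean())
```

With all scores equal the loss reduces to $\log B$, the uniform-guessing baseline, which is a useful sanity check during training.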

  • Line-Generation Pruning: The generative Transformer's output space is pruned via normalized next-token probabilities:

$$\text{normalized\_score}(w \mid s) = \frac{p(w \mid s)}{\max_{z \in V} p(z \mid s)}$$

Tokens are retained when $\text{normalized\_score} \geq 0.925$, and search-state growth is limited by ranking partial lines by summed log-probability, keeping the top $10^8$ partials per iteration.
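The pruning rule above is simple to state in code. This is an illustrative sketch with toy probabilities; the 0.925 threshold follows the text.

```python
# Sketch of the normalized-score pruning rule: a next token w survives only
# if p(w|s) is within a fixed factor of the most probable token's probability.
# The candidate probabilities below are toy values.

def prune_next_tokens(probs: dict[str, float], threshold: float = 0.925) -> list[str]:
    best = max(probs.values())
    return [w for w, p in probs.items() if p / best >= threshold]

candidates = {"night": 0.40, "light": 0.38, "fight": 0.30, "the": 0.05}
print(prune_next_tokens(candidates))   # only "night" and "light" survive
```

Because the score is normalized by the best token rather than an absolute cutoff, the rule adapts per context: a flat distribution keeps many continuations, while a peaked one keeps few.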

3. Datasets, Corpora, and Multilingual Considerations

VerseCrafter systems rely on large, carefully annotated verse corpora:

| Dataset | Language Scope | Size (train/dev/test) | Annotation Details |
| --- | --- | --- | --- |
| Genius.com | English | 570K / 3.5K / 3.5K | Title, artist, genre, emotion, topics, phonetic rhyme schemas |
| Wasabi | 13 European languages | 2.6M / 10K / 10K | Sentence blocks, phonetic rhyme schemas; no explicit section/genre |
| Project Gutenberg & curated poets | English | ~1.1M lines, 22 poets | Per-poet fine-tuning; line-level metadata (syllable count, rhyme) |

In the Wasabi dataset, rhyme schemas are induced by applying the Ghazvininejad algorithm with language-specific vowel sets. The challenge of porting rhyme detection across prosodically diverse languages is significant, as each possesses unique poetic conventions, e.g., mora-timing in Finnish (Pasini et al., 2024).

4. Algorithmic Control of Rhyme and Structure

VerseCrafter systems operationalize rhyme-controllability through distinct mechanisms:

  • Explicit Rhyme Seeding (LWF): Rhyming targets (schema labels) and, optionally, explicit rhyme words are provided as input. At decoding time, upon emission of a sentence boundary, the next rhyme label and word are forced, enabling structured rhyme patterns to be imposed with minimal risk to global fluency. Sampling + rerank strategies (top-p ≈ 0.9, k ≈ 20) balance rhyme fidelity with generation diversity.
  • Phonetic Constrained Retrieval: In "Verse by Verse," the system filters retrieval candidates by rhyme requirements derived from phonetic extraction (Kestrel normalization). Both perfect (identical vowel + coda) and imperfect rhymes (log-odds consonant similarity ≥ 0) are supported, and constraints degrade gracefully to non-rhyming lines if no candidates match (Uthus et al., 2021).
  • Human-in-the-Loop Interaction: Users may iteratively select, edit, or reject system suggestions, and can supply partial stanzas, target rhyme words, or stylistic metadata, promoting a co-creative dynamic.
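The phonetic-constrained retrieval mechanism can be sketched with a perfect-rhyme filter. The tiny ARPABET-style lexicon below is a hand-written assumption, not the Kestrel or CMU pipeline itself; only the final-stressed-vowel-plus-coda criterion and graceful degradation follow the text.

```python
# Hedged sketch of a perfect-rhyme candidate filter: two words rhyme
# perfectly if they share the final stressed vowel and everything after it.
# The lexicon is a toy, hand-written assumption.

LEXICON = {
    "glow":  ["G", "L", "OW1"],
    "below": ["B", "IH0", "L", "OW1"],
    "light": ["L", "AY1", "T"],
}

def rhyme_tail(phones: list[str]) -> tuple[str, ...]:
    """Return phonemes from the last stressed vowel (stress digit 1/2) onward."""
    for i in range(len(phones) - 1, -1, -1):
        if phones[i][-1] in "12":
            return tuple(phones[i:])
    return tuple(phones)

def perfect_rhyme(w1: str, w2: str) -> bool:
    return rhyme_tail(LEXICON[w1]) == rhyme_tail(LEXICON[w2])

def filter_candidates(target: str, candidates: list[str]) -> list[str]:
    """Keep rhyming candidates; degrade gracefully to all if none rhyme."""
    rhyming = [c for c in candidates if perfect_rhyme(target, c)]
    return rhyming or candidates
```

A production system would additionally score imperfect rhymes (e.g., by consonant similarity) rather than applying a strict equality test.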

5. Quantitative Evaluation and Best Practices

VerseCrafter performance is assessed via both automated metrics and human evaluation:

| Metric | Definition/Equation | Application |
| --- | --- | --- |
| Perplexity (PPL) | $\exp\bigl[-\frac{1}{N}\sum_{i=1}^N \log p(y_i \mid y_{<i}, x)\bigr]$ | Coherence, fluency |
| Mauve | Distributional divergence between human and system outputs | Fluency, diversity |
| Rhyming Precision (RP) | $\frac{1}{\lvert R\rvert}\sum_{(t_i,t_j)\in R} \mathbb{1}[\mathrm{rhyme}(t_i,t_j)]$ | Rhyme control accuracy |
| False-Positive Rate (FPR) | $\frac{1}{\lvert NR\rvert}\sum_{(t_i,t_j)\in NR} \mathbb{1}[\mathrm{rhyme}(t_i,t_j)]$ | Spurious rhyme rate |
| distinct-n | (# unique n-grams) / (# total n-grams), for $n = 2, 3, 4$ | Lexical diversity |
| Copyright Risk | Longest common subsequence (LCS) > 20 tokens with the training set | Plagiarism tendency |
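Two of these metrics are straightforward to compute; the sketch below uses toy data and a stand-in `rhyme` predicate supplied by the caller.

```python
# Sketch of the distinct-n and rhyming-precision metrics from the table.
# Pure Python; the sample lines and rhyme predicate are toy assumptions.
from typing import Callable

def distinct_n(lines: list[str], n: int) -> float:
    """Ratio of unique to total n-grams across all generated lines."""
    ngrams = []
    for line in lines:
        toks = line.split()
        ngrams.extend(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

def rhyming_precision(pairs: list[tuple[str, str]],
                      rhyme: Callable[[str, str], bool]) -> float:
    """Fraction of schema-required pairs whose line endings actually rhyme."""
    return sum(rhyme(a, b) for a, b in pairs) / len(pairs)
```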

Key findings (Pasini et al., 2024):

  • LWF-Pretrained attains RP ≈ 89.8% (English), FPR ≈ 11.7%, Mauve ≈ 0.0240 (sampling + rerank), versus vanilla left-to-right finetuning at RP ≈ 35%.
  • Human annotation (Correctness, Meaningfulness, "Is-Human" judgment) shows LWF-Pretrained approaches human reference quality.
  • Multilingual LWF (mT5, 13 languages) achieves ≈41% rhyme precision, underscoring limitations in cross-lingual rhyming control.

Best practices include:

  • Sampling + rerank for increased fluency/rhyme fidelity
  • Clear, distinct phonetic rhyme seeds
  • Avoidance of ultra-low-frequency seeds to mitigate copying risk

6. Interaction Modalities and Creative Applications

VerseCrafter architectures support both fully automatic and highly interactive modalities. LWF-based systems permit explicit rhyme and style steering by injection of metadata and seed words for each rhyme label. Users may request completion of partial stanzas, generation from scratch under fixed schema, or targeted regeneration of specific lines with replaced rhyme seeds. In retrieval-based systems, every new line—typed or accepted—is fed back as the new parent for subsequent search, enabling responsive, iterative composition cycles with real-time (<1s) suggestion latency. This interactive infrastructure is intended to augment, rather than replace, human creative agency (Uthus et al., 2021).
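The retrieval-side interaction loop described above can be sketched in a few lines. The `encode` function and in-memory index are stand-in assumptions; the real system uses dual-encoder embeddings with a quantized nearest-neighbor index.

```python
# Sketch of the iterative parent-child suggestion loop: every accepted or
# typed line becomes the new "parent" query for the next retrieval step.
# encode() and the toy index are illustrative stand-ins, not the real system.
import numpy as np

rng = np.random.default_rng(0)
INDEX_LINES = [f"candidate line {i}" for i in range(100)]
INDEX_VECS = rng.normal(size=(100, 16))          # pretend pre-computed embeddings

def encode(line: str) -> np.ndarray:
    """Stand-in for the dual-encoder; a real system embeds the text itself."""
    return INDEX_VECS[hash(line) % len(INDEX_VECS)]

def suggest(parent: str, k: int = 3) -> list[str]:
    scores = INDEX_VECS @ encode(parent)         # dot-product similarity
    top = np.argsort(scores)[::-1][:k]
    return [INDEX_LINES[i] for i in top]

poem = ["the harbor sleeps beneath the rain"]
for _ in range(2):                               # two suggestion rounds
    children = suggest(poem[-1])
    poem.append(children[0])                     # user accepts the top suggestion
```

The key design point is that the loop is stateless beyond the poem itself: each round is a single embedding plus nearest-neighbor lookup, which is what keeps end-to-end suggestion latency under a second.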

7. Limitations and Research Directions

Identified limitations include:

  • Bias/Toxicity: Without systematic debiasing, genre-tuned outputs (e.g., rap) can inherit offensive language from training data.
  • Language Adaptation: Phonetic rhyme detection, developed originally for English, generalizes imperfectly to other languages' prosody and poetic traditions. Multilingual rhyme precision (≈41%) lags monolingual English (≈90%). Phonetic interlinguas or language-specific adapters are suggested remedies.
  • Meter and Rhythm Control: Current systems control only end-rhyme, not metrical features such as stress patterns or syllable counts.
  • Scaling and Few-Shot Capabilities: Open questions remain regarding direct prompting strategies, retrieval-augmented generation, or larger PLMs to further reduce data requirements and enable better stylistic adaptation.
  • Plagiarism Risk: Although flagged via LCS comparison, avoidance of long subsequence copying is not explicitly optimized.

A plausible implication is that future systems may focus on joint control of rhyme, meter, and stylistic emulation, potentially integrating hybrid retrieval–generation architectures and universal phonetic representations (Pasini et al., 2024).
