From Plain Text to Poetic Form: Generating Metrically-Constrained Sanskrit Verses

Published 1 Jun 2025 in cs.CL | (2506.00815v1)

Abstract: Recent advances in LLMs have significantly improved natural language generation, including creative tasks like poetry composition. However, most progress remains concentrated in high-resource languages. This raises an important question: Can LLMs be adapted for structured poetic generation in a low-resource, morphologically rich language such as Sanskrit? In this work, we introduce a dataset designed for translating English prose into structured Sanskrit verse, with strict adherence to classical metrical patterns, particularly the Anushtub meter. We evaluate a range of generative models, both open-source and proprietary, under multiple settings. Specifically, we explore constrained decoding strategies and instruction-based fine-tuning tailored to metrical and semantic fidelity. Our decoding approach achieves over 99% accuracy in producing syntactically valid poetic forms, substantially outperforming general-purpose models in meter conformity. Meanwhile, instruction-tuned variants show improved alignment with source meaning and poetic style, as supported by human assessments, albeit with marginal trade-offs in metrical precision.


Summary

Generating Metrically-Constrained Sanskrit Verses from English Prose

The paper "From Plain Text to Poetic Form: Generating Metrically-Constrained Sanskrit Verses" addresses the task of generating Sanskrit poetry in classical metrical forms from English prose. Recent advances in large language models (LLMs) have improved creative generation, but mostly for high-resource languages. Sanskrit poses additional challenges because of its rich morphology and strict metrical constraints, so a system must balance poetic form against semantic coherence.

Dataset and Problem Setting

The authors introduce a novel dataset derived from the Valmiki Ramayana corpus, aimed at translating English prose into structured Sanskrit verse in the Anuṣṭubh meter, one of the most common metrical forms in Sanskrit literature. The Anuṣṭubh meter imposes a strict syllable arrangement on each verse, defined by specific patterns of light and heavy syllables. The dataset contains 9,727 verses paired with English translations, which makes the task a combination of cross-lingual translation and poetry generation.
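To make "light and heavy syllables" concrete, here is a minimal sketch of a syllable-weight scanner. It is not the paper's tooling. It assumes the verse is given in IAST transliteration and applies the standard textbook rule: a syllable is heavy (G, guru) if its vowel is long, or if the vowel is followed by anusvāra, visarga, or two or more consonants; otherwise it is light (L, laghu). The names `scan_weights`, `TOKEN`, and the set constants are hypothetical.

```python
import re
import unicodedata


def nfc(s: str) -> str:
    """Normalise to NFC so each IAST letter is one precomposed code point."""
    return unicodedata.normalize("NFC", s)


# Assumption: input is IAST transliteration, not Devanagari.
LONG_VOWELS = {nfc(v) for v in ("ā", "ī", "ū", "ṝ", "e", "o", "ai", "au")}
SHORT_VOWELS = {nfc(v) for v in ("a", "i", "u", "ṛ", "ḷ")}
VOWELS = LONG_VOWELS | SHORT_VOWELS
CLOSERS = {nfc("ṃ"), nfc("ḥ")}  # anusvara and visarga

# Diphthongs and aspirated stops come first so they match as single units.
TOKEN = re.compile(nfc(
    r"ai|au|[aāiīuūṛṝḷeo]"
    r"|[kgcjṭḍtdpb]h"
    r"|[kgṅcjñṭḍṇtdnpbmyrlvśṣsh]"
    r"|[ṃḥ]"
))


def scan_weights(text: str) -> str:
    """Return one 'L' or 'G' per syllable of an IAST string.

    Spaces and punctuation are skipped, so consonant clusters that span
    a word boundary still make the preceding syllable heavy.
    Simplification: a short vowel followed by a single consonant at the
    very end of the input is reported as light.
    """
    tokens = TOKEN.findall(nfc(text.lower()))
    weights = []
    for i, tok in enumerate(tokens):
        if tok not in VOWELS:
            continue
        if tok in LONG_VOWELS:
            weights.append("G")
            continue
        # Collect everything between this short vowel and the next vowel.
        following = []
        for nxt in tokens[i + 1:]:
            if nxt in VOWELS:
                break
            following.append(nxt)
        heavy = len(following) >= 2 or any(t in CLOSERS for t in following)
        weights.append("G" if heavy else "L")
    return "".join(weights)


if __name__ == "__main__":
    # First two quarters of a well-known Anuṣṭubh verse (Bhagavad Gita 1.1).
    print(scan_weights("dharmakṣetre kurukṣetre"))  # GGGGLGGG
    print(scan_weights("samavetā yuyutsavaḥ"))      # LLGGLGLG
```

Each quarter scans to eight syllables, and in both the fifth syllable is light and the sixth heavy, which is the kind of positional pattern a meter check looks for.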

Methodological Approaches

The paper explores two main strategies for generating Sanskrit poetry:

Constrained Decoding: This approach enforces metrical constraints during inference by applying precompiled regex filters that check syllabic correctness against the Anuṣṭubh meter. It achieves over 99% accuracy in producing syntactically valid poetic forms.
Instruction Fine-tuning: Emphasizing semantic fidelity and style, this approach fine-tunes models on instruction-style prompts so that they internalize both metrical and poetic patterns. It improves alignment with the source meaning and poetic style, at a marginal cost in metrical precision.
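The regex-filter idea behind constrained decoding can be sketched on strings of syllable weights (L for light, G for heavy). This is a toy illustration, not the paper's implementation: the summary does not give the actual patterns, so the sketch assumes the commonly cited simplified Anuṣṭubh rule (four quarters of eight syllables; the fifth syllable light and the sixth heavy in every quarter; the seventh heavy in odd quarters and light in even ones). A real decoder would apply such a check to model tokens rather than to weight symbols. All names below are hypothetical.

```python
import re

# 'x' marks a free position; 'L' and 'G' are required weights.
# Assumption: simplified textbook rule, not the paper's exact patterns.
ODD_QUARTER = "xxxxLGGx"
EVEN_QUARTER = "xxxxLGLx"
TEMPLATE = (ODD_QUARTER + EVEN_QUARTER) * 2  # 32 syllables in total

# Precompiled pattern for a complete verse.
SLOKA_RE = re.compile(TEMPLATE.replace("x", "[LG]"))


def is_complete_sloka(weights: str) -> bool:
    """True if the weight string is a full verse matching the template."""
    return SLOKA_RE.fullmatch(weights) is not None


def is_valid_prefix(weights: str) -> bool:
    """True if a partial weight string can still grow into a valid verse."""
    if len(weights) > len(TEMPLATE):
        return False
    return all(w in "LG" and t in ("x", w) for w, t in zip(weights, TEMPLATE))


def filter_candidates(prefix: str, candidates: list[str]) -> list[str]:
    """Keep only continuations that leave the prefix metrically viable."""
    return [c for c in candidates if is_valid_prefix(prefix + c)]


if __name__ == "__main__":
    verse = "GGGGLGGG" + "LLGGLGLG" + "GLGGLGGL" + "LLGLLGLL"
    print(is_complete_sloka(verse))             # True
    print(filter_candidates("GGGG", ["L", "G"]))  # ['L']
```

The prefix check is what lets a constraint act during generation: at each step, continuations that would break the meter are discarded before the next one is chosen, so invalid verses are pruned early instead of being repaired afterwards.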

Evaluation

The authors employ both syntactic and semantic evaluation metrics to gauge the performance of their models. The syntactic evaluation focuses on metrical adherence, while the semantic evaluation assesses the preservation of meaning through human annotations. The constrained decoding approach, particularly with the NLLB-dist-1.3B model, achieves exceptional accuracy in conforming to the Anuṣṭubh meter, albeit with some trade-offs in semantic similarity.

Notably, instruction-fine-tuned models like Phi-4-14B and Mistral-Nemo-2407-12B balance syntax and semantics, achieving syntactic accuracies above 50% and semantic scores close to 68%.

Implications and Future Work

The research presents significant implications for structured generation in low-resource languages, showcasing the feasibility of generating metrically-constrained poetry from English inputs using LLMs. The success of constrained decoding underscores the advantages of infusing metrical constraints at the generation stage, providing insights into further exploration of meter-specific controls in poetry generation. Furthermore, the study sets the stage for expanding this methodology to incorporate diverse metrical forms beyond Anuṣṭubh, potentially paving the way for a more generalized framework applicable to other morphologically rich languages.

Future developments might include enhancing model architecture to inherently support diverse metrical styles without needing post-generation correction. Exploring the integration of symbolic AI components within linguistic frameworks could refine generation precision and elevate the quality of cross-lingual poetic synthesis.

In conclusion, the paper provides a foundation for structured poetic generation in Sanskrit and highlights the trade-off between semantic and metrical fidelity: constrained decoding favors strict meter conformity, while instruction fine-tuning favors meaning and poetic style.