The Art of Asking: Multilingual Prompt Optimization for Synthetic Data (2510.19806v1)

Published 22 Oct 2025 in cs.CL

Abstract: Synthetic data has become a cornerstone for scaling LLMs, yet its multilingual use remains bottlenecked by translation-based prompts. This strategy inherits English-centric framing and style and neglects cultural dimensions, ultimately constraining model generalization. We argue that the overlooked prompt space, the very inputs that define training distributions, offers a more powerful lever for improving multilingual performance. We introduce a lightweight framework for prompt-space optimization, where translated prompts are systematically transformed for Naturalness, Cultural Adaptation, and Difficulty Enhancement. Using an off-the-shelf multilingual LLM, we apply these transformations to prompts for 12 languages spanning 7 families. Under identical data conditions, our approaches achieve substantial and consistent downstream improvements over the translation-only baseline: +4.7% on Global-MMLU accuracy, +2.4% on Flores XCometXL and +35.3% wins in preferences on mArenaHard. We establish prompt-space optimization as a simple yet powerful paradigm for building multilingual LLMs that are more robust, culturally grounded, and globally capable.

Summary

The paper introduces a framework that optimizes multilingual prompts through modular transformations to enhance naturalness, cultural adaptation, and difficulty.
Applying these operators to translated prompts counters the English-centric framing the prompts inherit and improves performance after fine-tuning across tasks.
Experiments show improved data quality and consistent downstream gains over the translation-only baseline, especially in low-resource languages.

Multilingual Prompt Optimization for Synthetic Data

This essay examines the paper "The Art of Asking: Multilingual Prompt Optimization for Synthetic Data," which presents a framework for optimizing the prompts used to generate multilingual synthetic training data for LLMs. The method transforms translated prompts to improve naturalness, cultural adaptation, and difficulty, shifting the focus of synthetic data work from the generation step to the prompts themselves.

Introduction

The paper addresses the limitations of traditional synthetic data generation methods, which rely heavily on translating English prompts into the target language and therefore carry English-centric biases. Such translations introduce artifacts and fail to adapt content culturally, which limits the model's generalization and prevents it from reaching its full potential in multilingual contexts.

The authors propose a lightweight framework designed to perform systematic transformations on prompts across three key dimensions: Naturalness, Cultural Adaptation, and Difficulty Enhancement. These transformations aim to produce data that is more linguistically diverse, culturally relevant, and challenging, thereby refining the input data distribution ($P(x)$) rather than focusing solely on the output completions ($P(y|x)$).
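To make the distinction between the two distributions concrete, it helps to recall the standard factorization of the joint distribution over prompts $x$ and completions $y$. This is a general probability identity, not notation taken from the paper:

```latex
P(x, y) = P(x)\, P(y \mid x)
```

Read this way, the framework leaves the teacher's conditional $P(y \mid x)$ as it is and reshapes the first factor, the distribution of prompts that the teacher is asked to answer.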

Figure 1: Prompt transformations consistently improve over translations: Comparison of translated model and our most well-rounded method (Cultural+Difficulty Mix) across different multilingual benchmarks. mArenaHard and Polywrite win-rates are in direct comparison between the two models.

Methodology

Problem Setup

The approach begins by translating a distribution of prompts from a high-resource source language to multiple target languages. The resulting prompts are then subjected to a series of transformative operations designed to enhance specific qualities.

The transformation $\mathcal{T}$ is defined as a modular family of operators that refine the distribution of translated prompts into an optimized one. The three operators, $\mathcal{T}_{\text{nat}}$, $\mathcal{T}_{\text{cult}}$, and $\mathcal{T}_{\text{diff}}$, target naturalness, cultural adaptation, and difficulty enhancement, respectively.
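One way to picture this modularity is to treat each operator as a function from a prompt string to a prompt string, so that operators can be applied alone or chained. The sketch below is an illustration only: the `compose` helper and the tagging stand-ins `t_nat`, `t_cult`, and `t_diff` are hypothetical, a real operator would call a multilingual LLM, and this summary does not state whether or in what order the paper chains operators.

```python
from functools import reduce
from typing import Callable

# A prompt operator maps one prompt string to another.
Operator = Callable[[str], str]


def compose(*operators: Operator) -> Operator:
    """Return an operator that applies the given operators left to right."""
    def composed(prompt: str) -> str:
        return reduce(lambda p, op: op(p), operators, prompt)
    return composed


# Hypothetical stand-ins that only tag the prompt; a real operator
# would send the prompt to a multilingual LLM with rewrite instructions.
def t_nat(prompt: str) -> str:
    return f"[natural] {prompt}"


def t_cult(prompt: str) -> str:
    return f"[cultural] {prompt}"


def t_diff(prompt: str) -> str:
    return f"[difficult] {prompt}"


cult_then_diff = compose(t_cult, t_diff)
print(cult_then_diff("Beschreibe ein typisches Frühstück."))
# prints "[difficult] [cultural] Beschreibe ein typisches Frühstück."
```

Keeping every operator behind the same string-to-string signature is what lets them be swapped or combined without touching the rest of the data pipeline.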

Transformation Operators

Naturalness ($\mathcal{T}_{\text{nat}}$): Focuses on removing translation artifacts and restoring idiomatic phrasing.
Cultural Adaptation ($\mathcal{T}_{\text{cult}}$): Adapts content to reflect culturally relevant examples and norms.
Difficulty Enhancement ($\mathcal{T}_{\text{diff}}$): Expands task complexity by reformulating prompts into more challenging instructions.

Figure 2: Illustration of our prompt transformations on a representative toy example that gets adapted for German: Each transformation modifies the original English prompt, with major modifications highlighted in bold.
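Since the operators are carried out by an off-the-shelf multilingual LLM, each one can be thought of as a rewrite instruction wrapped around the translated prompt. The sketch below shows that idea under loud assumptions: the template wording, the `TEMPLATES` dictionary, and the `build_instruction` helper are all hypothetical and are not the paper's actual instructions.

```python
# Hypothetical instruction templates; the wording is illustrative only
# and is NOT taken from the paper.
TEMPLATES = {
    "naturalness": (
        "Rewrite the following {language} prompt so that it reads as if a "
        "native speaker wrote it. Remove translation artifacts and keep the intent."
    ),
    "cultural": (
        "Adapt the following {language} prompt so that its examples and norms "
        "are relevant to speakers of {language}. Keep the task type unchanged."
    ),
    "difficulty": (
        "Reformulate the following {language} prompt into a more challenging "
        "instruction while keeping it answerable."
    ),
}


def build_instruction(operator: str, language: str, prompt: str) -> str:
    """Wrap a translated prompt in the rewrite instruction for one operator."""
    if operator not in TEMPLATES:
        raise ValueError(f"unknown operator: {operator!r}")
    header = TEMPLATES[operator].format(language=language)
    return f"{header}\n\nPrompt:\n{prompt}"


print(build_instruction("cultural", "German", "Beschreibe ein typisches Frühstück."))
```

The returned string would be sent to whichever LLM performs the transformation; that call is deliberately left out because the summary does not name the model or its API.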

Application in Experiments

The transformations are applied to prompts in twelve languages spanning seven families, and a strong multilingual LLM (the teacher) generates completions for the transformed prompts. The resulting data is used to fine-tune a 7B base model, which is benchmarked on a diverse set of tasks, including mathematical reasoning, translation, and open-ended generation.
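The data flow described above (translate, transform, generate a completion, collect fine-tuning examples) can be sketched as a small function. This is a minimal sketch, not the authors' implementation: `Example`, `build_sft_data`, and the three callables are hypothetical names, and the lambdas in the demo are toy stubs standing in for a translation system, a transformation LLM, and a teacher model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Example:
    language: str
    prompt: str
    completion: str


def build_sft_data(
    source_prompts: Iterable[str],
    languages: Iterable[str],
    translate: Callable[[str, str], str],
    transform: Callable[[str, str], str],
    teacher: Callable[[str], str],
) -> list[Example]:
    """Translate each prompt, transform it, then ask the teacher for a completion."""
    prompts = list(source_prompts)  # materialize so it can be reused per language
    examples = []
    for lang in languages:
        for src in prompts:
            translated = translate(src, lang)
            optimized = transform(translated, lang)
            examples.append(Example(lang, optimized, teacher(optimized)))
    return examples


# Toy stubs so the sketch runs without any model.
demo = build_sft_data(
    ["Describe a typical breakfast."],
    ["de", "cy"],
    translate=lambda p, lang: f"<{lang}> {p}",
    transform=lambda p, lang: p + " (adapted)",
    teacher=lambda p: f"answer to: {p}",
)
print(len(demo), demo[0].prompt)
```

Setting `transform` to the identity function yields the translation-only baseline with everything else held fixed, which mirrors the "identical data conditions" comparison mentioned in the abstract.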

Results and Discussion

Data Quality Improvements

Each transformation improved the data along a different dimension: naturalness increased lexical diversity, cultural adaptation improved fluency, and difficulty enhancement raised task complexity and overall quality.
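Lexical diversity can be measured in several ways, and this summary does not say which metric the paper uses. As a rough illustration, the sketch below computes a type-token ratio (unique tokens divided by total tokens). The function name and the whitespace tokenization are assumptions, and whitespace splitting is a crude proxy for many languages.

```python
def type_token_ratio(text: str) -> float:
    """Share of distinct tokens among all whitespace-separated tokens."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# "the" appears twice, so 4 distinct tokens out of 5.
print(type_token_ratio("the cat and the dog"))
```

Because this ratio tends to fall as texts get longer, comparisons between datasets are only meaningful on samples of similar length.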

Downstream Performance

Figure 3: Translation performance on Flores by language (grouped by those supported in pretraining vs others), compared also against the teacher model.

Across all benchmarks, the models fine-tuned with optimized prompts consistently outperformed those relying solely on translated prompts. The abstract reports gains over the translation-only baseline of +4.7% on Global-MMLU accuracy, +2.4% on Flores XCometXL, and +35.3% wins in preferences on mArenaHard. The Cultural+Difficulty Mix model displayed the most robust and balanced performance, with its largest gains on tasks demanding greater cultural and contextual understanding.

Gains were especially visible in less-resourced languages, which supports the approach's usefulness for extending language coverage and quality.

Figure 4: Performance on lowest-resource languages Welsh (cy), Basque (eu) and Latvian (lv) across three tasks. Win Rates are in comparison with the Translated baseline.
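Win rates of this kind come from pairwise preference judgments between a candidate model and the baseline. The sketch below shows the arithmetic under stated assumptions: the `win_rates` helper and the "win"/"loss"/"tie" labels are hypothetical, and how the paper's evaluation treats ties is not given in this summary.

```python
from collections import Counter


def win_rates(judgments: list[str]) -> dict[str, float]:
    """Fractions of "win", "loss", and "tie" labels, from the candidate's side."""
    if not judgments:
        raise ValueError("no judgments to aggregate")
    counts = Counter(judgments)
    total = len(judgments)
    return {label: counts[label] / total for label in ("win", "loss", "tie")}


rates = win_rates(["win", "win", "loss", "tie"])
print(rates)  # {'win': 0.5, 'loss': 0.25, 'tie': 0.25}
```

Reporting ties separately, rather than folding them into wins or losses, keeps the comparison with the Translated baseline easy to read when many outputs are judged equal.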

Conclusion

The paper shifts attention from optimizing generations to optimizing prompts and shows that this improves multilingual LLM performance. By refining prompts for linguistic richness and cultural alignment, the proposed framework improves the quality and scope of synthetic data generation. Future research could extend the framework to other underrepresented languages and refine the transformations for specific language contexts. The paper lays groundwork for more inclusive and culturally aware LLMs.