Measuring diversity of synthetic prompts and data generated with fine-grained persona prompting (2505.17390v1)
Abstract: Fine-grained personas have recently been used for generating 'diverse' synthetic data for pre-training and supervised fine-tuning of LLMs. In this work, we measure the diversity of persona-driven synthetically generated prompts and responses with a suite of lexical diversity and redundancy metrics. First, we find that synthetic prompts/instructions are significantly less diverse than human-written ones. Next, we sample responses from LLMs of different sizes with fine-grained and coarse persona descriptions to investigate how much fine-grained detail in persona descriptions contributes to the diversity of the generated text. We find that while persona prompting does improve lexical diversity (especially with larger models), adding fine-grained detail to personas does not noticeably increase diversity.
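The abstract refers to a suite of lexical diversity and redundancy metrics without naming them. As a hedged illustration of one commonly used metric of this kind, the sketch below computes distinct-n (the fraction of unique word n-grams across a set of generations); the specific metric, function name, and toy data are assumptions for illustration, not the paper's exact measurement suite.

```python
from collections import Counter

def distinct_n(texts, n=2):
    """Fraction of unique word n-grams across a corpus of generated texts.

    Values near 1.0 indicate lexically diverse output; values near 0.0
    indicate heavy repetition across generations. (Illustrative metric,
    not necessarily the exact suite used in the paper.)
    """
    ngrams = Counter()
    for text in texts:
        tokens = text.lower().split()
        for i in range(len(tokens) - n + 1):
            ngrams[tuple(tokens[i:i + n])] += 1
    total = sum(ngrams.values())
    return len(ngrams) / total if total else 0.0

# Hypothetical toy comparison: persona-prompted vs. plain generations
persona_responses = [
    "as a marine biologist i would start with coral reef surveys",
    "from a chef's perspective the recipe needs balanced acidity",
]
plain_responses = [
    "the topic is interesting and has many aspects to consider",
    "the topic is interesting and has many aspects to explore",
]
print(distinct_n(persona_responses))  # higher -> more lexically diverse
print(distinct_n(plain_responses))    # lower  -> more redundant
```

In practice such scores would be computed over large samples of generated prompts or responses and compared across prompting conditions (no persona, coarse persona, fine-grained persona) and model sizes.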