CreativeDC: Two-Phase Prompting Framework
- CreativeDC is a two-phase prompting framework that employs divergent and convergent thinking to mitigate repetitive outputs in large language models.
- It separates ideation from constraint enforcement, ensuring a broad exploration of ideas followed by precise programming problem formulation.
- Its performance is evaluated using rigorous lexical, semantic, and utility metrics, demonstrating significant improvements in output diversity and novelty.
CreativeDC is a two-phase prompting framework for LLMs designed to generate problems and questions with enhanced creativity, diversity, and novelty. The method draws on Wallas's stage model of the creative process and Guilford's framework of divergent–convergent thinking, specifically targeting the "Artificial Hivemind" effect observed in LLM outputs, whereby repeated sampling from a single model, and even outputs from different models, produce excessively similar content. CreativeDC operationalizes an explicit separation between idea generation (divergence) and constraint satisfaction (convergence) within the LLM's inference workflow, yielding markedly greater output heterogeneity without compromising utility (Nguyen et al., 29 Dec 2025).
1. Conceptual Foundations and Motivation
CreativeDC is motivated by the limitations of contemporary LLMs in creative problem generation. Despite scaling, LLMs exhibit the “Artificial Hivemind” effect: intra-model repetition and inter-model homogeneity. In educational contexts, this reduces exposure to varied problem statements and modes of reasoning. The core insight is that standard prompting imposes all constraints at once, encouraging models to default to statistically “safe” outputs. By contrast, CreativeDC structurally decouples ideation from constraint satisfaction, emulating established theories of creative cognition—most notably, Wallas’s “preparation–incubation–illumination–verification” and Guilford’s divergent–convergent thinking—thus scaffolding the model’s reasoning to first maximize exploration, then enforce correctness and topical focus.
2. Two-Phase Prompting Architecture
CreativeDC implements a two-phase architecture at inference time, orchestrated through carefully structured prompts. The process is conceptually serial but realized in a single LLM query. The phases are:
Divergent Thinking Phase:
The LLM is instructed to consider only the problem theme and rapidly enumerate maximally different, underexplored elements, objects, or scenarios linked to that theme. Crucially, task-specific programming concepts or other constraints are excluded at this stage, minimizing cognitive load and fostering expansive, unconventional ideation.
Convergent Thinking Phase:
From the brainstormed ideas, the LLM selects one and attempts to synthesize a programming problem that utilizes precisely the target concept (e.g., a specific Python construct). If the first selection fails to meet the constraints, the model is directed to iteratively backtrack and select another idea, mimicking iterative refinement and rejection strategies found in human problem-solving.
Prompt Template:
A fixed prompt embodies this process:
```text
Apply the following thinking process to generate a problem.
Divergent thinking phase: Think about only the given theme and list wildly different, underexplored elements, objects, scenarios, or situations relevant to the theme. Ignore the required programming concepts in this phase. Push for unusual, surprising, unconventional, and diverse ideas.
Convergent thinking phase: From your brainstormed ideas, select one and connect it with the required programming concepts to create a creative programming problem. Make sure the problem does not require any other programming concepts. If it does not work, go back and select another idea and try again.

## Task Instruction
Given a theme of {theme}, create a Python programming problem that requires only {concept} to solve. The problem must include a description, a pytest-style test suite, and a reference solution.
```
The process is further extensible with persona simulation: an additional instruction header sampled from a "Persona Hub" can be prepended to vary the LLM's "voice" across outputs.
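As a concrete illustration, the sketch below shows how such a template might be instantiated per context, with an optional persona header prepended. The helper name, the abbreviated scaffold string, and the persona header format are illustrative assumptions, not the paper's released code.

```python
# SCAFFOLD abbreviates the fixed two-phase template shown above; the helper
# name and the persona header format are illustrative assumptions.
SCAFFOLD = (
    "Apply the following thinking process to generate a problem.\n"
    "Divergent thinking phase: ...\n"   # full instructions as in the template
    "Convergent thinking phase: ...\n"
    "\n"
    "## Task Instruction\n"
    "Given a theme of {theme}, create a Python programming problem that "
    "requires only {concept} to solve. The problem must include a "
    "description, a pytest-style test suite, and a reference solution."
)

def build_prompt(theme: str, concept: str, persona: str | None = None) -> str:
    """Instantiate the fixed template; optionally prepend a persona header."""
    prompt = SCAFFOLD.format(theme=theme, concept=concept)
    if persona:
        prompt = f"{persona}\n\n{prompt}"  # e.g., a header sampled from a Persona Hub
    return prompt

print(build_prompt("deep-sea exploration", "while loops"))
```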
3. Procedural and Implementation Details
CreativeDC operationalizes its workflow using the following components:
- Model: Qwen3-235B-A22B-Instruct with a sampling temperature of 1.0.
- Embeddings: Qwen3-Embedding-0.6B for all lexical and semantic metric computations.
- Automatic Judging: Gemini 2.5 Flash-Lite (temperature 0.0) for automated problem validation.
- Output Format: A single LLM query returns a structured JSON object with separate keys for “divergent_thinking,” “convergent_thinking,” “description,” “test_suite,” and “solution.”
This structure standardizes output granularity and enables fine-grained post hoc analysis of both creativity process and constraint satisfaction stages.
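A minimal sketch of how such a structured response might be parsed and validated downstream, assuming the five-key JSON shape described above; the function name and the example payload are illustrative.

```python
import json

# Expected top-level keys of a single CreativeDC response.
EXPECTED_KEYS = {"divergent_thinking", "convergent_thinking",
                 "description", "test_suite", "solution"}

def parse_output(raw_response: str) -> dict:
    """Parse the structured JSON object and verify all required keys exist."""
    record = json.loads(raw_response)
    missing = EXPECTED_KEYS - record.keys()
    if missing:
        raise ValueError(f"Malformed output, missing keys: {sorted(missing)}")
    return record

# Illustrative payload of the kind a single query would return.
raw_response = json.dumps({
    "divergent_thinking": "Lighthouse keepers, tide pools, message bottles...",
    "convergent_thinking": "Chose the lighthouse idea; it needs only a for loop.",
    "description": "Count how many ships pass the lighthouse each night...",
    "test_suite": "def test_count():\n    assert count_ships([1, 0, 1]) == 2\n",
    "solution": ("def count_ships(log):\n    total = 0\n"
                 "    for x in log:\n        total += x\n    return total\n"),
})
print(parse_output(raw_response)["description"])
```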
4. Formal Evaluation Metrics
CreativeDC’s output is evaluated on three axes with strict mathematical formulations:
| Axis | Lexical Metric | Semantic Metric |
|---|---|---|
| Diversity | Unique $n$-gram fraction (LexDiv) | Mean pairwise cosine distance (SemDiv) |
| Novelty | Novel $n$-gram proportion vs. reference corpus (LexNov) | Minimum cosine distance to reference corpus (SemNov) |
| Utility | Binary judge product (Validity × Relevance × Comprehensibility) | N/A |
- Lexical Diversity (LexDiv): Fraction of unique $n$-grams among all outputs.
- Semantic Diversity (SemDiv): Mean cosine distance (computed in embedding space) between all problem pairs.
- Lexical Novelty (LexNov): Proportion of novel $n$-grams in a candidate problem with respect to a reference corpus formed from all outputs of the other methods under the same context.
- Semantic Novelty (SemNov): Minimum cosine distance between the candidate and any item in the reference corpus.
- Utility: Product of three binary judgments: Validity, Relevance, and Comprehensibility.
- Vendi Score: An information-theoretic measure of the effective number of distinct problems. For a similarity matrix $K$ over $n$ outputs, $\mathrm{VS}(K) = \exp\!\left(-\sum_i \lambda_i \log \lambda_i\right)$, where the $\lambda_i$ are the eigenvalues of $K/n$.
These metrics jointly quantify structural diversity, semantic distinctness, novelty relative to non-DC outputs, and minimal pedagogical standards.
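The sketch below illustrates how these quantities can be computed with NumPy, assuming whitespace tokenization, trigram $n$-grams, and one embedding row per problem; all function names are illustrative, and the random stand-in embeddings merely demonstrate the calls.

```python
import numpy as np

def lexical_diversity(texts: list[str], n: int = 3) -> float:
    """Fraction of unique word n-grams pooled over all outputs."""
    grams = []
    for t in texts:
        toks = t.split()
        grams += [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    return len(set(grams)) / max(len(grams), 1)

def semantic_diversity(E: np.ndarray) -> float:
    """Mean cosine distance over all distinct pairs of problem embeddings."""
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    D = 1.0 - E @ E.T
    return float(D[np.triu_indices(len(D), k=1)].mean())

def semantic_novelty(e: np.ndarray, ref: np.ndarray) -> float:
    """Minimum cosine distance from one candidate to a reference corpus."""
    e = e / np.linalg.norm(e)
    ref = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    return float((1.0 - ref @ e).min())

def utility(valid: bool, relevant: bool, comprehensible: bool) -> int:
    """Product of the three binary judge verdicts."""
    return int(valid and relevant and comprehensible)

def vendi_score(K_sim: np.ndarray) -> float:
    """exp(Shannon entropy) of the eigenvalues of K/n: the effective
    number of distinct problems under similarity matrix K."""
    lam = np.linalg.eigvalsh(K_sim / K_sim.shape[0])
    lam = lam[lam > 1e-12]          # drop numerically zero eigenvalues
    return float(np.exp(-(lam * np.log(lam)).sum()))

# Demo with stand-in embeddings (real usage would embed problem texts,
# e.g., with Qwen3-Embedding-0.6B).
E = np.random.default_rng(0).normal(size=(100, 8))
En = E / np.linalg.norm(E, axis=1, keepdims=True)
print(semantic_diversity(E), vendi_score(En @ En.T))
```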
5. Experimental Design and Comparative Results
CreativeDC is comparatively benchmarked on creative Python problem generation across 20 contexts (the product of 4 themes and 5 basic concepts), each yielding K = 100 valid problems per method, with regeneration until all test suites pass (a sketch of this validation loop follows below).
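A minimal sketch of this regenerate-until-valid loop, assuming pytest is installed and using a stubbed `generate_problem` in place of the actual LLM call; the control flow and helper names are illustrative.

```python
import pathlib
import subprocess
import tempfile

def generate_problem(theme: str, concept: str) -> dict:
    """Stub standing in for the single-query CreativeDC LLM call."""
    return {
        "solution": "def add(a, b):\n    return a + b\n",
        "test_suite": "def test_add():\n    assert add(1, 2) == 3\n",
    }

def passes_tests(solution: str, test_suite: str) -> bool:
    """Write the solution and its pytest suite to a temp dir and run pytest."""
    with tempfile.TemporaryDirectory() as tmp:
        d = pathlib.Path(tmp)
        (d / "solution.py").write_text(solution)
        (d / "test_problem.py").write_text("from solution import *\n\n" + test_suite)
        result = subprocess.run(["pytest", "-q", str(d)], capture_output=True)
        return result.returncode == 0

def collect_valid_problems(theme: str, concept: str, k: int = 100,
                           max_attempts: int = 1000) -> list[dict]:
    """Regenerate until k problems with fully passing test suites are collected."""
    problems = []
    for _ in range(max_attempts):
        if len(problems) >= k:
            break
        candidate = generate_problem(theme, concept)
        if passes_tests(candidate["solution"], candidate["test_suite"]):
            problems.append(candidate)
    return problems

print(len(collect_valid_problems("space travel", "for loops", k=3)))
```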
Baselines:
- Base: Contextualized task description only.
- CoT: “Think step by step” appended.
- CreativeDC: Full two-phase divergent–convergent scaffold.

All methods are evaluated both with and without persona simulation.
Quantitative Results (mean ± SE, over 20 contexts, K=100):
| Method | LexDiv | SemDiv | LexNov | SemNov | Utility |
|---|---|---|---|---|---|
| Base | 0.74±0.01 | 0.46±0.01 | 0.62±0.01 | 0.20±0.01 | 92.95±0.83% |
| CoT | 0.75±0.01 | 0.46±0.01 | 0.66±0.02 | 0.18±0.01 | 91.35±1.24% |
| CreativeDC | 0.81±0.00 | 0.54±0.01 | 0.73±0.01 | 0.30±0.01 | 90.85±0.88% |
| Base+Persona | 0.81±0.01 | 0.49±0.01 | 0.66±0.01 | 0.22±0.01 | 91.80±1.14% |
| CoT+Persona | 0.82±0.01 | 0.52±0.01 | 0.67±0.01 | 0.23±0.01 | 89.70±1.02% |
| CreativeDC+Persona | 0.84±0.00 | 0.56±0.01 | 0.75±0.01 | 0.31±0.01 | 89.65±1.12% |
- Vendi Score Growth: At small sample sizes, CreativeDC's Vendi Score already exceeds CoT's by 24%; at K = 100, the gap grows to 72%. This indicates dramatically superior scaling in the generation of truly distinct problems as sample size increases.
Empirical gains in diversity and novelty are statistically significant and robust across persona settings. Utility remains competitive; any reduction is minimal (CreativeDC's utility sits roughly 2 percentage points below Base, 90.85% vs. 92.95%).
6. Analysis and Broader Implications
By structurally decoupling the stages of ideation and constraint satisfaction, CreativeDC addresses the root cause of output homogenization in LLMs. The divergent phase ensures exploration is unconstrained by technical requirements, while the convergent phase strictly enforces solution validity and topical relevance. This staged cognition prevents models from prematurely “locking in” to the most probable solutions found in training data—thus breaking the self-reinforcing cycle of safe, template-driven outputs.
For educational content generation, CreativeDC’s ability to produce large banks of programming problems with demonstrably increased thematic and narrative variety bears direct implications: it mitigates student exposure to repetitive formulations and supports the design of assessments that encourage diverse thinking styles. The method retains high rates of validity, relevance, and comprehensibility even as diversity metrics scale with sample size.
A plausible implication is that the CreativeDC paradigm may extend to other generative domains where “template collapse” is problematic, provided that the domain permits explicit prompt scaffolding for ideation and evaluation.
7. Connections and Distinctions from Related Creative Modeling
CreativeDC exemplifies an approach to creativity in LLMs based on staged reasoning and prompt engineering, distinct from techniques in the vision domain such as Creative Adversarial Networks and their conditional variants (Hereu et al., 2024). Whereas CAN/CCAN induce creativity at the loss-objective level—explicitly encouraging generators to maximize style ambiguity or resistance to classification—CreativeDC externalizes the creative process into prompt structure and selection-based iteration. Both approaches share the strategy of operationalizing creative cognition theories (e.g., Martindale’s arousal-potential, Wallas’s creative stages), but differ fundamentally in how and where generative novelty is induced: via architectural loss terms or explicit reasoning scaffolds. This distinction highlights the importance of domain- and modality-appropriate mechanisms for machine creativity.