RainbowPrompt: Hierarchical Prompt Engineering
- RainbowPrompt is a framework that hierarchically composes and optimizes prompts, enabling robust and data-driven adaptations for large models.
- It employs modular ensembles and iterative refinements to capture diverse task attributes and improve generalization across domains.
- Empirical results demonstrate enhanced few-shot performance and continual learning capabilities with significant accuracy gains over traditional methods.
Prompt Of Prompts (POP) refers to a class of approaches and frameworks that hierarchically or modularly compose and optimize prompt representations for large pre-trained models, such as LLMs and vision-language models. POP methods enable structured adaptation, combinatorial generalization, and robust continual or automatic prompt optimization by leveraging ensembles, modular composition, or meta-level prompt generators. They contrast with classical prompt tuning, which typically optimizes a single, flat prompt embedding or text sequence, by introducing a higher-order architecture: prompts that create, organize, or optimize other prompts.
1. POP: Motivation and Core Paradigm
Traditional prompt tuning adapts large pre-trained models with minimal learnable parameters but usually restricts adaptation to a single, globally applied prompt per task or class. However, real-world applications often require modeling heterogeneity (multiple attributes or patterns) or handling new, compositional, or incremental tasks. POP methods generalize prompt adaptation by introducing the concepts of:
- Ensembles and partitioned sets of prompts (for capturing attribute diversity and abstraction hierarchy),
- Modular, compositional prompt generators (for zero-shot compositional generalization),
- Meta-level iterative prompt refinement (for data- and feedback-driven optimization across multiple solution branches).
These strategies systematically extend the representational and adaptation capacity of prompts, supporting objectives such as continual learning without catastrophic forgetting, cross-task information integration, and efficient automatic optimization.
2. Modular and Hierarchical POP Architectures
Multi-Prompt and Partitioned Prompting
Partitioned Multi-modal Prompt (PMPO) (Tian et al., 2023) exemplifies the architectural dimension of the POP paradigm. Given a pre-trained vision-language encoder (e.g., CLIP), PMPO defines a set of $M$ distinct learnable "soft" prompts $\{P_1, \dots, P_M\}$.
Each prompt $P_k$ is attached to a contiguous group of transformer blocks along the depth of the vision encoder, enabling prompt-specific modulation at different abstraction levels ("depth partitioning"). On the text side, the prompt tokens of each $P_k$ are concatenated with the class embedding, and each prompt-conditioned sequence is encoded separately. For the vision encoder, context vectors derived from each $P_k$ are projected and injected at the corresponding depth slice via Deep Visual Prompt Tuning (D-VPT):
$$[\,x^{(l+1)}, \,\cdot\,] = \mathrm{Block}^{(l)}\big([\,x^{(l)}, \tilde{P}_k^{(l)}\,]\big), \qquad l \in \mathcal{D}_k,$$
for $k = 1, \dots, M$, where $\mathcal{D}_k$ denotes the $k$-th depth slice. The $M$ parallel passes generate $M$ representation heads, which are averaged on both the text and image branches. The cross-modal classification loss is computed via
$$\mathcal{L} = -\log \frac{\exp\big(\cos(\bar{f}(x), \bar{g}_y)/\tau\big)}{\sum_{c}\exp\big(\cos(\bar{f}(x), \bar{g}_c)/\tau\big)},$$
where $\bar{f}(x)$ and $\bar{g}_c$ are the averaged image and class-text features and $\tau$ is the temperature, with all non-prompt encoder weights kept frozen.
This structural partitioning of the prompt space provides explicit specialization across hierarchical and orthogonal features and improves generalization to novel classes, datasets, and domains, as empirically validated on 11 recognition tasks.
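As a concrete illustration of the depth-partitioning idea, the sketch below (a minimal PyTorch-style rendering under our own naming, not the PMPO reference code) attaches one learnable soft prompt to each contiguous slice of a frozen transformer encoder, runs one pass per prompt, and averages the resulting representation heads.

```python
# Minimal illustrative sketch of depth-partitioned soft prompts (names and
# injection details are assumptions, not the official PMPO implementation).
import torch
import torch.nn as nn

class PartitionedPromptEncoder(nn.Module):
    def __init__(self, blocks: nn.ModuleList, num_partitions: int = 3,
                 prompt_len: int = 4, dim: int = 512):
        super().__init__()
        self.blocks = blocks
        for p in self.blocks.parameters():      # keep the backbone frozen
            p.requires_grad_(False)
        # One learnable soft prompt per contiguous depth partition.
        self.prompts = nn.ParameterList(
            [nn.Parameter(torch.randn(prompt_len, dim) * 0.02)
             for _ in range(num_partitions)]
        )
        # Split block indices into contiguous depth slices.
        chunks = torch.arange(len(blocks)).chunk(num_partitions)
        self.depth_slices = [set(c.tolist()) for c in chunks]

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:   # tokens: (B, N, dim)
        heads = []
        n = tokens.size(1)
        for prompt, slice_ids in zip(self.prompts, self.depth_slices):
            x = tokens
            for i, block in enumerate(self.blocks):
                if i in slice_ids:              # inject this partition's prompt tokens
                    p = prompt.unsqueeze(0).expand(x.size(0), -1, -1)
                    x = torch.cat([x, p], dim=1)
                x = block(x)
                if i in slice_ids:              # drop prompt tokens after the block (deep-VPT style)
                    x = x[:, :n]
            heads.append(x[:, 0])               # CLS-like feature for this pass
        return torch.stack(heads, dim=0).mean(dim=0)   # average the parallel heads
```

Because every pass shares the frozen backbone, only the prompt parameters are trained, which is what keeps this family of methods parameter-efficient.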
Neural Production System Perspective
The Prompt Production System (PRopS) (Pilault et al., 2023) establishes a different form of modularity by representing the "prompt of prompts" as an ensemble of differentiable "rule" modules $\{R_1, \dots, R_N\}$, each parameterized by an attention head or MLP. For an instruction $c$, PRopS' controller embeds $c$ to a representation $h_c$, scores rule applicability via logits $s_i$, selects the top-$k$ rules using Gumbel-Top-$k$ gating, and composes their outputs to derive the prompt embedding $z$:
$$z = \sum_{i=1}^{N} m_i\, r_i, \qquad r_i = R_i(h_c),$$
where $r_i$ is the output of rule $R_i$ and $m \in \{0,1\}^N$ is the binary selection vector with $\sum_i m_i = k$. This meta-prompt system can synthesize combinatorial rule applications at inference, granting generalized adaptation with low data and few parameters.
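The following minimal sketch shows one way such a rule bank with Gumbel-Top-$k$ gating can be implemented; module and variable names are assumptions for illustration, not the authors' code.

```python
# Illustrative sketch of a PRopS-style rule bank with Gumbel-Top-k gating
# (module and variable names are assumptions, not the authors' code).
import torch
import torch.nn as nn

class RuleBankPromptGenerator(nn.Module):
    def __init__(self, num_rules: int = 8, instr_dim: int = 256,
                 prompt_dim: int = 512, k: int = 2):
        super().__init__()
        self.k = k
        # Each "rule" is a small differentiable module (here an MLP).
        self.rules = nn.ModuleList([
            nn.Sequential(nn.Linear(instr_dim, prompt_dim), nn.GELU(),
                          nn.Linear(prompt_dim, prompt_dim))
            for _ in range(num_rules)
        ])
        self.scorer = nn.Linear(instr_dim, num_rules)   # rule-applicability logits

    def forward(self, instr_emb: torch.Tensor) -> torch.Tensor:   # (B, instr_dim)
        logits = self.scorer(instr_emb)                            # (B, num_rules)
        # Gumbel-Top-k: perturb logits with Gumbel noise, keep the k largest.
        gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-9) + 1e-9)
        top_idx = (logits + gumbel).topk(self.k, dim=-1).indices
        hard = torch.zeros_like(logits).scatter_(-1, top_idx, 1.0)
        soft = logits.softmax(dim=-1)
        mask = hard + soft - soft.detach()      # straight-through estimator
        rule_out = torch.stack([r(instr_emb) for r in self.rules], dim=1)  # (B, N, D)
        return (mask.unsqueeze(-1) * rule_out).sum(dim=1)   # composed prompt embedding
```

At inference the hard top-$k$ mask selects a subset of rules, so an unseen instruction can be served by a new combination of previously learned rules.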
3. Automatic and Iterative POP Optimization
Multi-Branched Prompt Optimization
AMPO (Automatic Multi-Branched Prompt Optimization) (Yang et al., 11 Oct 2024) advances POP in the context of iterative, pattern-driven prompt optimization. AMPO maintains a universe of prompt branches $\mathcal{B} = \{b_1, b_2, \dots\}$,
which are refined iteratively through a three-stage loop:
- Pattern Recognition: By analyzing failure cases, the system generates error explanations, clusters these into pattern sets, and scores their importance.
- Branch Adjustment: Top patterns lead to either branch enrichments (adding new procedural details) or new branch creation (if–else logic).
- Branch Pruning: Branches are validated and low-utility ones are removed based on empirical validation score thresholds.
The process is orchestrated by “meta-prompts” that assign the roles of Analyzer, Summarizer, and Revisor to LLM agents, and the loop terminates early if cross-validation shows no improvement, keeping the search efficient and targeted.
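The loop can be summarized by the schematic Python sketch below; all callables (`analyze`, `summarize`, `revise`, `solved`, `utility`, `evaluate`) stand in for the Analyzer/Summarizer/Revisor meta-prompts and validation scoring described above and are illustrative assumptions, not AMPO's actual interface.

```python
# Schematic sketch of an AMPO-style multi-branch refinement loop (illustrative,
# not the authors' implementation); all callables are injected placeholders.
def optimize_branches(branches, train_set, val_set,
                      analyze, summarize, revise,      # LLM-agent callables
                      solved, utility, evaluate,       # scoring callables
                      max_rounds=10, patience=2, min_utility=0.0):
    best, best_score, stall = list(branches), evaluate(branches, val_set), 0
    for _ in range(max_rounds):
        # 1) Pattern recognition: explain failure cases and cluster them into patterns.
        failures = [ex for ex in train_set if not solved(branches, ex)]
        patterns = summarize([analyze(branches, ex) for ex in failures])
        # 2) Branch adjustment: enrich existing branches or add new if-else branches.
        branches = revise(branches, patterns)
        # 3) Branch pruning: drop branches whose validation utility is too low.
        branches = [b for b in branches if utility(b, val_set) > min_utility]
        score = evaluate(branches, val_set)
        if score > best_score:
            best, best_score, stall = list(branches), score, 0
        elif (stall := stall + 1) >= patience:   # early stop without improvement
            break
    return best
```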
Data-Driven Instruction-Oriented POP
The FIPO framework (Lu et al., 19 Feb 2024), grounded in the POP (Prompt Optimization Preference) dataset, leverages modular meta-prompts to produce task-specific, instruction-optimized prompts. Optimizers are trained via supervised and preference-based (DPO, IPO, IPL) objectives using high-quality label pairs $(p^{+}, p^{-})$ of preferred and dispreferred optimized prompts. Modular assembly of prompt, response, and optional context allows fine-tuning to maximize generalized downstream performance across diverse tasks and architectures without the privacy risks of API-based adaptation.
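For reference, the standard DPO objective (one of the preference losses named above) takes the following form when applied to prompt-optimizer training; mapping $x$ to the naive prompt plus optional context and $p^{+}, p^{-}$ to the preferred and dispreferred optimized prompts is our illustrative reading, not FIPO's own notation:
$$\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,p^{+},\,p^{-})}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(p^{+}\mid x)}{\pi_{\mathrm{ref}}(p^{+}\mid x)} \;-\; \beta \log \frac{\pi_\theta(p^{-}\mid x)}{\pi_{\mathrm{ref}}(p^{-}\mid x)}\right)\right],$$
where $\pi_\theta$ is the optimizer being fine-tuned, $\pi_{\mathrm{ref}}$ is a frozen reference model, $\sigma$ is the logistic function, and $\beta$ is a scaling hyperparameter.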
4. Continual and Incremental Learning via POP
The “Prompt Of Prompts” model for continual learning (Hu et al., 2023) addresses catastrophic forgetting by decoupling prompt specialization for each task $t$ (a task prompt $P_t$) from a cross-task residual prompt $P_{\text{POP}}$ (the POP), so that the backbone at step $t$ is conditioned on
$$[\,P_{\text{POP}},\, P_1,\, \dots,\, P_t,\, x\,].$$
For classification, the prompt-conditioned features are concatenated and the loss combines class, task, and auxiliary criteria, with only $P_t$ and the POP prompt updated at step $t$. Freezing earlier task prompts while using the global POP prompt to share information across tasks enables retention of previously learned tasks and robust generalization even in few-shot scenarios. Performance exceeds classic replay- and prompt-based approaches, especially under domain shift and in ultra-low-data regimes.
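A minimal sketch of this update scheme (assumed structure and names, not the authors' implementation) is given below: earlier task prompts are frozen, and only the newly added task prompt and the shared POP prompt remain trainable.

```python
# Minimal sketch (assumed, not the authors' code) of the POP continual-learning
# update: earlier task prompts are frozen; only the new task prompt and the
# shared cross-task POP prompt receive gradients.
import torch
import torch.nn as nn

class POPContinualPrompts(nn.Module):
    def __init__(self, dim: int = 512, prompt_len: int = 4):
        super().__init__()
        self.pop = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)   # shared POP prompt
        self.task_prompts = nn.ParameterList()                          # one prompt per task

    def start_task(self):
        for p in self.task_prompts:           # freeze prompts of earlier tasks
            p.requires_grad_(False)
        self.task_prompts.append(
            nn.Parameter(torch.randn_like(self.pop) * 0.02))             # new task prompt

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:            # tokens: (B, N, dim)
        parts = [self.pop, *self.task_prompts]
        prompts = torch.cat(parts, dim=0).unsqueeze(0).expand(tokens.size(0), -1, -1)
        return torch.cat([prompts, tokens], dim=1)   # prepend prompts to the input tokens

# Per-task training would then pass the output through a frozen backbone and
# optimize only parameters with requires_grad=True (the POP and current prompts).
```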
5. Comparative Results and Empirical Insights
Empirical evaluations demonstrate that POP methods consistently outperform single-prompt and traditional fine-tuning strategies in transfer, generalization, and low-data settings. A comparative summary:
| Method | Key Strategy | Generalization Axis | Key Empirical Results |
|---|---|---|---|
| PMPO (Tian et al., 2023) | Partitioned multi-prompt | Hierarchical + attribute-wise | +7.62 H over CoOp; best H on 11 datasets |
| PRopS (Pilault et al., 2023) | Compositional rule-based | Compositional, few-shot | 4×+ accuracy on SCAN/CFQ |
| AMPO (Yang et al., 11 Oct 2024) | Multi-branch, iterative | Failure-driven branches | +2–6% accuracy, 6.4× prompt efficiency |
| FIPO (Lu et al., 19 Feb 2024) | Data-driven optimizer | Model-agnostic, cross-task | +6.37pp on Llama2-7B (few-shot PiQA) |
| POP for CL (Hu et al., 2023) | Task plus cross-task prompts | Continual/few-shot CL | 82–86% avg. accuracy on CIFAR/ImageNet-R |
These results demonstrate the scalability and robustness of POP paradigms across vision and language domains.
6. Theoretical and Practical Implications
POP advances the field both theoretically and practically:
- Specialization without collapse: Partitioning and modularization prevent prompt redundancy and trivial solutions; ablation studies show performance drops when depth partitioning or sparse rule selection is removed (Tian et al., 2023, Pilault et al., 2023).
- Combinatorial generalization: Modular assembly (e.g., top-$k$ rule composition, branched logic) enables handling unseen instructions or task decompositions (Pilault et al., 2023).
- Efficiency and interpretability: Iterative pruning and modular selection produce high-utility, compact prompts and support efficiency (AMPO demonstrates a 48× reduction in exploratory search) (Yang et al., 11 Oct 2024).
- Data-driven transfer: By separating prompt construction from response evaluation (FIPO), transferability across LLMs is improved and privacy risks in online API tuning are mitigated (Lu et al., 19 Feb 2024).
- End-user and practitioner tools: Offline-optimized, robust prompt optimizers can be directly deployed for practical low-shot or cross-domain adaptation (Lu et al., 19 Feb 2024).
Open directions include hierarchical and dynamic meta-prompt orchestration, online feedback-driven optimization, continual learning extensions to other modalities, and investigation of minimal model sizes for stable POP adaptation.
7. Limitations and Open Questions
While POP methods yield substantial gains, several limitations are observed:
- The complexity of meta-prompt orchestration and discrete gating (e.g., Gumbel sampling) introduces new tuning challenges (Pilault et al., 2023).
- Scaling to very large modular prompt sets or deep branch hierarchies may require additional regularization and pruning mechanisms (Yang et al., 11 Oct 2024).
- Current frameworks may incur memory and inference costs that grow (linearly or sublinearly) with the number of prompts or task increments (Hu et al., 2023).
- Extensions to adaptive, dynamically growing or compressing meta-prompt sets for highly non-stationary, multi-modal, or adversarial settings remain active areas of investigation.
In conclusion, the Prompt Of Prompts paradigm constitutes a robust, extensible meta-architecture for prompt engineering, enabling compositionality, specialization, and efficient, data-driven adaptation across a wide spectrum of large-model applications in vision and language (Tian et al., 2023, Pilault et al., 2023, Hu et al., 2023, Lu et al., 19 Feb 2024, Yang et al., 11 Oct 2024).