
Prompt-Based Customization

Updated 12 December 2025
  • Prompt-based customization is a paradigm that adapts large transformers using modular prompts without modifying the backbone parameters.
  • The APT framework composes lightweight prompt modules, enabling efficient, per-source model adaptation and instant unlearning with minimal storage overhead.
  • Empirical results demonstrate competitive accuracy, with performance losses under 5%, making this approach effective for continual and decentralized learning.

Prompt-based customization is a paradigm in which the behavior of large machine learning models—most notably transformers—can be adaptively tailored to specific domains, user data, or tasks by manipulating or generating input prompts, without modifying the model parameters themselves. This approach has emerged as a scalable and modular alternative to parameter fine-tuning, enabling per-user, per-dataset, or per-domain adaptation with minimal parameter storage, robust privacy guarantees, and rapid deployment. Techniques such as soft prompt learning, structured attention masking, and composable prompt modules allow bespoke models to be assembled from independently trained prompt fragments, supporting decentralized data training and instant unlearning. The “À-la-carte Prompt Tuning” (APT) framework formalizes this regime, delivering dynamically assembled “micro-models” with accuracy and efficiency on par with full fine-tuning, while preserving source isolation and compute tractability (Bowman et al., 2023).

1. Principles and Motivation

Prompt-based customization is motivated by the practical limitations of monolithic fine-tuning: parameter inefficiency, privacy and ownership concerns for data providers, and inflexibility with respect to adding, removing, or re-weighting information sources. In the APT scheme, a frozen transformer backbone (e.g., a Vision Transformer pre-trained on a large canonical dataset) is equipped with a lightweight prompt mechanism per data source. Each prompt (the "customization unit") contains learned token embeddings and optionally "memory" tokens, with a parameter count several orders of magnitude below that of the backbone.

This modular structure enables models to be “assembled” for inference on the fly. Privacy is naturally enforced, as prompts only encapsulate information from the data subset on which they were trained. To remove a user or data source, its prompt can be deleted without retraining or affecting the other components. Adding new information only requires training a new prompt, not revisiting or exposing previously seen data.

2. Architecture and Training Procedure

APT is instantiated as follows:

Backbone:

A large transformer (e.g., ViT-B/16 with parameters $\theta$), frozen throughout both prompt training and inference. For an input $x$, the model splits it into $N$ patches or tokens, embeds them as $z_0 \in \mathbb{R}^{N \times d}$, and applies $L$ self-attention layers $F_\theta^\ell$.

Prompt Module:

Each data source $D_i$ is allocated a soft prompt $P_i \in \mathbb{R}^{m \times d}$ (with $m$ on the order of 1–10) and per-layer "memory tokens" $M_\ell^{(i)} \in \mathbb{R}^{d_{\text{mem}} \times d}$. The total storage for one prompt bundle is typically less than $0.1\%$ of the backbone.
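
The customization unit can be pictured as a small PyTorch module holding the soft prompt, optional per-layer memory tokens, and the shallow classifier head. The sketch below is illustrative only; the class name `PromptBundle`, the shapes, and the initialization are assumptions rather than the reference implementation.

```python
import torch
import torch.nn as nn

class PromptBundle(nn.Module):
    """Per-source customization unit: soft prompt, per-layer memory tokens, classifier head.

    Illustrative sketch only; names, shapes, and init are assumptions, not the APT reference code.
    """
    def __init__(self, d=768, m=4, d_mem=4, num_layers=12, num_classes=10):
        super().__init__()
        # Soft prompt tokens P_i prepended to the patch sequence (m on the order of 1-10).
        self.prompt = nn.Parameter(torch.randn(m, d) * 0.02)
        # Per-layer "memory" tokens M_l^(i), one small block per transformer layer.
        self.memory = nn.ParameterList(
            [nn.Parameter(torch.randn(d_mem, d) * 0.02) for _ in range(num_layers)]
        )
        # Shallow classifier head_i applied to the prompt embedding after the last layer.
        self.head = nn.Linear(d, num_classes)

bundle = PromptBundle()
n_params = sum(p.numel() for p in bundle.parameters())
# ViT-B/16 has roughly 86M backbone parameters; the exact overhead depends on m, d_mem,
# and the number of classes, but the bundle remains a tiny fraction of the backbone.
print(f"prompt-bundle parameters: {n_params:,} (~{n_params / 86e6:.3%} of a ViT-B/16 backbone)")
```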

Training:

Given a dataset shard $D_i$, only $P_i$ and a shallow classifier $\mathrm{head}_i$ are trained. The backbone $\theta$ is held fixed. Forwarding $[z_0(x); P_i]$ through the transformer under structured attention (see below) yields a prompt embedding $p_L^{(i)}(x)$, which is used by $\mathrm{head}_i$ for classification:

$$L_{D_i}(P_i, \mathrm{head}_i) = \sum_{(x,y) \in D_i} \ell\big(y,\; \mathrm{softmax}(\mathrm{head}_i(p_L^{(i)}(x)))\big),$$

where $\ell(\cdot)$ is, e.g., cross-entropy. AdamW is used for optimization; only the prompt and head are updated (Bowman et al., 2023).
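
A minimal sketch of this per-source training loop under the definitions above. Here `backbone_forward` is a hypothetical helper that prepends the prompt, applies the structured attention mask, and returns $p_L^{(i)}(x)$; `loader_i` iterates over the shard $D_i$; the hyperparameters are placeholders.

```python
import torch
import torch.nn.functional as F

def train_prompt(backbone, bundle, loader_i, backbone_forward, epochs=10, lr=1e-3):
    """Train one prompt bundle on its shard D_i; the backbone theta is never updated."""
    for p in backbone.parameters():
        p.requires_grad_(False)                       # freeze theta throughout
    opt = torch.optim.AdamW(bundle.parameters(), lr=lr)  # only P_i, memory tokens, head_i
    for _ in range(epochs):
        for x, y in loader_i:                         # shard D_i only; no other sources seen
            p_L = backbone_forward(backbone, x, bundle)   # prompt embedding p_L^{(i)}(x)
            logits = bundle.head(p_L)
            loss = F.cross_entropy(logits, y)         # the per-source objective L_{D_i}
            opt.zero_grad()
            loss.backward()
            opt.step()
    return bundle
```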

3. Composable Inference and Structured Attention

Composable Inference:

For any subset $I \subseteq \{1,\dots,n\}$ of the $n$ data sources, the user may request an ensemble model over those sources:

  • Concatenate their prompt tokens to form $P_I = [P_{i_1}; P_{i_2}; \dots; P_{i_{|I|}}]$.
  • Prefix these prompts to the input sequence.
  • Pass through the transformer with a structured attention mask that:
    • blocks cross-attention between different prompts,
    • prevents backbone tokens from attending to prompt tokens,
    • allows prompts to attend to their own memory tokens and backbone tokens.

After $L$ transformer layers, each $p_L^{(i)}$ is extracted. The heads' logits $\hat{y}^{(i)}$ are averaged,

$$\hat{y}_I = \frac{1}{|I|} \sum_{i \in I} \mathrm{softmax}(\hat{y}^{(i)}),$$

yielding a prediction influenced strictly by the selected prompts.
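
A minimal sketch of the structured attention mask and the equal-weight composition, assuming a token layout of [prompts; patches] and a boolean convention in which True marks an allowed attention edge; memory tokens are omitted for brevity, and the exact layout used by APT may differ.

```python
import torch

def structured_attention_mask(prompt_lens, num_patches):
    """Boolean attention mask (True = may attend) for the sequence [P_1; ...; P_|I|; patches]."""
    total = sum(prompt_lens) + num_patches
    mask = torch.zeros(total, total, dtype=torch.bool)
    patches = slice(sum(prompt_lens), total)
    # Backbone patch tokens attend only to other patch tokens, never to any prompt.
    mask[patches, patches] = True
    start = 0
    for m in prompt_lens:
        block = slice(start, start + m)
        mask[block, block] = True      # each prompt attends to its own tokens...
        mask[block, patches] = True    # ...and to the backbone tokens,
        start += m                     # but never to another source's prompt.
    return mask

def compose_predictions(per_prompt_logits):
    """Average the per-source softmax outputs (equal weighting over the selected prompts)."""
    probs = [torch.softmax(logits, dim=-1) for logits in per_prompt_logits]
    return torch.stack(probs).mean(dim=0)
```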

Isolation and Modularity:

This attention structure (“structured attention”) ensures zero cross-talk across prompts and strictly compartmentalizes each data source’s influence. No normalization or parameter sharing across prompts is needed. Each prompt bundle is stored as a tiny parameter file, retrievable and removable at will.
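
Because each bundle is an independent parameter file, adding, removing, or selecting sources reduces to file operations. A minimal sketch, assuming one `.pt` file per source under a hypothetical `prompts/` directory:

```python
import os
import torch

PROMPT_DIR = "prompts"   # hypothetical storage layout: one small file per data source

def save_bundle(bundle, source_id):
    os.makedirs(PROMPT_DIR, exist_ok=True)
    torch.save(bundle.state_dict(), os.path.join(PROMPT_DIR, f"{source_id}.pt"))

def forget_source(source_id):
    # "Instant unlearning": deleting the file removes the source's entire influence,
    # since no other component was ever trained on its data.
    os.remove(os.path.join(PROMPT_DIR, f"{source_id}.pt"))

def load_selected(source_ids, make_bundle):
    # Assemble an a-la-carte model from the requested (authorized) sources only.
    bundles = []
    for sid in source_ids:
        b = make_bundle()
        b.load_state_dict(torch.load(os.path.join(PROMPT_DIR, f"{sid}.pt")))
        bundles.append(b)
    return bundles
```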

4. Empirical Performance and Evaluation

APT was evaluated in multiple regimes:

Sharding and Union Comparison:

On datasets such as MIT-67, CUB-200, Caltech-256, Pets, and Flowers, prompt bundles were trained on up to $k=20$ shards. The performance drop was bounded by $5\%$ compared to full-union "paragon" models, with $k=10$ giving $\leq 2\%$ error relative to joint training (Bowman et al., 2023).

Continual and Class-Incremental Learning:

On Split CIFAR-100 (10 episodes), APT achieved 83.63% accuracy versus L2P’s 83.83% and APT-Weight’s 85.21%. On CORe50 domain-incremental, APT reached 90.89% (APT-Weight 91.14%), surpassing S-iPrompts (89.06%) and memoryless L2P (83.83%).

Resource Efficiency:

Single-pass composable inference incurs only $O(N^2 + |I|(N + d_{\text{mem}}))$ compute, much less than naively ensembling $|I|$ full models, and storage per prompt is negligible compared to the backbone.
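
For intuition, a back-of-the-envelope comparison of per-layer attention pairs, using illustrative values for $N$, $|I|$, $m$, and $d_{\text{mem}}$ (not figures from the paper):

```python
# Rough attention-cost comparison for one layer, following the complexity stated above.
N, num_sources, m, d_mem = 196, 10, 4, 4            # ViT-B/16 on 224x224 images gives N = 196 patches
composable = N**2 + num_sources * m * (m + N + d_mem)   # one forward pass with structured attention
naive_ensemble = num_sources * (N + m)**2                # |I| separate full forward passes
print(f"composable ~{composable:,} vs naive ensemble ~{naive_ensemble:,} attention pairs")
```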

| Regime | Paragon Acc. | APT Acc. | Drop ($\Delta_k$) |
|---|---|---|---|
| In-domain, $k=10$ | >80% | >78% | ≤2% |
| CORe50 domain-incremental | >90% | >90% | ≤1% |
| Split CIFAR-100 | >83% | >83% | ≤0.2% |

5. Advantages, Limitations, and Trade-offs

Strengths:

  • Modular Data Ownership:

Local training on raw data; no central data pooling. Removing a prompt instantly deletes its effect (compliant with unlearning requests).

  • Custom Versioning:

Each user receives a personalized model by selecting authorized prompts. “Firewalling” ensures no parameter sharing or leakage.

  • Compute and Storage Efficiency:

≲0.06% parameter overhead per prompt, with a small, fixed additional inference cost per added source.

Limitations:

  • Expressivity Ceiling:

Structured attention precludes cross-prompt synergies; performance may lag joint fine-tuning on heavily out-of-domain data.

  • Backbone Dependence:

The strength of the frozen backbone is a limiting factor; prompt modules have bounded corrective power.

Open Directions:

  • Learning prompt weighting or selection, rather than equal-weight averaging.
  • Extending the procedure to text or multimodal transformers.
  • Conditioning prompt retrieval on the particulars of each inference instance.
  • Exploring richer cross-prompt interactions (balanced against risk of interference).

6. Theoretical and Practical Implications

À-la-carte prompt tuning realizes a robust form of à-la-carte learning, where per-user or per-source models can be instantly composed. Information can be added or expunged by simple addition or deletion of a prompt; no retraining of the backbone is required (Bowman et al., 2023). The compartmentalized structure has direct implications for privacy, dynamic access control, efficient unlearning, and scalable model deployment.

Empirical results show that APT-built models attain accuracy within $5\%$ of joint-trained models on the full data union, with nearly identical costs for training and inference. For continual and domain-incremental learning, APT achieves state-of-the-art performance, indicating that structured, modular prompt composition is not only practical but highly competitive.

7. Summary Table: Core Properties of À-la-carte Prompt Tuning

| Feature | Implementation Detail | Empirical Outcome |
|---|---|---|
| Storage | ≲0.06% of backbone params per prompt | Negligible RAM/disk usage |
| Training | Local to each source | No central data pooling |
| Inference composition | Concatenate prompt tokens, structured attention mask | One forward pass |
| Removal/addition | Delete/add prompt bundle | Instant unlearning/versioning |
| Accuracy (in-domain) | ≤5% drop vs. union fine-tune | $k=10$: ≤2% drop |
| Continual learning | State-of-the-art on Split CIFAR-100, CORe50 | 83–91% accuracy |

APT demonstrates that prompt-based customization provides an efficient, modular framework for flexible, secure, and high-performing model adaptation—suitable for environments where privacy, composability, and minimal retraining are paramount (Bowman et al., 2023).
