
Prompt-Based Customization

Updated 12 December 2025
  • Prompt-based customization is a paradigm that adapts large transformers using modular prompts without modifying the backbone parameters.
  • The APT framework composes lightweight prompt modules, enabling efficient, per-source model adaptation and instant unlearning with minimal storage overhead.
  • Empirical results demonstrate competitive accuracy, with performance losses under 5%, making this approach effective for continual and decentralized learning.

Prompt-based customization is a paradigm in which the behavior of large machine learning models—most notably transformers—can be adaptively tailored to specific domains, user data, or tasks by manipulating or generating input prompts, without modifying the model parameters themselves. This approach has emerged as a scalable and modular alternative to parameter fine-tuning, enabling per-user, per-dataset, or per-domain adaptation with minimal parameter storage, robust privacy guarantees, and rapid deployment. Techniques such as soft prompt learning, structured attention masking, and composable prompt modules allow bespoke models to be assembled from independently trained prompt fragments, supporting decentralized data training and instant unlearning. The “À-la-carte Prompt Tuning” (APT) framework formalizes this regime, delivering dynamically assembled “micro-models” with accuracy and efficiency on par with full fine-tuning, while preserving source isolation and compute tractability (Bowman et al., 2023).

1. Principles and Motivation

Prompt-based customization is motivated by the practical limitations of monolithic fine-tuning: parameter inefficiency, privacy and ownership concerns for data providers, and inflexibility with respect to adding, removing, or re-weighting information sources. In the APT scheme, a frozen transformer backbone (e.g., a Vision Transformer pre-trained on a large canonical dataset) is equipped with a lightweight prompt mechanism per data source. Each prompt (the "customization unit") contains learned token embeddings and optionally "memory" tokens, with a parameter count several orders of magnitude below that of the backbone.

This modular structure enables models to be “assembled” for inference on the fly. Privacy is naturally enforced, as prompts only encapsulate information from the data subset on which they were trained. To remove a user or data source, its prompt can be deleted without retraining or affecting the other components. Adding new information only requires training a new prompt, not revisiting or exposing previously seen data.

2. Architecture and Training Procedure

APT is instantiated as follows:

Backbone:

A large transformer (e.g., ViT-B/16 with parameters $\theta$), frozen throughout both prompt training and inference. For an input $x$, the model splits it into $N$ patches or tokens, embeds them as $z_0 \in \mathbb{R}^{N \times d}$, and applies $L$ self-attention layers $F_\theta^\ell$.

Prompt Module:

Each data source $D_i$ is allocated a soft prompt $P_i \in \mathbb{R}^{m \times d}$ (with $m$ on the order of 1–10) and per-layer "memory tokens" $M_\ell^{(i)} \in \mathbb{R}^{d_{\text{mem}} \times d}$. The total storage for one prompt bundle is typically less than $0.1\%$ of the backbone.
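
The customization unit can be pictured as a small PyTorch module holding the soft prompt, optional per-layer memory tokens, and the shallow classifier head. The sketch below is illustrative only; the class name `PromptBundle`, the shapes, and the initialization are assumptions rather than the reference implementation.

```python
import torch
import torch.nn as nn

class PromptBundle(nn.Module):
    """Per-source customization unit: soft prompt, per-layer memory tokens, classifier head.

    Illustrative sketch only; names, shapes, and init are assumptions, not the APT reference code.
    """
    def __init__(self, d=768, m=4, d_mem=4, num_layers=12, num_classes=10):
        super().__init__()
        # Soft prompt tokens P_i prepended to the patch sequence (m on the order of 1-10).
        self.prompt = nn.Parameter(torch.randn(m, d) * 0.02)
        # Per-layer "memory" tokens M_l^(i), one small block per transformer layer.
        self.memory = nn.ParameterList(
            [nn.Parameter(torch.randn(d_mem, d) * 0.02) for _ in range(num_layers)]
        )
        # Shallow classifier head_i applied to the prompt embedding after the last layer.
        self.head = nn.Linear(d, num_classes)

bundle = PromptBundle()
n_params = sum(p.numel() for p in bundle.parameters())
# ViT-B/16 has roughly 86M backbone parameters; the exact overhead depends on m, d_mem,
# and the number of classes, but the bundle remains a tiny fraction of the backbone.
print(f"prompt-bundle parameters: {n_params:,} (~{n_params / 86e6:.3%} of a ViT-B/16 backbone)")
```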

Training:

Given a dataset shard $D_i$, only $P_i$ and a shallow classifier $\mathrm{head}_i$ are trained. The backbone $\theta$ is held fixed. Forwarding $[z_0(x); P_i]$ through the transformer under structured attention (see below) yields a prompt embedding $p_L^{(i)}(x)$, which is used by $\mathrm{head}_i$ for classification:

$$L_{D_i}(P_i, \mathrm{head}_i) = \sum_{(x,y) \in D_i} \ell\big(y,\; \mathrm{softmax}(\mathrm{head}_i(p_L^{(i)}(x)))\big),$$

where $\ell(\cdot)$ is, e.g., cross-entropy. AdamW is used for optimization; only the prompt and head are updated (Bowman et al., 2023).
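
A minimal sketch of this per-source training loop under the definitions above. Here `backbone_forward` is a hypothetical helper that prepends the prompt, applies the structured attention mask, and returns $p_L^{(i)}(x)$; `loader_i` iterates over the shard $D_i$; the hyperparameters are placeholders.

```python
import torch
import torch.nn.functional as F

def train_prompt(backbone, bundle, loader_i, backbone_forward, epochs=10, lr=1e-3):
    """Train one prompt bundle on its shard D_i; the backbone theta is never updated."""
    for p in backbone.parameters():
        p.requires_grad_(False)                       # freeze theta throughout
    opt = torch.optim.AdamW(bundle.parameters(), lr=lr)  # only P_i, memory tokens, head_i
    for _ in range(epochs):
        for x, y in loader_i:                         # shard D_i only; no other sources seen
            p_L = backbone_forward(backbone, x, bundle)   # prompt embedding p_L^{(i)}(x)
            logits = bundle.head(p_L)
            loss = F.cross_entropy(logits, y)         # the per-source objective L_{D_i}
            opt.zero_grad()
            loss.backward()
            opt.step()
    return bundle
```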

3. Composable Inference and Structured Attention

Composable Inference:

For any subset $I \subseteq \{1,\dots,n\}$ of the $n$ data sources, the user may request an ensemble model over those sources:

  • Concatenate their prompt tokens to form $P_I = [P_{i_1}; P_{i_2}; \dots; P_{i_{|I|}}]$.
  • Prefix these prompts to the input sequence.
  • Pass through the transformer with a structured attention mask that:
    • blocks cross-attention between different prompts,
    • prevents backbone tokens from attending to prompt tokens,
    • allows prompts to attend to their own memory tokens and backbone tokens.

After $L$ transformer layers, each $p_L^{(i)}$ is extracted. The heads' logits $\hat{y}^{(i)}$ are averaged,

$$\hat{y}_I = \frac{1}{|I|} \sum_{i \in I} \mathrm{softmax}(\hat{y}^{(i)}),$$

yielding a prediction influenced strictly by the selected prompts.
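
A minimal sketch of the structured attention mask and the equal-weight composition, assuming a token layout of [prompts; patches] and a boolean convention in which True marks an allowed attention edge; memory tokens are omitted for brevity, and the exact layout used by APT may differ.

```python
import torch

def structured_attention_mask(prompt_lens, num_patches):
    """Boolean attention mask (True = may attend) for the sequence [P_1; ...; P_|I|; patches]."""
    total = sum(prompt_lens) + num_patches
    mask = torch.zeros(total, total, dtype=torch.bool)
    patches = slice(sum(prompt_lens), total)
    # Backbone patch tokens attend only to other patch tokens, never to any prompt.
    mask[patches, patches] = True
    start = 0
    for m in prompt_lens:
        block = slice(start, start + m)
        mask[block, block] = True      # each prompt attends to its own tokens...
        mask[block, patches] = True    # ...and to the backbone tokens,
        start += m                     # but never to another source's prompt.
    return mask

def compose_predictions(per_prompt_logits):
    """Average the per-source softmax outputs (equal weighting over the selected prompts)."""
    probs = [torch.softmax(logits, dim=-1) for logits in per_prompt_logits]
    return torch.stack(probs).mean(dim=0)
```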

Isolation and Modularity:

This attention structure (“structured attention”) ensures zero cross-talk across prompts and strictly compartmentalizes each data source’s influence. No normalization or parameter sharing across prompts is needed. Each prompt bundle is stored as a tiny parameter file, retrievable and removable at will.
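
Because each bundle is an independent parameter file, adding, removing, or selecting sources reduces to file operations. A minimal sketch, assuming one `.pt` file per source under a hypothetical `prompts/` directory:

```python
import os
import torch

PROMPT_DIR = "prompts"   # hypothetical storage layout: one small file per data source

def save_bundle(bundle, source_id):
    os.makedirs(PROMPT_DIR, exist_ok=True)
    torch.save(bundle.state_dict(), os.path.join(PROMPT_DIR, f"{source_id}.pt"))

def forget_source(source_id):
    # "Instant unlearning": deleting the file removes the source's entire influence,
    # since no other component was ever trained on its data.
    os.remove(os.path.join(PROMPT_DIR, f"{source_id}.pt"))

def load_selected(source_ids, make_bundle):
    # Assemble an a-la-carte model from the requested (authorized) sources only.
    bundles = []
    for sid in source_ids:
        b = make_bundle()
        b.load_state_dict(torch.load(os.path.join(PROMPT_DIR, f"{sid}.pt")))
        bundles.append(b)
    return bundles
```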

4. Empirical Performance and Evaluation

APT was evaluated in multiple regimes:

Sharding and Union Comparison:

On datasets such as MIT-67, CUB-200, Caltech-256, Pets, and Flowers, prompt bundles were trained on up to $k=20$ shards. The performance drop was bounded by $5\%$ compared to full-union "paragon" models, with $k=10$ giving $\leq 2\%$ error relative to joint training (Bowman et al., 2023).

Continual and Class-Incremental Learning:

On Split CIFAR-100 (10 episodes), APT achieved 83.63% accuracy versus L2P’s 83.83% and APT-Weight’s 85.21%. On CORe50 domain-incremental, APT reached 90.89% (APT-Weight 91.14%), surpassing S-iPrompts (89.06%) and memoryless L2P (83.83%).

Resource Efficiency:

Single-pass composable inference incurs only $O(N^2 + |I|(N + d_{\text{mem}}))$ compute, much less than naively ensembling $|I|$ full models, and storage per prompt is negligible compared to the backbone.
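
For intuition, a back-of-the-envelope comparison of per-layer attention pairs, using illustrative values for $N$, $|I|$, $m$, and $d_{\text{mem}}$ (not figures from the paper):

```python
# Rough attention-cost comparison for one layer, following the complexity stated above.
N, num_sources, m, d_mem = 196, 10, 4, 4            # ViT-B/16 on 224x224 images gives N = 196 patches
composable = N**2 + num_sources * m * (m + N + d_mem)   # one forward pass with structured attention
naive_ensemble = num_sources * (N + m)**2                # |I| separate full forward passes
print(f"composable ~{composable:,} vs naive ensemble ~{naive_ensemble:,} attention pairs")
```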

| Regime | Paragon Acc. | APT Acc. | Drop ($\Delta_k$) |
|---|---|---|---|
| In-domain, $k=10$ | >80% | >78% | ≤2% |
| CORe50 domain-incremental | >90% | >90% | ≤1% |
| Split CIFAR-100 | >83% | >83% | ≤0.2% |

5. Advantages, Limitations, and Trade-offs

Strengths:

  • Modular Data Ownership:

Local training on raw data; no central data pooling. Removing a prompt instantly deletes its effect (compliant with unlearning requests).

  • Custom Versioning:

Each user receives a personalized model by selecting authorized prompts. “Firewalling” ensures no parameter sharing or leakage.

  • Compute and Storage Efficiency:

≲0.06% parameter overhead per prompt, with a small, fixed additional inference cost per added source.

Limitations:

  • Expressivity Ceiling:

Structured attention precludes cross-prompt synergies; performance may lag joint fine-tuning on heavily out-of-domain data.

  • Backbone Dependence:

The strength of the frozen backbone is a limiting factor; prompt modules have bounded corrective power.

Open Directions:

  • Learning prompt weighting or selection, rather than equal-weight averaging.
  • Extending the procedure to text or multimodal transformers.
  • Conditioning prompt retrieval on the particulars of each inference instance.
  • Exploring richer cross-prompt interactions (balanced against risk of interference).

6. Theoretical and Practical Implications

À-la-carte prompt tuning realizes a robust form of à-la-carte learning, where per-user or per-source models can be instantly composed. Information can be added or expunged by simple addition or deletion of a prompt; no retraining of the backbone is required (Bowman et al., 2023). The compartmentalized structure has direct implications for privacy, dynamic access control, efficient unlearning, and scalable model deployment.

Empirical results show that APT-built models attain accuracy within $5\%$ of joint-trained models on the full data union, with nearly identical costs for training and inference. For continual and domain-incremental learning, APT achieves state-of-the-art performance, indicating that structured, modular prompt composition is not only practical but highly competitive.

7. Summary Table: Core Properties of À-la-carte Prompt Tuning

| Feature | Implementation Detail | Empirical Outcome |
|---|---|---|
| Storage | ≲0.06% of backbone params per prompt | Negligible RAM/disk usage |
| Training | Local to each source | No central data pooling |
| Inference composition | Concatenate prompt tokens, structured attention mask | One forward pass |
| Removal/addition | Delete/add prompt bundle | Instant unlearning/versioning |
| Accuracy (in-domain) | ≤5% drop vs. union fine-tune | $k=10$: ≤2% drop |
| Continual learning | State-of-the-art on Split CIFAR-100, CORe50 | 83–91% accuracy |

APT demonstrates that prompt-based customization provides an efficient, modular framework for flexible, secure, and high-performing model adaptation—suitable for environments where privacy, composability, and minimal retraining are paramount (Bowman et al., 2023).
