
In-Context Generation Techniques

Updated 4 July 2026
  • In-context generation is a method where models leverage provided examples or retrieved evidence to dynamically generate outputs without updating parameters.
  • A data-generation perspective distinguishes skill recognition from skill learning, treating in-context examples as selecting or inducing the data-generation function the model applies.
  • It has practical applications in machine translation, data synthesis, image editing, and video imitation, while addressing challenges like semantic drift and bias.

In-context generation denotes a class of generation procedures in which a model is conditioned on examples, retrieved evidence, or reference signals supplied in its context window and then produces a new output without parameter updates. In the data-generation perspective, an LLM can be viewed as implicitly modeling a family of data-generation functions $\{f_\theta\}_{\theta \in \Theta}$; at inference, it may either select a previously learned function that explains the demonstrations (“skill recognition”) or adapt to the demonstrations to induce a new function (“skill learning”). Recent work uses this pattern across black-box LLM prompting, synthetic tabular data generation, machine translation, question generation, image editing, image synthesis, and video imitation (Li et al.; Fang et al., 23 Feb 2025; Zhang et al., 2024).

1. Conceptual and theoretical foundations

A systematic formulation treats pre-training as learning a mixture over latent data-generation functions,

$$p(y \mid x) = \int_{\Theta} p(y \mid x, \theta)\, p(\theta)\, d\theta,$$

where each $\theta \in \Theta$ defines a conditional distribution $p(y \mid x, \theta)$ or, in classification language, a function $f_\theta$ from input to label. Under skill recognition, the model performs implicit Bayesian inference over $\theta$ using the demonstrations $D = \{(x_i, y_i)\}_{i=1}^{n}$ and concentrates on a latent concept $\theta^*$ that best explains them. Under skill learning, the model behaves as an on-the-fly learner, implicitly selecting

$$f^* = \arg\min_{f \in \mathcal{F}} \sum_{i=1}^{n} \ell\big(f(x_i), y_i\big),$$

and then applying $f^*(x_{\mathrm{test}})$ to the query.
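The two modes can be contrasted in a few lines of Python. This is a minimal sketch under assumptions the source does not make: skill recognition is modeled as a posterior over a small, hypothetical family of pre-learned functions with a uniform prior and Gaussian noise, and skill learning as least-squares fitting of a line, with squared error standing in for the loss $\ell$.

```python
import math

# Toy demonstrations D = {(x_i, y_i)}, generated here by y = 2x + 1 (an assumption).
demos = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

# Skill recognition: a hypothetical finite family {f_theta} learned in pre-training.
family = {
    "double_plus_one": lambda x: 2 * x + 1,
    "square": lambda x: x * x,
    "identity": lambda x: x,
}

def posterior(demos, family, noise_std=1.0):
    """Posterior over theta: uniform prior times Gaussian likelihood of the demos."""
    log_post = {}
    for name, f in family.items():
        sq_err = sum((f(x) - y) ** 2 for x, y in demos)
        log_post[name] = -sq_err / (2 * noise_std ** 2)
    top = max(log_post.values())
    z = sum(math.exp(v - top) for v in log_post.values())
    return {name: math.exp(v - top) / z for name, v in log_post.items()}

# Skill learning: f* = argmin_f sum_i (f(x_i) - y_i)^2 over lines f(x) = a*x + b.
def fit_line(demos):
    n = len(demos)
    mx = sum(x for x, _ in demos) / n
    my = sum(y for _, y in demos) / n
    sxx = sum((x - mx) ** 2 for x, _ in demos)
    sxy = sum((x - mx) * (y - my) for x, y in demos)
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b

post = posterior(demos, family)
theta_star = max(post, key=post.get)   # latent concept that best explains D
f_star = fit_line(demos)               # function induced from D itself
x_test = 10.0
print(theta_star, family[theta_star](x_test), f_star(x_test))
```

If the demonstrations came from a rule outside the family, such as $y = 3x$, the recognition branch could only return the closest member, while `fit_line` would still recover the rule; this is the practical difference between selecting a function and inducing one.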

This perspective has been sharpened by analyses of linear self-attention. In retrieval-augmented generation, one recent formulation introduces a unified linear predictor $y = W_1 x_1 + W_2 x_2$, where $x_1$ is a query feature and $x_2$ is a retrieval-derived feature, and shows that one linear self-attention layer can implement one gradient-descent step on the corresponding linearized RAG objective. The result is exact in the constructed linear regime: one forward update changes only the prediction slot of the query token, and it changes that slot by exactly the prediction produced by one gradient-descent step. The same work also shows the boundary of the analogy: the correspondence remains stable under controlled linear extensions, but becomes feature-distribution dependent under nonlinear architectures, especially under skewed or heavy-tailed feature distributions.
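The linear-regime equivalence can be checked numerically. The sketch below assumes squared loss, scalar targets, zero initialization of the stacked weight $[W_1\ W_2]$, a learning rate `eta`, and a softmax-free attention readout whose keys and query are the concatenated features; these choices illustrate the general construction and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d1, d2, eta = 32, 3, 2, 0.1

# In-context examples: query features x1, retrieval-derived features x2, targets y.
X1 = rng.normal(size=(n, d1))
X2 = rng.normal(size=(n, d2))
y = X1 @ rng.normal(size=d1) + X2 @ rng.normal(size=d2)   # y = W1 x1 + W2 x2

Z = np.hstack([X1, X2])          # z_i = [x1_i; x2_i], so W z = W1 x1 + W2 x2
z_query = rng.normal(size=d1 + d2)

# (a) One explicit gradient step on L(W) = 1/(2n) * sum_i (W z_i - y_i)^2 from W = 0.
W = np.zeros(d1 + d2)
grad = (Z @ W - y) @ Z / n
pred_gd = (W - eta * grad) @ z_query

# (b) Softmax-free linear attention: values y_i, keys z_i, query z_query, scale eta / n.
pred_lsa = (eta / n) * (Z @ z_query) @ y

print(pred_gd, pred_lsa)   # equal up to floating-point rounding
```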

Taken together, these formulations treat in-context generation not as a special-purpose prompt trick but as forward-pass adaptation. This suggests that the central technical question is how context should be constructed, compressed, regularized, or audited so that the induced adaptation is useful rather than misleading.

2. Context construction in LLMs

A major line of work replaces manually curated demonstrations with generated or optimized context. Self-Generated In-Context Learning (SG-ICL) uses the same autoregressive model as a demonstration generator: given a test instance and a candidate label, the model samples demonstrations conditioned on that label, and then predicts with an inference template over the generated pool (Kim et al., 2022). On SST-2, SST-5, RTE, and CB, SG-ICL consistently improves over zero-shot learning, has markedly lower variance than randomly selected gold demonstrations, and is “generally worth approximately 0.6 gold training samples”; equivalently, 8 self-generated demonstrations match roughly 5 gold demonstrations (Kim et al., 2022).

Auto-ICL generalizes this idea by having the model autonomously generate either demonstrations, instructions, or both, in a first stage and answer with them in a second stage (Yang et al., 2023). In generating mode, the reported average accuracy is 68.1, compared with 63.9 for Zero-Shot-CoT and 38.3 for Zero-Shot; in retrieving mode, Auto-ICL reaches 75.7, exceeding Few-Shot, APE, Instruction Induction, and Auto-CoT on the reported averages (Yang et al., 2023). The same study reports that instruction-only context is strongest in retrieving mode, whereas demonstration+instruction is best in generating mode (Yang et al., 2023).

ProGen adds feedback from a task-specific model. It iteratively grows a synthetic dataset, scores examples by a robust influence function using Reverse Cross-Entropy on a synthetic validation set, and feeds the most helpful examples back as in-context demonstrations for the next generation round. On five text classification datasets, ProGen improves average zero-shot accuracy from 82.94 to 86.51 for DistilBERT and from 75.56 to 80.99 for an LSTM, and matches or exceeds ZeroGen with only 1% of its synthetic dataset size.
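The feedback loop hinges on ranking synthetic examples by how much they help on a synthetic validation set. ProGen's robust influence function is not reproduced here; as a labeled substitution, this sketch uses a first-order gradient-alignment score (a TracIn-style proxy) for binary logistic regression, with RCE as the validation loss. The constant `A` that replaces $\log 0$ inside RCE is an assumption (−4 is a common choice in symmetric cross-entropy).

```python
import math

A = -4.0  # value used in place of log(0) inside RCE (assumed constant)

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ce_grad(w, x, y):
    """Cross-entropy gradient of binary logistic regression on one example."""
    p = sigmoid(dot(w, x))
    return [(p - y) * xi for xi in x]

def rce_grad(w, x, y):
    """Gradient of reverse cross-entropy RCE = -A * (1 - p_y)."""
    p = sigmoid(dot(w, x))
    sign = 1.0 if y == 1 else -1.0          # d p_y / d(w.x) = sign * p * (1 - p)
    return [A * sign * p * (1.0 - p) * xi for xi in x]

def top_k_helpful(w, train, val, k):
    """Rank synthetic examples by alignment of their training gradient with the
    mean validation RCE gradient; a gradient step on a positively aligned
    example lowers validation RCE to first order."""
    grads = [rce_grad(w, x, y) for x, y in val]
    g_val = [sum(col) / len(grads) for col in zip(*grads)]
    scored = sorted(train, key=lambda ex: dot(ce_grad(w, *ex), g_val), reverse=True)
    return scored[:k]
```

The returned examples would then be formatted as the next round's in-context demonstrations; a mislabeled example gets a negative score because a step on it pushes validation RCE up.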

A more explicit optimization of context appears in Li et al.’s two-stage framework for black-box LLMs. The method leaves the original prompt intact, learns a policy that generates a semantically aligned derived prompt, queries a fixed response model on the derived prompt, and then wraps the resulting derived-prompt/response pair into a single-shot in-context demonstration for the original prompt. Training optimizes the policy under a KL penalty that keeps it close to its reference initialization, using a ReMax-style policy gradient while keeping the response model immutable. The inference template explicitly asks the model to “emulate” the way the derived-prompt response answers its question while replying to the original prompt, thereby anchoring the final answer in the original prompt rather than replacing it. On Vicuna Eval with GPT-4, “OURS vs. Original Prompt” wins 90.0% of the time and “OURS vs. BPO” 88.8%; on Self-Instruct Eval with GPT-4, the corresponding win rates are 76.2% and 71.4%; on GPT-3.5, the method maintains a 70%+ win rate over the original prompt and 65%+ over BPO across benchmarks.
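The inference-time wrapper and the training signal can be sketched as follows. The prompt wording, the `beta` value, and all function names are hypothetical; the sketch only encodes what the text states: a single-shot demonstration built from the derived pair, an instruction to emulate it while answering the original prompt, a KL penalty toward the reference policy, and a ReMax-style baseline, which subtracts the reward of the greedy decode.

```python
def build_single_shot_prompt(original_prompt, derived_prompt, derived_response):
    """Wrap the (derived prompt, response) pair as a one-shot demonstration.
    The wording is hypothetical; only the 'emulate' instruction comes from the text."""
    return (
        f"Example question: {derived_prompt}\n"
        f"Example answer: {derived_response}\n\n"
        "Emulate the way the example answer responds to its question, "
        "but reply to the following question.\n"
        f"Question: {original_prompt}\n"
        "Answer:"
    )

def kl_shaped_reward(reward, logp_policy, logp_ref, beta=0.05):
    """Sequence reward minus beta times a sampled estimate of KL(pi || pi_ref)."""
    return reward - beta * (sum(logp_policy) - sum(logp_ref))

def remax_advantage(reward_sampled, reward_greedy):
    """ReMax-style baseline: the reward of the greedy decode is subtracted."""
    return reward_sampled - reward_greedy
```

Placing the original prompt last keeps it as the question actually being answered, which is the anchoring role the template plays against semantic drift.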

3. Structured-data and task-specific generation

In structured-data generation, in-context generation is used both as a substitute for fine-tuning and as a target for prompt optimization. TabGen-ICL formulates tabular synthesis with a fixed LLM and an iterative residual-aware selector: at iteration $t$ it chooses a subset of $k$ real rows

$$S_t = \arg\max_{S \subseteq \mathcal{D}_{\mathrm{real}},\ |S| = k} d\big(S, \mathcal{G}_{t-1}\big),$$

where $\mathcal{D}_{\mathrm{real}}$ is the real table, $\mathcal{G}_{t-1}$ is the set of generated rows so far, and $d$ alternates between Jensen–Shannon divergence and Kolmogorov–Smirnov distance (Fang et al., 23 Feb 2025). The selected examples are JSON-serialized into the prompt, and the loop progressively narrows the gap between generated and real distributions. Across five real-world tables, TabGen-ICL reduces the error rate by 3.5%–42.2% on fidelity metrics relative to random selection (Fang et al., 23 Feb 2025).
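A minimal sketch of one residual-aware selection step, assuming a single numeric column, the Kolmogorov–Smirnov branch of the alternating distance, and exhaustive search over size-`k` subsets, which is only feasible for tiny pools; the paper's search procedure and multi-column handling are not reproduced here.

```python
import json
from itertools import combinations

def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    def cdf(sample, t):
        return sum(v <= t for v in sample) / len(sample)
    return max(abs(cdf(a, t) - cdf(b, t)) for t in sorted(set(a) | set(b)))

def select_residual_subset(real_rows, generated_rows, column, k):
    """Choose the size-k subset of real rows whose values in `column` are farthest,
    in KS distance, from the rows generated so far (the current 'residual')."""
    generated = [row[column] for row in generated_rows]
    best = max(
        combinations(real_rows, k),
        key=lambda subset: ks_distance([row[column] for row in subset], generated),
    )
    return list(best)

def to_prompt_block(rows):
    """JSON-serialise the selected rows, one per line, for the in-context prompt."""
    return "\n".join(json.dumps(row) for row in rows)
```

For several columns, one option (an assumption) is to average per-column KS distances before taking the maximum.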

The same dependence on context creates a fairness risk. In LLM-based tabular generation, few-shot prompts consist of a small number of demonstration records, and a bias parameter controls the label imbalance for a protected subgroup in those demonstrations (Recasens et al., 11 Jun 2025). The reported empirical result is that even mild in-context bias leads to global statistical distortion: as the bias parameter increases, the imbalance in the generated data tracks it nearly linearly, the effect strengthens with larger context size, and all tested models leak in-context imbalances (Recasens et al., 11 Jun 2025). In the adversarial setting, an attacker controlling a fraction of the in-context records can drive downstream fairness violations: in the reported example, a Random Forest trained on the synthetic data exhibits a marked fairness violation, while utility degrades only modestly and fidelity, as measured by TVC and JSD, remains high (Recasens et al., 11 Jun 2025).
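The bias mechanism can be made concrete with a toy construction and a standard fairness measure. The way `bias` shifts the protected group's positive rate below is a hypothetical parameterization, not the one in Recasens et al.; demographic parity gap is used as one common way to quantify the imbalance in either the demonstrations or the generated rows.

```python
import random

def biased_demonstrations(n, bias, seed=0):
    """Hypothetical construction: records alternate between group 0 and the protected
    group 1. Group 0 is positive with probability 0.5; group 1 with 0.5 - bias,
    so bias in [0, 0.5] sets the in-context label imbalance."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        group = i % 2
        p_pos = 0.5 - bias if group == 1 else 0.5
        rows.append({"group": group, "label": int(rng.random() < p_pos)})
    return rows

def demographic_parity_gap(rows):
    """|P(label = 1 | group 0) - P(label = 1 | group 1)|, on demos or generated rows."""
    rates = []
    for g in (0, 1):
        members = [r for r in rows if r["group"] == g]
        rates.append(sum(r["label"] for r in members) / len(members))
    return abs(rates[0] - rates[1])
```

Measuring the same gap on the rows an LLM generates from these prompts, for several values of `bias`, is one way to check whether in-context imbalance leaks into the output.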

For machine translation in low-resource settings, Demonstration Augmentation for Translation (DAT) generates a candidate pool of source-side examples, filters it with a maximal marginal relevance (MMR) objective that balances relevance and diversity, generates target sides zero-shot, and then translates the query with the resulting few-shot prompt (Lee et al., 31 May 2025). On English to Nepali, Khmer, Pashto, Zulu, and Swahili with Llama-3.1-70B, DAT outperforms zero-shot in 4 out of 5 languages and avoids the severe backfire observed for fixed human pairs, including a 21.6 COMET-point drop on Khmer for the few-shot fixed-pair baseline relative to zero-shot (Lee et al., 31 May 2025).
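MMR itself is a standard greedy procedure. In this sketch the similarity scores are supplied precomputed (the source does not say which embedding or similarity DAT uses), and the trade-off weight `lam` is an assumed value.

```python
def mmr_select(candidates, query_sim, pair_sim, k, lam=0.7):
    """Maximal Marginal Relevance: greedily pick items relevant to the query
    but dissimilar to items already picked.
    query_sim[i]: similarity of candidate i to the sentence being translated.
    pair_sim[i][j]: similarity between candidates i and j."""
    selected, remaining = [], list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            redundancy = max((pair_sim[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [candidates[i] for i in selected]
```

With `lam=1.0` the same call ranks by relevance alone and keeps near-duplicates, which is the redundancy MMR is meant to avoid.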

Task-specific generation also includes educational and commonsense settings. In automatic question generation from educational passages, GPT-4 with ICL and a Hybrid model combining ICL and retrieval both outperform baseline models; among automated metrics, the ICL configuration obtains ROUGE-L 55.95, METEOR 34.62, ChRF 60.48, and BERTScore 75.92, while the Hybrid model is best on all reported human measures except Answerability. In commonsense generation, a two-step diversification wrapper first produces default outputs, then asks the model to generate sentences different from its previous outputs when diversity is low; on CommonGen with GPT-3.5-turbo, this raises the harmonic mean of diversity and BERTScore from 39.6 for default ICL to 72.7 for the proposed ICD selection, while substantially lowering self-BLEU-4 from 72.4 to 21.0 (Zhang et al., 2024).
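The diversification trigger and the combined score can be sketched as below. The paper measures diversity with self-BLEU; this sketch substitutes the simpler distinct-n ratio, and the threshold and prompt wording are hypothetical.

```python
def distinct_n(sentences, n=2):
    """Fraction of unique n-grams among all n-grams (a simple diversity proxy)."""
    grams = []
    for s in sentences:
        toks = s.split()
        grams += [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    return len(set(grams)) / len(grams) if grams else 0.0

def harmonic_mean(a, b):
    """Combined score of diversity and quality (e.g. BERTScore), both in [0, 1]."""
    return 2 * a * b / (a + b) if a + b > 0 else 0.0

def diversification_prompt(concepts, previous, threshold=0.5):
    """Second step: ask for different sentences only when diversity is low."""
    if distinct_n(previous) >= threshold:
        return None
    listing = "\n".join("- " + p for p in previous)
    return ("Write a sentence using the concepts " + ", ".join(concepts) +
            " that is different from these previous sentences:\n" + listing)
```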

4. Visual and multimodal in-context generation

In image generation, one influential design principle is to separate contextual appearance from structural control. Context Diffusion augments a latent diffusion UNet with a visual-context encoder, a frozen CLIP text encoder, a query encoder derived from ControlNet, and modified cross-attention that attends jointly to text embeddings and visual-context embeddings. Prompt dropout replaces the text prompt with the empty string with a fixed probability, forcing the model to rely on visual context. The reported user study shows especially large gains when only context is present: in-domain, the method wins 80.2% versus 4.5% for Prompt Diffusion under context-only conditioning; out-of-domain, the corresponding figures are 63.7% versus 22.8%.
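Joint attention over text and visual context amounts to concatenating both condition sequences before the key and value projections. This NumPy sketch assumes single-head attention and shared key/value projections for the two sources; Context Diffusion's actual layer layout is not reproduced. In a real model the empty string is typically still passed through the text encoder, so dropout yields an unconditional text embedding rather than no tokens.

```python
import numpy as np

def joint_cross_attention(h, text_emb, ctx_emb, Wq, Wk, Wv):
    """Image tokens h attend jointly over [text embeddings; visual-context embeddings]."""
    kv_src = np.concatenate([text_emb, ctx_emb], axis=0)    # (T + C, d_cond)
    q, k, v = h @ Wq, kv_src @ Wk, kv_src @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)             # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)                 # softmax over T + C keys
    return attn @ v, attn

def maybe_drop_prompt(prompt, p, rng):
    """Prompt dropout: replace the text prompt with "" with probability p."""
    return "" if rng.random() < p else prompt
```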

X-Prompt extends in-context generation to a purely auto-regressive vision-LLM by compressing each in-context example into a small set of learnable “X-Prompt” tokens through cross-attention (Sun et al., 2024). The model forces information from each example sequence to flow through its compression tokens and blocks direct attention from raw example tokens to target tokens (Sun et al., 2024). Because target tokens see each example only through its compression tokens, the effective context length depends on the number of compression tokens rather than on the raw example length, with a practical window of up to 5,120 tokens (Sun et al., 2024). In zero-shot unseen tasks, the reported gains are large: low-light enhancement PSNR improves from 9.14 to 17.00, derain PSNR from 7.92 to 18.10, and unseen depth-color palette RMSE drops from 0.745 to about 0.39 (Sun et al., 2024).
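The information bottleneck can be expressed as an attention mask. The sketch assumes a single example followed by its compression tokens and then the target, all under a causal mask; X-Prompt's handling of multiple examples and its exact token layout are not reproduced here.

```python
def xprompt_mask(n_example, n_compress, n_target):
    """mask[i][j] is True if position i may attend to position j.
    Layout: [example tokens | compression tokens | target tokens], causal order.
    Target tokens are blocked from raw example tokens, so example information
    can reach them only through the compression tokens."""
    total = n_example + n_compress + n_target
    target_start = n_example + n_compress
    mask = [[j <= i for j in range(total)] for i in range(total)]   # causal
    for i in range(target_start, total):
        for j in range(n_example):
            mask[i][j] = False
    return mask
```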

Video In-context Learning applies the same principle to tokenized video. A decoder-only LLaMA-style Transformer is trained self-supervised on 16-frame clips tokenized by a pretrained VQ-GAN into 4,096 tokens plus bos/eos, with no explicit demo/query structure during training (Zhang et al., 2024). At inference, demonstration clips and query frames are concatenated into one causal prefix, and the model autoregressively samples future frames (Zhang et al., 2024). With the 1.1B model, in-class demonstrations raise probing accuracy from 29.6% to 36.7% and V-Acc by 1.8 points, while PSNR and FID improve with model scale (Zhang et al., 2024). The paper characterizes the resulting behavior as zero-shot imitation from demonstration videos (Zhang et al., 2024).
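The reported numbers imply 4,096 / 16 = 256 tokens per frame (consistent with, for example, a 16×16 VQ grid, which is an assumption). The sketch below assembles a causal prefix from tokenized demonstration clips and query frames; the bos/eos ids and the choice to wrap each demonstration clip individually are hypothetical.

```python
TOKENS_PER_CLIP = 4096
FRAMES_PER_CLIP = 16
TOKENS_PER_FRAME = TOKENS_PER_CLIP // FRAMES_PER_CLIP   # 256

def build_prefix(demo_clips, query_frames, bos=-1, eos=-2):
    """Concatenate tokenized demonstration clips and query frames into one causal
    prefix. Each demo clip is wrapped in (hypothetical) bos/eos ids; the query clip
    is left open, without eos, so the model continues it autoregressively."""
    seq = []
    for clip in demo_clips:
        seq += [bos] + clip + [eos]
    seq += [bos] + [tok for frame in query_frames for tok in frame]
    return seq
```

With one demonstration clip and four query frames the prefix holds 5,123 tokens, and completing the query clip to 16 frames would require sampling 12 × 256 = 3,072 more tokens.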

Latent-space flow and diffusion transformers now provide a unified setting for in-context image generation and editing. FLUX.1 Kontext uses simple sequence concatenation of text and image latents, offsets the “time” coordinate of each context image in factorized 3D RoPE, and trains a rectified-flow transformer with a conditional flow-matching objective in latent space. On KontextBench, a benchmark with 1,026 image-prompt pairs across local editing, global editing, character reference, style reference, and text editing, FLUX.1 Kontext [pro] and [max] rank at or near the top in human Elo evaluations; over five successive edits, AuraFace cosine similarity averages roughly 0.9 for FLUX.1 Kontext [pro], versus 0.774 for Runway Gen-4 and roughly 0.4 for GPT-4o-High. The reported inference time is 3–5 seconds for 1024×1024 images on a single A100 GPU.
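The RoPE offset can be illustrated by the position ids each latent token receives. This sketch assumes the target image sits at time index 0 and the i-th context image at index i, sharing the same spatial grid; it shows the idea rather than FLUX.1 Kontext's exact indexing.

```python
def position_ids(height, width, num_context_images):
    """(t, h, w) ids for factorized 3D RoPE over a grid of latent patches.
    t = 0 is the target image; context image i (1, 2, ...) is offset to t = i,
    so context and target share (h, w) positions but differ in the 'time' axis."""
    ids = []
    for t in range(num_context_images + 1):
        ids += [(t, h, w) for h in range(height) for w in range(width)]
    return ids
```

Keeping the spatial coordinates shared lets a context patch and the target patch at the same location remain aligned in RoPE, while the time offset keeps the two images distinguishable.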

5. Efficiency, token management, and forward-only adaptation

As contextual generation scales, sequence length becomes the main systems bottleneck. In Diffusion Transformers, in-context generation concatenates $N$ noisy latent tokens with a fixed sequence of $M$ reference tokens, so self-attention cost grows as $O\big((N + M)^2\big)$. ToPi addresses this with training-free token pruning. It first computes a layerwise Context Sensitivity Score on a calibration set, selects the top-ranked representative layers, and then scores each context token by a value-weighted attention influence metric. Pruning is updated only at anchor timesteps through a fidelity-constrained objective that preserves at least a specified fraction of total influence. On Flux.1-Kontext and Qwen-Image-Edit, ToPi yields about 1.21×–1.33× speedup, stays close to full-context PSNR, and removes over 50% of reference tokens on average while preserving at least 85% of the context “information mass”.
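The influence score and the fidelity constraint can be sketched for a single selected layer and a single anchor timestep. The sketch assumes attention weights from latent queries to context tokens are already available and uses the value-vector norm as the weighting; the Context Sensitivity Score, layer selection, and anchor scheduling are omitted, and the exact scoring formula in ToPi may differ.

```python
import math

def token_influence(attn, values):
    """Value-weighted attention influence of each context token j:
    (sum over latent queries i of attn[i][j]) * ||v_j||.
    attn rows index latent queries, columns index context tokens."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in values]
    return [sum(row[j] for row in attn) * norms[j] for j in range(len(values))]

def prune_to_fidelity(influence, rho):
    """Keep the fewest highest-influence tokens whose total influence >= rho * sum."""
    order = sorted(range(len(influence)), key=lambda j: influence[j], reverse=True)
    total, kept, mass = sum(influence), [], 0.0
    for j in order:
        if mass >= rho * total:
            break
        kept.append(j)
        mass += influence[j]
    return sorted(kept)
```

A token with modest attention but a large value norm can outrank a more-attended token, which is the point of weighting attention by values rather than using attention alone.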

Instructional image editing exposes a parallel efficiency–precision problem. In-Context Edit treats a pretrained DiT inpainting model as a black box by forming a side-by-side in-context image in which the source image occupies the left half and the target half is masked; the associated IC prompt describes the original image on the left and the instructed edit on the right. A training-free version already benefits from the IC prompt alone, improving CLIP-I from 0.681 to 0.794 and also raising the GPT-based score in the reported ablation. The paper then adds a LoRA-MoE hybrid, in which the output of the frozen base layer is augmented by a sparse mixture of low-rank experts, and an early-filter inference-time scaling method that scores partial denoising trajectories with Qwen-VL-72B. The early filter improves SC by 19% and overall VIE-Score by 16% over single-seed outputs.
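A LoRA-MoE layer adds routed low-rank corrections to a frozen projection. In this NumPy sketch the per-token top-k router with softmax renormalization over the selected experts is an assumption, as are the shapes; In-Context Edit's routing details are not reproduced.

```python
import numpy as np

def lora_moe_forward(x, W_base, experts, W_gate, top_k=1):
    """Frozen base projection plus a sparse mixture of low-rank experts.
    x: (tokens, d_in); W_base: (d_in, d_out), frozen.
    experts: list of (A, B) with A: (d_in, r) and B: (r, d_out).
    W_gate: (d_in, num_experts) router; only top_k experts per token are used."""
    out = x @ W_base                                    # frozen layer output
    logits = x @ W_gate                                 # (tokens, num_experts)
    for t in range(x.shape[0]):
        top = np.argsort(logits[t])[-top_k:]            # indices of selected experts
        w = np.exp(logits[t, top] - logits[t, top].max())
        w /= w.sum()                                    # softmax over selected experts
        for weight, e in zip(w, top):
            A, B = experts[e]
            out[t] += weight * (x[t] @ A @ B)           # low-rank correction
    return out
```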

Forward-only adaptation also appears in retrieval-augmented generation. RAG-GD keeps the retriever and LLM backbone frozen, trains a base retrieval adapter, and then meta-trains a predictor that maps a few-shot RAG support set to low-rank updates approximating what a fixed number of SGD steps would have done to the retrieval interface. At inference, the update is produced in one small forward pass rather than by test-time backpropagation. On a Qwen 2.5 backbone with E5 retrieval, the reported average EM/F1 improves from 34.16/42.54 to 36.71/45.11, and the method approaches test-time gradient adaptation at much lower per-query cost.
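What such a predictor must output can be illustrated by computing the target it approximates: the change a few SGD steps make to a linear adapter on the support set, compressed to low rank. This is a sketch under assumed choices (a linear adapter, mean squared error, full-batch steps, truncated SVD); RAG-GD's predictor architecture and retrieval interface are not modeled.

```python
import numpy as np

def sgd_delta(W0, X, Y, steps, lr):
    """Change in a linear adapter W (d_out x d_in) after `steps` full-batch gradient
    steps on L = 1/(2n) * ||X W^T - Y||^2 over the support set (X: n x d_in, Y: n x d_out)."""
    W = W0.copy()
    for _ in range(steps):
        residual = X @ W.T - Y                 # (n, d_out)
        grad = residual.T @ X / X.shape[0]     # dL/dW, shape (d_out, d_in)
        W -= lr * grad
    return W - W0

def low_rank(delta, r):
    """Best rank-r approximation (Eckart-Young) of an update, factored as B @ A."""
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    B = U[:, :r] * S[:r]                       # (d_out, r)
    A = Vt[:r]                                 # (r, d_in)
    return B, A
```

A learned predictor would be trained to emit factors like `B` and `A` directly from the support set, replacing the loop in `sgd_delta` with one forward pass.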

6. Failure modes, safeguards, and open problems

A recurring concern is that contextual generation can improve surface quality while drifting semantically or statistically. Li et al.’s derived-prompt framework addresses semantic drift with two explicit safeguards: a KL penalty keeps the learned derived-prompt policy close to its reference initialization, and the final inference template always asks the model to answer the original prompt rather than the derived one. In tabular generation, by contrast, the few-shot examples themselves may be the attack surface; prompt audit, balanced prompt design, fairness-guided exemplar selection, post-generation debiasing, and model-internal defenses are proposed as mitigation strategies for in-context bias propagation (Recasens et al., 11 Jun 2025).

Several modality-specific limitations remain. Context Diffusion reports that very fine-grained local edits can still fail and that, when visual context and text disagree, the model tends to favor the context image. X-Prompt notes that the base Chameleon VQ-VAE compresses images aggressively and loses fine detail, and that generalization degrades across completely unrelated tasks (Sun et al., 2024). FLUX.1 Kontext reports minor artifact accumulation and occasional instruction non-compliance after 6–7 edits in multi-turn workflows. In machine translation, progressive accumulation of synthetic demonstrations improves retrieval-based reuse but does not fully match dynamic on-the-fly generation (Lee et al., 31 May 2025). In educational question generation, example selection remains sensitive, and retrieved passages may be semantically related yet contextually irrelevant.

The broader research agenda remains explicitly open. The data-generation survey identifies the mechanistic origin of skill learning, the causal linkage between the pre-training function class and learnable in-context functions, the extension of the framework to chain-of-thought reasoning and self-critique, and unified probabilistic frameworks that cover both recognition and learning as central future directions. A plausible implication is that progress in in-context generation will depend less on any single prompting heuristic than on a joint theory of context selection, context compression, implicit optimization, and failure analysis across modalities.
