In-Context Generation Techniques
- In-context generation is a method where models leverage provided examples or retrieved evidence to dynamically generate outputs without updating parameters.
- The approach distinguishes between skill recognition and skill learning by implicitly modeling data-generation functions through in-context examples.
- It has practical applications in machine translation, data synthesis, image editing, and video imitation, while addressing challenges like semantic drift and bias.
In-context generation denotes a class of generation procedures in which a model is conditioned on examples, retrieved evidence, or reference signals supplied in its context window and then produces a new output without parameter updates. In the data-generation perspective, an LLM can be viewed as implicitly modeling a family of data-generation functions; at inference, it may either select a previously learned function that explains the demonstrations (“skill recognition”) or adapt to the demonstrations to induce a new function (“skill learning”). Recent work uses this pattern across black-box LLM prompting, synthetic tabular data generation, machine translation, question generation, image editing, image synthesis, and video imitation (Fang et al., 23 Feb 2025; Zhang et al., 2024).
1. Conceptual and theoretical foundations
A systematic formulation treats pre-training as learning a mixture over latent data-generation functions, where each function defines a conditional distribution or, in classification language, a function from input to label. Under skill recognition, the model performs implicit Bayesian inference over these latent functions using the demonstrations and concentrates on a latent concept that best explains them. Under skill learning, the model behaves as an on-the-fly learner, implicitly selecting a function that fits the demonstrations and then applying it to the query.
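The skill-recognition view can be illustrated with a toy Bayesian posterior over a small, hand-written family of candidate functions. This is only a sketch under stated assumptions: the concept family `CONCEPTS`, the uniform prior, and the `noise` parameter are hypothetical choices made for illustration, not part of the cited formulation.

```python
import math

# Hypothetical finite family of latent concepts: each maps an input to a label.
CONCEPTS = {
    "is_even": lambda x: int(x % 2 == 0),
    "is_positive": lambda x: int(x > 0),
    "is_large": lambda x: int(abs(x) >= 10),
}

def posterior_over_concepts(demos, noise=0.1):
    """Posterior p(concept | demos) under a uniform prior and label noise."""
    log_post = {}
    for name, f in CONCEPTS.items():
        # Each demonstration is explained with prob 1 - noise if the concept agrees.
        log_post[name] = sum(
            math.log(1.0 - noise if f(x) == y else noise) for x, y in demos
        )
    # Normalize in log space for numerical stability.
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return {k: math.exp(v - m) / z for k, v in log_post.items()}

def predict(demos, query):
    """Posterior-weighted probability that the query's label is 1."""
    post = posterior_over_concepts(demos)
    return sum(p * CONCEPTS[name](query) for name, p in post.items())

demos = [(4, 1), (7, 0), (-2, 1), (12, 1)]
post = posterior_over_concepts(demos)
best = max(post, key=post.get)  # the concept the demonstrations concentrate on
```

With these four demonstrations only the parity concept explains every label, so the posterior mass concentrates there and the prediction for a new query is dominated by that concept.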
This perspective has been sharpened by analyses of linear self-attention. In retrieval-augmented generation, one recent formulation introduces a unified linear predictor over a query feature and a retrieval-derived feature, and shows that one linear self-attention layer can implement one gradient-descent step on the corresponding linearized RAG objective. The result is exact in the constructed linear regime: one forward update changes only a single slot of the query token, and that change corresponds to the gradient-descent update. The same work also shows the boundary of the analogy: the correspondence remains stable under controlled linear extensions, but becomes feature-distribution dependent under nonlinear architectures, especially under skewed or heavy-tailed feature distributions.
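The correspondence between linear attention and gradient descent can be checked numerically in its simplest form. The sketch below uses plain least-squares regression rather than the RAG objective with retrieval features discussed above, so it is an analogy under that simplifying assumption; the function names and the learning rate `lr` are illustrative.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def gd_step_prediction(xs, ys, x_query, lr):
    """Predict with the weights obtained after one gradient step from w = 0
    on L(w) = 1/(2N) * sum_i (w . x_i - y_i)^2."""
    n, d = len(xs), len(x_query)
    # Gradient at w = 0 is -(1/N) * sum_i y_i * x_i, so w_1 = (lr/N) * sum_i y_i * x_i.
    w1 = [lr / n * sum(y * x[j] for x, y in zip(xs, ys)) for j in range(d)]
    return dot(w1, x_query)

def linear_attention_prediction(xs, ys, x_query, lr):
    """Softmax-free attention with keys x_i, values y_i, and query x_query."""
    n = len(xs)
    return lr / n * sum(y * dot(x, x_query) for x, y in zip(xs, ys))

xs = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
ys = [1.0, -2.0, 0.5]
x_query = [2.0, 1.0]
gd_pred = gd_step_prediction(xs, ys, x_query, lr=0.1)
attn_pred = linear_attention_prediction(xs, ys, x_query, lr=0.1)
```

The two predictions agree up to floating-point rounding because both compute the same sum in a different order, which is the algebraic core of the one-layer, one-step argument.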
Taken together, these formulations treat in-context generation not as a special-purpose prompt trick but as forward-pass adaptation. This suggests that the central technical question is how context should be constructed, compressed, regularized, or audited so that the induced adaptation is useful rather than misleading.
2. Context construction in LLMs
A major line of work replaces manually curated demonstrations with generated or optimized context. Self-Generated In-Context Learning (SG-ICL) uses the same autoregressive model as a demonstration generator: given a test instance and a candidate label, the model samples a small set of demonstrations at a fixed temperature and then predicts with an inference template over the generated pool (Kim et al., 2022). On SST-2, SST-5, RTE, and CB, SG-ICL consistently improves over zero-shot learning, has markedly lower variance than randomly selected gold demonstrations, and is “generally worth approximately 0.6 gold training samples”; equivalently, 8 self-generated demonstrations match roughly 5 gold demonstrations (Kim et al., 2022).
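The two-step flow of self-generated demonstrations can be sketched as prompt assembly around a model call. Everything concrete here is an assumption made for illustration: `lm` is a hypothetical text-generation callable, the prompt wording is invented, and `fake_lm` is a stand-in so the example runs without a model.

```python
from typing import Callable, List

def sg_icl_prompt(lm: Callable[[str], str], test_input: str,
                  labels: List[str], per_label: int = 2) -> str:
    """Build an inference prompt from demonstrations the model wrote itself."""
    demos = []
    for label in labels:
        for _ in range(per_label):
            # Step 1: condition generation on the test instance and a candidate label.
            gen_prompt = (f"Review: {test_input}\n"
                          f"Write a similar review whose sentiment is {label}:")
            demos.append((lm(gen_prompt).strip(), label))
    # Step 2: place the generated pairs before the test instance.
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in demos]
    blocks.append(f"Review: {test_input}\nSentiment:")
    return "\n\n".join(blocks)

def fake_lm(prompt: str) -> str:
    # Stand-in for a real sampling call; it only echoes the requested label.
    return "a " + prompt.rsplit(" is ", 1)[1].rstrip(":") + " example"

prompt = sg_icl_prompt(fake_lm, "The plot dragged.", ["positive", "negative"])
```

In practice the stand-in would be replaced by a sampling call to the same model that later answers the final prompt.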
Auto-ICL generalizes this idea by having the model autonomously generate demonstrations, instructions, or both in a first stage and answer with them in a second stage (Yang et al., 2023). In generating mode, the reported average accuracy is 68.1, compared with 63.9 for Zero-Shot-CoT and 38.3 for Zero-Shot; in retrieving mode, Auto-ICL reaches 75.7, exceeding Few-Shot, APE, Instruction Induction, and Auto-CoT on the reported averages (Yang et al., 2023). The same study reports that instruction-only context is strongest in retrieving mode, whereas demonstration+instruction is best in generating mode (Yang et al., 2023).
ProGen adds feedback from a task-specific model. It iteratively grows a synthetic dataset, scores examples by a robust influence function using Reverse Cross-Entropy on a synthetic validation set, and feeds the top-ranked “helpful” examples back as in-context demonstrations for the next generation round. On five text classification datasets, ProGen improves average zero-shot accuracy from 82.94 to 86.51 for DistilBERT and from 75.56 to 80.99 for an LSTM, and matches or exceeds ZeroGen with only 1% of its synthetic dataset size.
A more explicit optimization of context appears in Li et al.’s two-stage framework for black-box LLMs. The method leaves the original prompt intact, learns a policy that generates a semantically aligned derived prompt, queries a fixed response model on the derived prompt, and then wraps the resulting (derived prompt, response) pair into a single-shot in-context demonstration for the original prompt. Training maximizes the policy objective with a ReMax-style policy gradient while keeping the response model immutable. The inference template explicitly asks the model to “emulate” the way the derived-prompt response answers its question while replying to the original prompt, thereby anchoring the final answer in the original prompt rather than replacing it. On Vicuna Eval with GPT-4, “OURS vs. Original Prompt” wins 90.0% of the time and “OURS vs. BPO” 88.8%; on Self-Instruct Eval with GPT-4, the corresponding win rates are 76.2% and 71.4%; on GPT-3.5, the method maintains a 70%+ win rate over the original prompt and 65%+ over BPO across benchmarks.
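The wrapping step can be pictured as a small template function. The wording of the template below is hypothetical and only mirrors the described intent (emulate the example, answer the original prompt); the function and argument names are likewise assumptions.

```python
def build_icl_prompt(original_prompt: str, derived_prompt: str,
                     derived_response: str) -> str:
    """Wrap the (derived prompt, response) pair as one demonstration and
    keep the original prompt as the question that must be answered."""
    return (
        "Here is an example exchange.\n"
        f"Question: {derived_prompt}\n"
        f"Answer: {derived_response}\n\n"
        "Emulate the way the example answers its question, "
        "but reply to the following question instead.\n"
        f"Question: {original_prompt}\n"
        "Answer:"
    )

final_prompt = build_icl_prompt(
    original_prompt="tips for sleep",
    derived_prompt="What are evidence-based habits that improve sleep quality?",
    derived_response="Keep a fixed schedule, limit caffeine late in the day, ...",
)
```

Placing the original prompt last keeps it as the question actually being answered, so the derived prompt only shapes the style and depth of the reply.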
3. Structured-data and task-specific generation
In structured-data generation, in-context generation is used both as a substitute for fine-tuning and as a target for prompt optimization. TabGen-ICL formulates tabular synthesis with a fixed LLM and an iterative residual-aware selector: at each iteration it chooses a subset of real rows as in-context examples, guided by a discrepancy between the real data and the set of rows generated so far, where the discrepancy measure alternates between Jensen–Shannon divergence and Kolmogorov–Smirnov distance (Fang et al., 23 Feb 2025). The selected examples are JSON-serialized into the prompt, and the loop progressively narrows the gap between generated and real distributions. Across five real-world tables, TabGen-ICL reduces the error rate by 3.5%–42.2% on fidelity metrics relative to random selection (Fang et al., 23 Feb 2025).
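A much simplified version of residual-aware selection can be written for a single categorical column. This is a sketch under assumptions, not the TabGen-ICL procedure itself: the deficit-based ranking, the single-column scope, and all function names are illustrative, and only the Jensen–Shannon half of the alternating measure is shown.

```python
import math
from collections import Counter

def distribution(values, support):
    counts = Counter(values)
    total = max(len(values), 1)
    return [counts[v] / total for v in support]

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def select_residual_examples(real_rows, generated_rows, column, k):
    """Pick k real rows from the categories most under-represented so far."""
    support = sorted({r[column] for r in real_rows})
    p = distribution([r[column] for r in real_rows], support)
    q = distribution([r[column] for r in generated_rows], support)
    deficit = {v: pi - qi for v, pi, qi in zip(support, p, q)}
    ranked = sorted(real_rows, key=lambda r: deficit[r[column]], reverse=True)
    return ranked[:k]

real_rows = [{"job": j} for j in ["a", "a", "a", "b", "b", "c"]]
generated_rows = [{"job": "a"} for _ in range(4)]
picked = select_residual_examples(real_rows, generated_rows, "job", k=3)
```

In the full loop the picked rows would be serialized into the next prompt, and the divergence between real and generated distributions would be recomputed after each round.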
The same dependence on context creates a fairness risk. In LLM-based tabular generation, few-shot prompts consist of a fixed number of demonstrations, and a bias parameter controls the label imbalance for a protected subgroup in those demonstrations (Recasens et al., 11 Jun 2025). The reported empirical result is that even mild in-context bias leads to global statistical distortion: as the bias parameter increases, the corresponding imbalance in the generated data tracks it nearly linearly, the effect strengthens with larger context size, and all tested models leak in-context imbalances (Recasens et al., 11 Jun 2025). In the adversarial setting, an attacker controlling a fraction of the in-context records can drive downstream fairness violations; in the reported example, a Random Forest trained on the synthetic data exhibits a fairness violation while utility degrades only modestly and fidelity remains high as measured by TVC and JSD (Recasens et al., 11 Jun 2025).
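One way to make the bias parameter concrete is to treat it as the positive-label rate among protected-group demonstrations. That parameterization is an assumption made here for illustration, as are the column names and helper functions; the same `positive_rate` helper doubles as a minimal prompt audit.

```python
import random

def biased_demonstrations(pool, group_key, group_value, label_key, bias, k, seed=0):
    """Sample k protected-group rows so that about `bias` of them carry label 1."""
    rng = random.Random(seed)
    group = [r for r in pool if r[group_key] == group_value]
    pos = [r for r in group if r[label_key] == 1]
    neg = [r for r in group if r[label_key] == 0]
    n_pos = round(bias * k)
    return rng.sample(pos, n_pos) + rng.sample(neg, k - n_pos)

def positive_rate(rows, label_key):
    """Audit helper: share of rows with a positive label."""
    return sum(r[label_key] for r in rows) / len(rows)

# Hypothetical pool: 20 rows for the protected group, 5 for the other group.
pool = [{"sex": "F", "income": i % 2} for i in range(20)]
pool += [{"sex": "M", "income": 1} for _ in range(5)]
demos = biased_demonstrations(pool, "sex", "F", "income", bias=0.8, k=10)
audited = positive_rate(demos, "income")
```

Comparing `audited` against the rate in the source data is the kind of check a prompt audit would run before the demonstrations reach the generator.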
For machine translation in low-resource settings, Demonstration Augmentation for Translation (DAT) generates a candidate pool of source-side examples, filters it with an MMR objective balancing relevance and diversity, generates target sides zero-shot, and then translates the query with the resulting few-shot prompt (Lee et al., 31 May 2025). On English to Nepali, Khmer, Pashto, Zulu, and Swahili with Llama-3.1-70B, DAT outperforms zero-shot in 4 out of 5 languages and avoids the severe backfire observed for fixed human pairs, including a 21.6 COMET-point drop on Khmer for the few-shot fixed-pair baseline relative to zero-shot (Lee et al., 31 May 2025).
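The MMR filtering step follows the standard greedy maximal-marginal-relevance recipe, sketched below. The token-overlap `jaccard` similarity and the trade-off weight `lam` are stand-ins chosen so the example is self-contained; the similarity function used in the cited work is not specified here.

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity, used here as a simple stand-in."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def mmr_select(query, candidates, k, lam=0.5):
    """Greedy MMR: trade off relevance to the query against redundancy."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        def score(c):
            redundancy = max((jaccard(c, s) for s in selected), default=0.0)
            return lam * jaccard(c, query) - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

query = "the cat sat on the mat"
candidates = [
    "the cat sat on the rug",
    "the cat sat on a rug",
    "a dog ran in the park",
]
chosen = mmr_select(query, candidates, k=2)
```

The first pick is the most relevant sentence; the second pick skips its near-duplicate in favor of a less similar candidate, which is the diversity effect the objective is meant to produce.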
Task-specific generation also includes educational and commonsense settings. In automatic question generation from educational passages, GPT-4 with ICL and a Hybrid model combining ICL and retrieval both outperform baseline models; among automated metrics, the reported ICL configuration obtains ROUGE-L 55.95, METEOR 34.62, ChRF 60.48, and BERTScore 75.92, while the Hybrid model is best on all reported human measures except Answerability. In commonsense generation, a two-step diversification wrapper first produces default outputs, then asks the model to generate sentences different from its previous outputs when diversity is low; on CommonGen with GPT3.5-turbo, this raises the harmonic mean of diversity and BERTScore from 39.6 for default ICL to 72.7 for the proposed ICD selection, while substantially lowering self-BLEU-4 from 72.4 to 21.0 (Zhang et al., 2024).
4. Visual and multimodal in-context generation
In image generation, one influential design principle is to separate contextual appearance from structural control. Context Diffusion augments a latent diffusion UNet with a visual-context encoder, a frozen CLIP text encoder, a query encoder derived from ControlNet, and modified cross-attention that attends jointly to text embeddings and visual-context embeddings. Prompt dropout replaces the text prompt with the empty string with a fixed probability, forcing the model to rely on visual context. The reported user study shows especially large gains when only context is present: in-domain, the method wins 80.2% versus 4.5% for Prompt Diffusion under context-only conditioning; out-of-domain, the corresponding figures are 63.7% versus 22.8%.
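Prompt dropout itself is a one-line data transformation applied per training example. The sketch below assumes a batch of text prompts and a drop probability `p_drop`; the value 0.5 in the example is arbitrary and not taken from the cited work.

```python
import random

def apply_prompt_dropout(prompts, p_drop, rng):
    """Replace each text prompt with the empty string with probability p_drop."""
    return ["" if rng.random() < p_drop else p for p in prompts]

rng = random.Random(0)
batch = ["a red car", "a snowy street", "a cat on a sofa", "a city at night"]
dropped = apply_prompt_dropout(batch, p_drop=0.5, rng=rng)
```

Examples whose prompt is emptied can only be solved from the visual context, which is what pushes the model to use it.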
X-Prompt extends in-context generation to a purely auto-regressive vision-LLM by compressing each in-context example into a small set of learnable “X-Prompt” tokens through cross-attention (Sun et al., 2024). Each example sequence is paired with learnable compression tokens; the model forces information flow through those tokens and blocks direct attention from raw example tokens to target tokens (Sun et al., 2024). This shortens the effective context, with a practical window of up to 5,120 tokens (Sun et al., 2024). In zero-shot unseen tasks, the reported gains are large: low-light enhancement PSNR improves from 9.14 to 17.00, derain PSNR from 7.92 to 18.10, and unseen depth-color palette RMSE from 0.745 to 0.390, with a further gain reported on object addition (Sun et al., 2024).
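The attention restriction can be expressed as a boolean mask over a sequence laid out as example tokens, then compression tokens, then target tokens. This layout and the reading that target positions may not attend to raw example positions are assumptions made for the sketch; the function name is illustrative.

```python
def xprompt_style_mask(n_example, n_compress, n_target):
    """mask[i][j] is True when position i may attend to position j.

    Layout: [example tokens][compression tokens][target tokens].
    """
    n = n_example + n_compress + n_target
    target_start = n_example + n_compress
    # Start from an ordinary causal mask.
    mask = [[j <= i for j in range(n)] for i in range(n)]
    # Target positions may not look at raw example tokens directly,
    # so example information must pass through the compression tokens.
    for i in range(target_start, n):
        for j in range(n_example):
            mask[i][j] = False
    return mask

mask = xprompt_style_mask(n_example=3, n_compress=2, n_target=2)
```

Compression tokens still see the full example, so they are the only path by which example content reaches the target.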
Video In-context Learning applies the same principle to tokenized video. A decoder-only LLaMA-style Transformer is trained self-supervised on 16-frame clips tokenized by a pretrained VQ-GAN into 4,096 tokens plus bos/eos, with no explicit demo/query structure during training (Zhang et al., 2024). At inference, demonstration clips and query frames are concatenated into one causal prefix, and the model autoregressively samples future frames (Zhang et al., 2024). With the 1.1B model, in-class demonstrations raise probing accuracy from 29.6% to 36.7% and V-Acc by 1.8 points, while PSNR and FID improve with model scale (Zhang et al., 2024). The paper characterizes the resulting behavior as zero-shot imitation from demonstration videos (Zhang et al., 2024).
Latent-space flow and diffusion transformers now provide a unified setting for in-context image generation and editing. FLUX.1 Kontext uses simple sequence concatenation of text and image latents, offsets the “time” coordinate of each context image in factorized 3D RoPE, and trains a rectified-flow transformer with a conditional flow-matching objective in latent space. On KontextBench, a benchmark with 1,026 image-prompt pairs across local editing, global editing, character reference, style reference, and text editing, FLUX.1 Kontext[pro] and [max] rank at or near the top in human ELO evaluations; on five successive edits, AuraFace cosine similarity averages 0.90 for FLUX.1 Kontext[pro], versus 0.774 for Runway Gen-4 and 0.41 for GPT-4o-High. The reported inference time is 3–5 seconds for 1024×1024 images on a single A100 GPU.
5. Efficiency, token management, and forward-only adaptation
As contextual generation scales, sequence length becomes the main systems bottleneck. In Diffusion Transformers, in-context generation concatenates noisy latent tokens with a fixed reference sequence, giving a self-attention cost that is quadratic in the combined sequence length. ToPi addresses this with training-free token pruning. It first computes a layerwise Context Sensitivity Score on a calibration set, selects the top-ranked representative layers, and then scores each context token by a value-weighted attention influence metric. Pruning is updated only at anchor timesteps through a fidelity-constrained objective that preserves at least a set fraction of total influence. On FLUX.1 Kontext and Qwen-Image-Edit, ToPi yields about 1.21×–1.33× speedup, stays within a reported dB margin of full-context PSNR, and removes over 50% of reference tokens on average while preserving at least 85% of the context “information mass”.
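The fidelity constraint can be illustrated with a greedy rule: keep the highest-influence context tokens until their combined influence reaches the required fraction of the total. This is a minimal sketch assuming per-token influence scores are already available; the scores in the example are made up, and the greedy rule is a simplification rather than the ToPi objective.

```python
def prune_context_tokens(influence, keep_fraction=0.85):
    """Return sorted indices of the smallest top-scoring set whose summed
    influence reaches keep_fraction of the total."""
    total = sum(influence)
    order = sorted(range(len(influence)), key=lambda i: influence[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        if mass >= keep_fraction * total:
            break
        kept.append(i)
        mass += influence[i]
    return sorted(kept)

# Hypothetical influence scores for six reference tokens.
influence = [40, 5, 30, 2, 16, 7]
kept = prune_context_tokens(influence, keep_fraction=0.85)
kept_mass = sum(influence[i] for i in kept) / sum(influence)
```

Tokens outside `kept` would be dropped from the reference sequence until the next anchor timestep, which shortens the sequence that self-attention has to process.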
Instructional image editing exposes a parallel efficiency–precision problem. In-Context Edit treats a pretrained DiT inpainting model as a black box by forming a side-by-side in-context image in which the source image occupies the left half and the target half is masked; the associated IC prompt describes the original image on the left and the instructed edit on the right. A training-free version already benefits from the IC prompt alone, improving CLIP-I from 0.681 to 0.794 and GPT from 0.1 to 0.24 in the reported ablation. The paper then adds a LoRA-MoE hybrid, in which the output of the frozen base layer is augmented by a sparse mixture of low-rank experts, and an early-filter inference-time scaling method that scores partial denoising trajectories with Qwen-VL-72B. The early filter improves SC by 19% and overall VIE-Score by 16% over single-seed outputs.
Forward-only adaptation also appears in retrieval-augmented generation. RAG-GD keeps the retriever and LLM backbone frozen, trains a base retrieval adapter, and then meta-trains a predictor that maps a few-shot RAG support set to low-rank updates approximating what a fixed number of SGD steps would have done to the retrieval interface. At inference, the update is produced in one small forward pass rather than test-time backpropagation. On a Qwen 2.5 model with E5 retrieval, the reported average EM/F1 improves from 34.16/42.54 to 36.71/45.11, and the method approaches test-time gradient adaptation at much lower per-query cost.
6. Failure modes, safeguards, and open problems
A recurring concern is that contextual generation can improve surface quality while drifting semantically or statistically. Li et al.’s derived-prompt framework addresses semantic drift with two explicit safeguards: a KL penalty keeps the learned derived-prompt policy close to its reference initialization, and the final inference template always asks the model to answer the original prompt rather than the derived one. In tabular generation, by contrast, the few-shot examples themselves may be the attack surface; prompt audit, balanced prompt design, fairness-guided exemplar selection, post-generation debiasing, and model-internal defenses are proposed as mitigation strategies for in-context bias propagation (Recasens et al., 11 Jun 2025).
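The effect of a KL penalty can be shown on a toy next-token distribution. The form below (reward minus a weighted KL term) is a common way to write such a regularizer and is assumed here for illustration; the weight `beta`, the reward value, and the distributions are hypothetical.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def regularized_objective(reward, policy_probs, reference_probs, beta=0.1):
    """Reward minus a KL penalty that ties the policy to its reference."""
    return reward - beta * kl_divergence(policy_probs, reference_probs)

reference = [0.5, 0.3, 0.2]
close_policy = [0.5, 0.3, 0.2]
drifted_policy = [0.9, 0.05, 0.05]
same = regularized_objective(1.0, close_policy, reference)
drifted = regularized_objective(1.0, drifted_policy, reference)
```

For equal reward, the drifted policy scores lower, so the optimizer is discouraged from rewriting prompts in ways that move far from the reference behavior.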
Several modality-specific limitations remain. Context Diffusion reports that very fine-grained local edits can still fail and that, when visual context and text disagree, the model tends to favor the context image. X-Prompt notes that the base Chameleon VQ-VAE compresses at 16× and loses fine detail, and that generalization degrades across completely unrelated tasks (Sun et al., 2024). FLUX.1 Kontext reports minor artifact accumulation and occasional instruction non-compliance after 6–7 edits in multi-turn workflows. In machine translation, progressive accumulation of synthetic demonstrations improves retrieval-based reuse but does not fully match dynamic on-the-fly generation (Lee et al., 31 May 2025). In educational question generation, example selection remains sensitive, and retrieved passages may be semantically related yet contextually irrelevant.
The broader research agenda remains explicitly open. The data-generation survey identifies the mechanistic origin of skill learning, the causal linkage between the pre-training function class and learnable in-context functions, the extension of the framework to chain-of-thought reasoning and self-critique, and unified probabilistic frameworks that cover both recognition and learning as central future directions. A plausible implication is that progress in in-context generation will depend less on any single prompting heuristic than on a joint theory of context selection, context compression, implicit optimization, and failure analysis across modalities.