- The paper presents a novel approach by recasting context engineering as a recommendation problem using instance-specific neural collaborative filtering.
- It employs a three-stage methodology—cluster-based initialization, co-evolution of context catalogs, and instance-wise routing—to enhance LLM performance.
- Empirical evaluations on HoVer, SCONE, and HotpotQA show significantly higher accuracy and better sample efficiency than conventional global strategies.
Neural Collaborative Context Engineering: Instance-wise Context Routing for LLMs
LLMs are highly sensitive to how their input context is configured, and this sensitivity affects downstream reasoning and QA performance. Conventional automatic prompt engineering systems optimize a single global context strategy, assuming homogeneity across input instances. However, empirical observations and theoretical analysis indicate that substantial gains remain unexploited at the instance level, because different inputs benefit from different contexts. This work recasts context engineering as a recommendation problem: input instances act as users, candidate context strategies as items, and task accuracy as the interaction signal. The principal aim is to infer latent preference structure from sparse evaluations and to enable dynamic, instance-level routing. Because inference-time inputs are unseen during training, this is a fundamentally inductive collaborative filtering scenario.
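To make the user/item/signal mapping concrete, here is a minimal sketch of a sparse interaction store. The ids, rewards, and helper names are made-up illustration data rather than anything from the paper, and unobserved cells are counted as 0 purely to keep the example short.

```python
from collections import defaultdict

# Observed (instance_id, context_id, reward) triples; reward is task accuracy.
# Hypothetical toy data: instances play the role of users, contexts of items.
interactions = [
    ("q1", "ctx_a", 1.0), ("q1", "ctx_b", 0.0),
    ("q2", "ctx_a", 0.0), ("q2", "ctx_b", 1.0),
    ("q3", "ctx_a", 1.0),  # ctx_b was never evaluated for q3 (sparse matrix)
]

def build_matrix(triples):
    """Group triples into {instance: {context: reward}}."""
    matrix = defaultdict(dict)
    for inst, ctx, reward in triples:
        matrix[inst][ctx] = reward
    return dict(matrix)

def best_global_accuracy(matrix, contexts):
    """Accuracy of the single best context (unobserved cells count as 0, a simplification)."""
    def accuracy(ctx):
        return sum(row.get(ctx, 0.0) for row in matrix.values()) / len(matrix)
    return max(accuracy(ctx) for ctx in contexts)

def per_instance_best_accuracy(matrix):
    """Accuracy if every instance received its best observed context."""
    return sum(max(row.values()) for row in matrix.values()) / len(matrix)

matrix = build_matrix(interactions)
global_acc = best_global_accuracy(matrix, ["ctx_a", "ctx_b"])
instance_acc = per_instance_best_accuracy(matrix)
print(global_acc, instance_acc)
```

In this toy matrix no single context solves every instance, while a per-instance choice does, which is the gap that instance-level routing tries to close.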
Methodology: Neural Collaborative Context Engineering
NCCE operates in three critical stages:
- Cluster-based Initialization: Training instances are embedded semantically and clustered via KMeans; a warm-up optimizer (e.g., MIPROv2) generates cluster-level anchor contexts, yielding a diverse catalog of candidate strategies. This initialization reduces semantic diameter within clusters, providing a low-regret starting point for preference induction.
- Context-CF Co-Evolution: A lightweight Neural Collaborative Filtering (NCF) model is trained on observed instance-context reward triples, using pairwise ranking loss to align with relative preference objectives. The NCF's latent gradients identify catalog blind spots—failure instances not solved by any current context—then guide gradient ascent over context embeddings. The resulting continuous targets are discretized, and an LLM-based reflector proposes improved textual context variants. Each new variant augments the catalog and expands the interaction matrix, forming a synergistic feedback loop between preference model and catalog evolution.
- Instance-wise Context Routing: At inference, the trained NCF router scores all catalog strategies for each unseen input, selecting the strategy with maximum predicted compatibility. This enables efficient, personalized context construction and obviates the need for costly global reevaluation.
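The cluster-based initialization stage can be sketched as follows. This is a simplified stand-in: it uses a tiny pure-Python k-means with a deterministic first-k initialization (an assumption made for reproducibility), and `optimize_for_cluster` is a hypothetical callable standing in for the warm-up optimizer, not the MIPROv2 API.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Minimal k-means; centroids start at the first k points (assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

def build_anchor_catalog(instances, labels, optimize_for_cluster):
    """One anchor context per cluster, produced by a caller-supplied optimizer."""
    clusters = {}
    for inst, label in zip(instances, labels):
        clusters.setdefault(label, []).append(inst)
    return {label: optimize_for_cluster(members)
            for label, members in sorted(clusters.items())}

# Toy embeddings (hypothetical) with two well-separated groups.
embeddings = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]
instances = ["q1", "q2", "q3", "q4"]
labels, centroids = kmeans(embeddings, k=2)
catalog = build_anchor_catalog(
    instances, labels,
    optimize_for_cluster=lambda members: "anchor context for " + ", ".join(members),
)
print(labels, catalog)
```

In practice the embeddings would come from a semantic encoder and the optimizer call would be far more expensive than clustering, so the number of clusters directly sets the warm-up cost.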
Figure 1: The overall architecture of NCCE, featuring synergistic co-evolution between a neural collaborative filtering model and evolving context catalog for personalized context routing.
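The pairwise ranking objective and the argmax routing step described in stages two and three can be illustrated with a deliberately small model. The sketch below swaps the neural collaborative filtering network for a linear scorer (one weight vector per context, dotted with instance features) so that the gradient can be written by hand; the feature vectors, learning rate, and function names are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, features, context_id):
    """Predicted compatibility: dot product of instance features and context weights."""
    return sum(w * f for w, f in zip(weights[context_id], features))

def preference_pairs(triples):
    """Turn (instance, context, reward) triples into (instance, better, worse) pairs."""
    by_instance = {}
    for inst, ctx, reward in triples:
        by_instance.setdefault(inst, []).append((ctx, reward))
    return [(inst, c_hi, c_lo)
            for inst, observed in by_instance.items()
            for c_hi, r_hi in observed
            for c_lo, r_lo in observed
            if r_hi > r_lo]

def train_pairwise(triples, features, n_contexts, dim, lr=0.5, epochs=50):
    """Gradient descent on -log(sigmoid(score_better - score_worse))."""
    weights = [[0.0] * dim for _ in range(n_contexts)]
    pairs = preference_pairs(triples)
    for _ in range(epochs):
        for inst, c_hi, c_lo in pairs:
            x = features[inst]
            margin = score(weights, x, c_hi) - score(weights, x, c_lo)
            step = lr * (1.0 - sigmoid(margin))  # shrinks as the pair gets ranked correctly
            for d in range(dim):
                weights[c_hi][d] += step * x[d]
                weights[c_lo][d] -= step * x[d]
    return weights

def route(weights, features):
    """Pick the context with the highest predicted compatibility."""
    return max(range(len(weights)), key=lambda c: score(weights, features, c))

def blind_spots(triples):
    """Instances for which no evaluated context earned a positive reward."""
    best = {}
    for inst, _ctx, reward in triples:
        best[inst] = max(best.get(inst, 0.0), reward)
    return sorted(inst for inst, r in best.items() if r == 0.0)

# Hypothetical toy data: two feature dimensions, two contexts (ids 0 and 1).
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
triples = [("a", 0, 1.0), ("a", 1, 0.0),
           ("b", 0, 0.0), ("b", 1, 1.0),
           ("c", 0, 0.0), ("c", 1, 0.0)]
weights = train_pairwise(triples, features, n_contexts=2, dim=2)
print(route(weights, [0.9, 0.1]), route(weights, [0.1, 0.9]), blind_spots(triples))
```

Because the scorer depends only on instance features, it can rank contexts for inputs it never saw during training. The blind-spot list is what a reflector step would consume when proposing new context variants; the gradient ascent over context embeddings described above is not reproduced here.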
Theoretical Guarantees
The regret bound for instance-wise routing decomposes into two orthogonal terms: (I) catalog coverage, controlled by cluster size and anchor quality (α + Lρ_K), and (II) router generalization, governed by interaction density and NCF capacity. Increasing the number of clusters K reduces the semantic diameter of each cluster but risks over-specializing the anchors. The PAC bound formally establishes that dynamic, instance-wise routing outperforms static global optimization when input-context preference relations obey cluster Lipschitz continuity.
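The two-term structure can be written schematically as below. This is only a hedged reading of the summary, not the paper's stated theorem: interpreting α as the suboptimality of a cluster anchor, L as the Lipschitz constant, and ρ_K as the within-cluster semantic diameter at K clusters is an assumption, and ε_route is a placeholder symbol for the router generalization term, whose exact form is not given here.

```latex
% Schematic decomposition; \varepsilon_{\mathrm{route}} is a placeholder symbol (assumption).
\[
\mathrm{Regret}
\;\le\;
\underbrace{\alpha + L\,\rho_K}_{\text{(I) catalog coverage}}
\;+\;
\underbrace{\varepsilon_{\mathrm{route}}}_{\text{(II) router generalization}}
\]
% Why a term of the form L\rho_K is plausible: if the reward r(x, c) is
% L-Lipschitz in x within a cluster and \bar{x} is the anchor point, then
\[
\lvert r(x, c) - r(\bar{x}, c) \rvert \;\le\; L\, d(x, \bar{x}) \;\le\; L\,\rho_K .
\]
```

Under this reading, raising K shrinks ρ_K and tightens term (I), while spreading the same evaluation budget over more anchors can loosen α, which matches the trade-off described above.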
Experimental Evaluation
NCCE is benchmarked on HoVer, SCONE, and HotpotQA, contrasting its routing-based approach with global optimization methods (APE, OPRO, EvoPrompt, TextGrad, MIPROv2, GEPA). Task accuracy is the central metric. NCCE achieves 74.8% test accuracy averaged over the three datasets, more than 6 percentage points above the best global baselines (MIPROv2, GEPA-Merge), and reports the best result on every dataset (HoVer: 74.7%, SCONE: 89.7%, HotpotQA: 60.1%). The improvements are statistically significant (p < 0.05).
Figure 2: Performance evolution across rounds, showing sustained task score improvement for full NCCE; ablations plateau, indicating necessity of co-evolution.
Ablation analyses reveal that expanding the context catalog without intelligent routing (random routing) actively degrades performance; removing co-evolution or switching to cluster-only routing also reduces the gains. Pointwise regression loss underperforms pairwise ranking. Oracle routing (using the ground-truth optimal strategy for each instance) yields 84.3%, which indicates both that the catalog contains strong strategies and that the router still leaves headroom for improvement.
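The relationship between random, learned, and oracle routing can be checked on any dense reward table. The sketch below uses a hypothetical four-instance table and a hypothetical router assignment; none of the numbers come from the paper.

```python
def routed_accuracy(rewards, assignment):
    """Mean reward when each instance uses its assigned context."""
    return sum(rewards[inst][assignment[inst]] for inst in rewards) / len(rewards)

def random_routing_accuracy(rewards):
    """Expected mean reward under a uniformly random context per instance."""
    return sum(sum(row.values()) / len(row) for row in rewards.values()) / len(rewards)

def oracle_accuracy(rewards):
    """Mean reward when each instance uses its best context (upper bound for any router)."""
    return sum(max(row.values()) for row in rewards.values()) / len(rewards)

# Hypothetical dense reward table: rewards[instance][context] in {0, 1}.
rewards = {
    "q1": {"a": 1, "b": 0, "c": 0},
    "q2": {"a": 0, "b": 1, "c": 0},
    "q3": {"a": 0, "b": 0, "c": 1},
    "q4": {"a": 1, "b": 1, "c": 0},
}
router_choice = {"q1": "a", "q2": "b", "q3": "a", "q4": "b"}  # one mistake, on q3

random_acc = random_routing_accuracy(rewards)
router_acc = routed_accuracy(rewards, router_choice)
oracle_acc = oracle_accuracy(rewards)
headroom = oracle_acc - router_acc
print(random_acc, router_acc, oracle_acc, headroom)
```

Applying the same subtraction to the figures reported above, and assuming both are averaged over the same test sets, gives 84.3 - 74.8 = 9.5 percentage points of remaining headroom.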

Figure 3: Performance dependence on cluster count, validating theoretical trade-off in anchor specialization and semantic coverage.
Routing Analysis and Representation Dynamics
t-SNE projections of context assignment demonstrate that NCCE's router produces high assignment entropy and spatial mixing, overcoming rigid cluster boundaries and successfully capturing nuanced compatibility signals. Shannon entropy increases consistently relative to cluster-only routing, confirming the router's exploitation of latent structure.
Figure 4: t-SNE visualization of context routing assignments. Colors encode context strategies, indicating heterogeneous, non-clustered routing decisions.
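The entropy comparison can be reproduced with a few lines of code. This is a minimal sketch assuming routing decisions are available as a flat list of context ids alongside cluster ids; the toy assignments are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def shannon_entropy(assignments):
    """Shannon entropy (in bits) of the empirical distribution over assigned contexts."""
    counts = Counter(assignments)
    total = len(assignments)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def mean_within_cluster_entropy(cluster_ids, assignments):
    """Average entropy of context assignments inside each cluster."""
    groups = defaultdict(list)
    for cluster, ctx in zip(cluster_ids, assignments):
        groups[cluster].append(ctx)
    return sum(shannon_entropy(group) for group in groups.values()) / len(groups)

# Hypothetical routing decisions for eight instances in two clusters.
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]
cluster_only = ["a", "a", "a", "a", "b", "b", "b", "b"]  # one context per cluster
router = ["a", "b", "a", "c", "b", "b", "a", "c"]        # mixes contexts within clusters

print(mean_within_cluster_entropy(cluster_ids, cluster_only))
print(mean_within_cluster_entropy(cluster_ids, router))
```

Cluster-only routing has zero within-cluster entropy by construction, so any positive value for the learned router reflects decisions that cross cluster boundaries.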
Data density experiments indicate that NCCE is highly sample-efficient: populating just 30% of the instance-context interaction matrix suffices for near-peak test accuracy, implying strong inductive generalization via semantic features.
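A density experiment of this kind amounts to training the router on a random subset of the observed interactions. The helper below is a sketch under that assumption; the function name, the uniform sampling scheme, and the seed are illustrative choices, not the paper's protocol.

```python
import random

def subsample_interactions(triples, density, seed=0):
    """Keep a uniformly random fraction of (instance, context, reward) triples."""
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    keep = max(1, round(density * len(triples)))
    return rng.sample(triples, keep)

# Hypothetical full matrix: 10 instances x 10 contexts, placeholder rewards.
full_matrix = [(inst, ctx, 0.0) for inst in range(10) for ctx in range(10)]
sparse_matrix = subsample_interactions(full_matrix, density=0.3)
print(len(sparse_matrix), "of", len(full_matrix), "cells kept")
```

Each dropped cell corresponds to one LLM evaluation that was never run, so the density setting translates directly into evaluation cost.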
Practical and Theoretical Implications
The results establish that context engineering is fundamentally a recommendation problem in the presence of input heterogeneity; global strategies are suboptimal under realistic input distributions. NCCE's framework is modular and admits extension: catalog expansion, preference model sophistication (e.g., GNNs or transformer-based routers), and cross-model transfer are natural avenues. Theoretical analyses highlight the regime where catalog diversity and router expressiveness must be balanced, which can inform new benchmarks and dataset design for future adaptive LLM inference.
Practically, NCCE enables highly efficient, adaptive routing workflows for LLMs, minimizing API cost and computational burden at inference. The framework can be deployed for automated QA, reasoning, and reasoning-centric retrieval tasks, and can be integrated into downstream production pipelines.
Conclusion
NCCE demonstrates that dynamic, inductive context routing via neural collaborative filtering and synergistic catalog co-evolution unlocks previously untapped gains in LLM task performance. The design is theoretically sound, empirically validated, and extensible. This paradigm positions context engineering as an instance-level matching problem, establishing a new direction for adaptive prompt optimization and scalable LLM deployment (2605.15721).