Q-MetaSur: LLM Meta-Surrogate for MTMOO
- Q-MetaSur is a meta-surrogate that leverages a sequence-to-sequence transformer to translate tokenized task metadata and decision variables into accurate objective predictions for MTMOO.
- It employs a two-stage training strategy, combining supervised fine-tuning with reinforcement learning fine-tuning and conservative Q-learning to enhance surrogate accuracy and generalization.
- Extensive benchmarks on CEC2019 demonstrate that Q-MetaSur significantly improves both surrogate error metrics and overall optimization performance compared to traditional baselines.
Q-MetaSur is an LLM-based meta-surrogate designed to provide unified, high-fidelity surrogate modeling for offline multi-task multi-objective optimization (MTMOO), particularly in settings where direct evaluation of expensive objective functions is prohibitive. Leveraging a sequence-to-sequence approach with tailored tokenization, Q-MetaSur generalizes across heterogeneous tasks and objectives by encoding both decision variables and task metadata as token streams and using an LLM to autoregressively predict objective values. A two-stage offline training scheme, combining supervised fine-tuning and implicit Q-learning with conservative regularization, yields robust generalization. Extensive benchmarks on the CEC2019 MTMOO suite demonstrate significant improvements in both surrogate errors and resultant optimization performance compared to established baselines (Zhang et al., 17 Dec 2025).
1. Problem Setting and Design Rationale
Q-MetaSur addresses the scenario of expensive offline MTMOO, where one must solve $K$ distinct but related multi-objective optimization problems

$$\min_{\mathbf{x}_k \in \Omega_k} \mathbf{F}_k(\mathbf{x}_k) = \bigl(f_{k,1}(\mathbf{x}_k), \ldots, f_{k,m_k}(\mathbf{x}_k)\bigr), \qquad k = 1, \ldots, K,$$

with only a fixed, finite offline dataset $\mathcal{D}_k = \{(\mathbf{x}_k^{(i)}, \mathbf{F}_k(\mathbf{x}_k^{(i)}))\}_{i=1}^{N_k}$ for each task. The limitations of classical surrogate models such as per-objective GPs (fragmentation into one model per task–objective pair, cubic scaling in the number of training samples, and poor handling of variable input/output dimensions) motivate a single-sequence, autoregressive surrogate.
Q-MetaSur recasts surrogate modeling as a sequence-to-sequence learning problem, using an LLM to “translate” a concatenated, tokenized representation of task metadata and decision variables into the corresponding objective values. This approach leverages the expressivity of LLMs and enables joint modeling of inter-task relationships and heterogeneous objectives.
2. Unified Tokenization and Model Architecture
Textual Tokenization
Q-MetaSur utilizes a unified textual format:
- Each instance is described as a token stream including:
  - Task metadata (e.g., function name, dimensionality, instance index)
  - Decision vector and objective vector for each task
- All scalars are encoded in scientific notation, with sign, mantissa digits, and exponent rendered as separate tokens
- The tokenizer maps each instance to a source token sequence (task metadata plus decision variables) and a target token sequence (objective values), as in the sketch below
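A minimal sketch of this style of scientific-notation tokenization is shown below. The helper names (`tokenize_scalar`, `tokenize_instance`), the token vocabulary, the mantissa precision, and the metadata tags are illustrative assumptions rather than the paper's exact format.

```python
# Illustrative sketch only: encode scalars as sign / mantissa-digit / exponent
# tokens and assemble a source sequence from task metadata plus decision
# variables. Token formats here are assumptions, not Q-MetaSur's actual vocabulary.
import math

def tokenize_scalar(value: float, mantissa_digits: int = 4) -> list[str]:
    """Encode one float as a sign token, mantissa-digit tokens, and an exponent token."""
    sign = "<+>" if value >= 0 else "<->"
    if value == 0.0:
        mantissa, exponent = "0" * mantissa_digits, 0
    else:
        exponent = math.floor(math.log10(abs(value)))
        scaled = abs(value) / (10 ** exponent)              # mantissa in [1, 10)
        mantissa = f"{scaled:.{mantissa_digits - 1}f}".replace(".", "")
    return [sign, *[f"<{d}>" for d in mantissa], f"<E{exponent:+d}>"]

def tokenize_instance(metadata: dict, x: list[float]) -> list[str]:
    """Source sequence: task metadata tags followed by the tokenized decision vector."""
    tokens = [f"<{key}={val}>" for key, val in metadata.items()]
    for xi in x:
        tokens += tokenize_scalar(xi)
    return tokens

# Example: a 2-D decision vector for one benchmark task.
print(tokenize_instance({"func": "task1", "dim": 2, "inst": 1}, [0.5, -3.2e-3]))
```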
Encoder–Decoder Model
- Backbone: T5 encoder–decoder Transformer (t5-small or t5-base), with 6 or 12 encoder/decoder layers, model dimension 512 or 768, and 8 or 12 self-attention heads, respectively
- Input processing: Source tokens are embedded together with positional information and passed to the encoder; the decoder autoregressively predicts objective tokens, with teacher forcing during training
- Output: The probability of the target sequence $\mathbf{y} = (y_1, \ldots, y_T)$ given the source $\mathbf{x}$ factorizes autoregressively as
$$p_\theta(\mathbf{y} \mid \mathbf{x}) = \prod_{t=1}^{T} p_\theta\bigl(y_t \mid y_{<t}, \mathbf{x}\bigr)$$
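A minimal usage sketch with a HuggingFace T5 backbone follows; it feeds the surrogate a plain-text rendering of the metadata and decision variables rather than a custom numeric vocabulary, and the checkpoint name, prompt format, and decoding settings are assumptions for illustration, not the paper's configuration.

```python
# Sketch of the sequence-to-sequence surrogate call: translate
# "task metadata + decision variables" into objective-value tokens.
# Uses an off-the-shelf t5-small checkpoint purely as a placeholder backbone.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def predict_objectives(task_metadata: str, decision_vars: str) -> str:
    """Autoregressively decode the objective tokens for one candidate solution."""
    source = f"{task_metadata} | {decision_vars}"
    inputs = tokenizer(source, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=32)   # greedy decoding
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(predict_objectives("func: task1 dim: 2 inst: 1",
                         "x1: 5.000e-01 x2: -3.200e-03"))
```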
3. Two-Stage Offline Training: SFT and RLFT
Stage 1: Supervised Fine-Tuning (SFT)
- Minimize a priority-weighted cross-entropy loss that emphasizes sign and exponent tokens,
$$\mathcal{L}_{\mathrm{SFT}} = -\sum_{t=1}^{T} w_t \log p_\theta\bigl(y_t \mid y_{<t}, \mathbf{x}\bigr),$$
with decay weights $w_t$ that place the largest weight on sign and exponent tokens and progressively smaller weight on lower-order mantissa digits.
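A minimal PyTorch sketch of such a priority-weighted token cross-entropy follows; the exponentially decaying weight schedule is an assumption standing in for the paper's actual priority weights.

```python
# Sketch: cross-entropy where each target position carries its own weight,
# so sign/exponent tokens (early positions) dominate the loss.
import torch
import torch.nn.functional as F

def weighted_token_ce(logits: torch.Tensor,    # (batch, T, vocab)
                      targets: torch.Tensor,   # (batch, T)
                      weights: torch.Tensor    # (T,) per-position priorities
                      ) -> torch.Tensor:
    per_token = F.cross_entropy(logits.transpose(1, 2), targets,
                                reduction="none")              # (batch, T)
    return (per_token * weights).sum(dim=1).mean()

# Example: weights halving at each later token of an 8-token target.
T = 8
weights = 0.5 ** torch.arange(T, dtype=torch.float)
loss = weighted_token_ce(torch.randn(4, T, 100),
                         torch.randint(0, 100, (4, T)), weights)
```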
Stage 2: Reinforcement Learning Fine-Tuning (RLFT)
- Treats output decoding as a POMDP; the final sequence-level reward is an exponential function of the prediction RMSE, with bonuses for matching exponent and sign tokens
- Data augmentation: add noise to ground-truth objective sequences to expand reward coverage
- Attach two Q-heads and a V-head to the decoder; minimize a mixed Bellman and expectile loss:
$$\mathcal{L}_Q = \mathbb{E}\bigl[\bigl(r + \gamma V_\psi(s') - Q_\theta(s,a)\bigr)^2\bigr], \qquad \mathcal{L}_V = \mathbb{E}\bigl[L_2^{\tau}\bigl(Q_{\bar{\theta}}(s,a) - V_\psi(s)\bigr)\bigr],$$
where $L_2^{\tau}(u) = \lvert \tau - \mathbb{1}(u<0)\rvert\,u^2$ is the expectile loss
- Add a conservative Q-learning regularizer that penalizes Q-values of actions (tokens) outside the offline data:
$$\mathcal{L}_{\mathrm{CQL}} = \mathbb{E}_{s}\Bigl[\log\sum_{a}\exp Q_\theta(s,a) - \mathbb{E}_{a\sim\mathcal{D}}\bigl[Q_\theta(s,a)\bigr]\Bigr]$$
- Joint RLFT objective: $\mathcal{L}_{\mathrm{RLFT}} = \mathcal{L}_Q + \mathcal{L}_V + \alpha\,\mathcal{L}_{\mathrm{CQL}}$, with conservatism coefficient $\alpha$
- At inference, decoder logits can be adjusted using the advantage $A(s,a) = Q(s,a) - V(s)$, as sketched below
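The sketch below illustrates the two offline-RL ingredients named above, an expectile (IQL-style) value loss and a CQL-style conservative penalty over per-token Q-values; the tensor shapes, expectile $\tau$, and coefficient $\alpha$ are placeholder assumptions.

```python
# Sketch of the RLFT loss ingredients: expectile value regression plus a
# conservative (CQL-style) penalty on Q-values of tokens never seen in the data.
import torch

def expectile_loss(q: torch.Tensor, v: torch.Tensor, tau: float = 0.7) -> torch.Tensor:
    """Asymmetric L2 loss pushing V toward an upper expectile of Q."""
    diff = q - v
    weight = torch.where(diff > 0,
                         torch.full_like(diff, tau),
                         torch.full_like(diff, 1.0 - tau))
    return (weight * diff ** 2).mean()

def cql_penalty(q_all: torch.Tensor,       # (batch, vocab): Q for every token
                q_logged: torch.Tensor     # (batch,): Q of the dataset token
                ) -> torch.Tensor:
    """Push down Q-values of out-of-dataset actions relative to logged ones."""
    return (torch.logsumexp(q_all, dim=-1) - q_logged).mean()

# Example combination into one RLFT-style training term.
batch, vocab, alpha = 16, 100, 1.0
q_all = torch.randn(batch, vocab)
logged = torch.randint(0, vocab, (batch, 1))
q_logged = q_all.gather(1, logged).squeeze(1)
v = torch.randn(batch)
loss = expectile_loss(q_logged, v) + alpha * cql_penalty(q_all, q_logged)
```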
4. Integration into Evolutionary Multi-Task Optimization
Q-MetaSur is integrated in plug-and-play fashion into any MTMOO evolutionary framework by replacing expensive calls to the true objectives $\mathbf{F}_k(\mathbf{x})$ with the learned surrogate $\hat{\mathbf{F}}_k(\mathbf{x})$, which produces autoregressive, tokenized predictions that are decoded back into objective values. Standard selection, crossover, and transfer processes (e.g., NSGA-II tournament selection, crowding distance, adaptive transfer) are performed as before, using the surrogate-predicted objectives (Zhang et al., 17 Dec 2025).
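A minimal sketch of this plug-and-play substitution is shown below; `surrogate_fn` stands in for the tokenized LLM predictor (e.g., the `predict_objectives` helper sketched earlier), and the whitespace-separated float decoding and Gaussian variation step are illustrative placeholders for a real MTMOO backbone's operators.

```python
# Sketch: an evolutionary loop that evaluates candidates with the surrogate
# instead of the expensive true objectives F_k(x).
import random
from typing import Callable, List

def surrogate_evaluate(population: List[List[float]],
                       task_metadata: str,
                       surrogate_fn: Callable[[str, str], str]) -> List[List[float]]:
    """Replace F_k(x) with decoded surrogate predictions F_hat_k(x)."""
    fitnesses = []
    for x in population:
        decision_str = " ".join(f"x{i}: {v:.4e}" for i, v in enumerate(x))
        pred = surrogate_fn(task_metadata, decision_str)
        fitnesses.append([float(tok) for tok in pred.split()])  # assumed decoding
    return fitnesses

def one_generation(population, task_metadata, surrogate_fn):
    """Surrogate-based evaluation followed by (placeholder) variation."""
    fitnesses = surrogate_evaluate(population, task_metadata, surrogate_fn)
    # A real MTMOO backbone would apply NSGA-II tournament selection,
    # crowding distance, and adaptive inter-task transfer here.
    offspring = [[xi + random.gauss(0.0, 0.05) for xi in x] for x in population]
    return offspring, fitnesses
```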
5. Empirical Benchmarking on CEC2019 MTMOO
Extensive experiments on CEC2019 MTMOO assess both surrogate accuracy and downstream optimization.
Surrogate Accuracy
- Compared to state-of-the-art surrogates, namely single-task RBFN, Kolmogorov–Arnold Networks (KAN), and the ExTrEMO transfer GP (FTGP), Q-MetaSur achieves the lowest standardized MAE (sMAE) on 10/12 instance–objective pairs and the best $R^2$ on 9/12. On several instances (Inst2/Obj0, Inst3/Obj0, Inst5/Obj0), it attains near-zero sMAE (e.g., 0.0000 on Inst2/Obj0) and correspondingly high $R^2$.
sMAE per instance and objective (lower is better):

| Instance | RBFN Obj0 | RBFN Obj1 | KAN Obj0 | KAN Obj1 | FTGP Obj0 | FTGP Obj1 | Q-MetaSur Obj0 | Q-MetaSur Obj1 |
|---|---|---|---|---|---|---|---|---|
| Inst1 | 0.2064 | 0.2026 | 0.4582 | 0.4357 | 0.2077 | 0.2023 | 0.0693 | 0.0655 |
| Inst2 | 0.2615 | 0.1083 | 1.8741 | 1.0453 | 0.2510 | 0.1494 | 0.0000 | 0.1529 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
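The exact sMAE normalization used in the paper is not reproduced in this summary; the sketch below assumes one common convention, MAE divided by the standard deviation of the true objective values, purely to make the metric concrete.

```python
# Sketch of a standardized MAE: mean absolute error scaled by the spread of
# the true objective values (assumed normalization, not necessarily the paper's).
import numpy as np

def smae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    mae = np.mean(np.abs(y_true - y_pred))
    return float(mae / (np.std(y_true) + 1e-12))   # guard against zero spread

y_true = np.array([0.10, 0.25, 0.40, 0.80])
y_pred = np.array([0.12, 0.24, 0.43, 0.75])
print(smae(y_true, y_pred))
```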
Optimization Performance
- Under a strict 200-function-evaluations (FE) regime, Q-MetaSur embedded within both MO-MaTDE and MTEA-DCK achieves the best mean standardized IGD score (MSS; lower is better) in 11/12 instance–backbone cases, significantly outperforming both the REAL-fitness baseline (no surrogate) and the other surrogate-assisted variants.
| Setting | Q-MetaSur MSS | REAL MSS | RBFN MSS | FTGP MSS | KAN MSS |
|---|---|---|---|---|---|
| ins1_MO-MaTDE | -0.777 ± 0.065 | -0.153 ± 0.041 (+) | -0.645 ± 0.064 (+) | 0.852 ± 0.105 (+) | 1.031 ± 0.265 (+) |
| ins6_MTEA-DCK | -1.034 ± 0.018 | -0.633 ± 0.052 (+) | 0.748 ± 0.032 (+) | -0.405 ± 0.063 (+) | 0.610 ± 0.080 (+) |
“+” denotes that Q-MetaSur is significantly better (Wilcoxon rank-sum test).
Final Pareto fronts confirm that the surrogate-assisted optimization with Q-MetaSur closely aligns with the true Pareto front, whereas alternative surrogates and REAL-fitness lag behind.
6. Key Findings, Limitations, and Outlook
- Approximation accuracy: Q-MetaSur reduces sMAE by up to 100% on nearly-deterministic objectives and decreases unexplained variance ($1 - R^2$) by 30–70% relative to single-task surrogates.
- Optimization improvements: Under strict FE constraints and in fully offline mode, Q-MetaSur delivers statistically significant gains on virtually all task–algorithm pairs tested.
- Generalization: Shows stable zero-shot performance on unseen tasks and on dimensions moderately outside the training range, maintaining low sMAE and high $R^2$.
- Limitations: (i) Degraded stability for decision-space dimensions far exceeding the training range; (ii) surrogate quality depends on metadata completeness and reward function design; (iii) longer sequences incur additional latency.
- Future directions: Pursue few-shot online calibration to extend applicability, explore structured numeric prediction heads and uncertainty-aware decoding (e.g., conformal intervals), and investigate lightweight adapters or mixture-of-experts to better capture localized objective landscapes.
Q-MetaSur demonstrates that, with an appropriately designed tokenization scheme, explicit loss–metric alignment via offline RL, and conservative Q-learning regularization, a single LLM-based surrogate can provide scalable, high-fidelity approximation across diverse and challenging offline MTMOO settings (Zhang et al., 17 Dec 2025).