
Q-MetaSur: LLM Meta-Surrogate for MTMOO

Updated 24 December 2025
  • Q-MetaSur is a meta-surrogate that leverages a sequence-to-sequence transformer to translate tokenized task metadata and decision variables into accurate objective predictions for MTMOO.
  • It employs a two-stage training strategy, combining supervised fine-tuning with reinforcement learning fine-tuning and conservative Q-learning to enhance surrogate accuracy and generalization.
  • Extensive benchmarks on CEC2019 demonstrate that Q-MetaSur significantly improves both surrogate error metrics and overall optimization performance compared to traditional baselines.

Q-MetaSur is an LLM-based meta-surrogate designed to provide unified and high-fidelity surrogate modeling for offline multi-task multi-objective optimization (MTMOO), particularly in settings where direct evaluation of expensive objective functions is prohibitive. Leveraging a sequence-to-sequence approach with tailored tokenization, Q-MetaSur generalizes across heterogeneous tasks and objectives by encoding both decision variables and task metadata as token streams and using an LLM to autoregressively predict objective values. A two-stage offline training scheme—combining supervised fine-tuning and implicit Q-learning with conservative regularization—yields robust generalization. Extensive benchmarks on CEC2019 MTMOO demonstrate significant improvements in both surrogate errors and resultant optimization performance compared to established baselines (Zhang et al., 17 Dec 2025).

1. Problem Setting and Design Rationale

Q-MetaSur addresses the scenario of expensive offline MTMOO, where one must solve $T$ distinct but related multi-objective optimization problems:

\{\mathcal{T}_t\}_{t=1}^T,\quad \mathcal{T}_t = (\mathcal{X}_t, F_t, \mathcal{M}_t, \mathcal{D}_t)

with only fixed, finite datasets $\mathcal{D}_t = \{(x_i^{(t)}, F_t(x_i^{(t)}))\}_{i=1}^{N_t}$ for each task. The limitations of classical surrogate models such as per-objective GPs—namely fragmentation, cubic scaling with respect to $T$ and $N_t$, and poor handling of variable input/output dimensions—motivate a single-sequence, autoregressive surrogate.
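The per-task data available in this offline setting can be sketched as a simple container; the field names, class name, and toy numbers below are illustrative assumptions, not the paper's interface:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class OfflineTask:
    """One task T_t = (X_t, F_t, M_t, D_t) in the offline MTMOO setting.

    Only the fixed archive D_t = {(x_i, F_t(x_i))} is available at training
    time; the true objective function F_t cannot be queried again.
    """
    metadata: dict   # M_t: e.g. function name, dimensionality, instance index
    X: np.ndarray    # archived decision vectors, shape (N_t, n_t)
    Y: np.ndarray    # recorded objective values F_t(x), shape (N_t, k_t)

# Toy 2-objective task with 5 archived evaluations (hypothetical numbers).
rng = np.random.default_rng(0)
task = OfflineTask(
    metadata={"name": "sphere_pair", "n_t": 3, "k_t": 2, "instance": 1},
    X=rng.uniform(-1.0, 1.0, size=(5, 3)),
    Y=rng.uniform(0.0, 2.0, size=(5, 2)),
)
print(task.X.shape, task.Y.shape)
```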

Q-MetaSur recasts surrogate modeling as a sequence-to-sequence learning problem, using an LLM to “translate” a concatenated, tokenized representation of task metadata and decision variables into corresponding objective values. This approach leverages the expressivity of LLMs and enables joint modeling of inter-task relationships and heterogeneous objectives.

2. Unified Tokenization and Model Architecture

Textual Tokenization

Q-MetaSur utilizes a unified textual format:

  • Each instance is described as a token stream including:
    • Task metadata (e.g., function name, dimensionality, instance index)
    • Decision vector $x \in \mathbb{R}^{n_t}$ and objective vector $\mathbf{y} \in \mathbb{R}^{k_t}$ for each task $t$
  • All scalars $z$ are encoded in scientific notation:

z = \mathrm{sign}(z) \cdot m \cdot 10^\kappa,\quad m\in[1,10),\; \kappa=\lfloor \log_{10}|z| \rfloor

and tokenized as:

\phi(z) = [\pm]\;\langle 10^\kappa\rangle\; d_1 d_2 \cdots d_{n_\text{digit}}

  • The tokenizer $\tau$ maps $(m_t, x, \mathbf{y})$ to source and target token sequences:

Z \Vert O = (\tau(m_t, x),\; \tau(\mathbf{y})) \in \mathcal{V}^* \times \mathcal{V}^*
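A minimal sketch of this tokenization, assuming a hypothetical exponent-token vocabulary such as `<E2>` and a simple `key=value` encoding for metadata (the paper's exact token set may differ):

```python
import math

def tokenize_scalar(z: float, n_digits: int = 4) -> list[str]:
    """Encode z = sign(z) * m * 10^kappa as [sign, exponent token, mantissa digits],
    mirroring the phi(z) scheme above. The "<E{kappa}>" token format is an
    illustrative assumption."""
    if z == 0.0:
        return ["+", "<E0>"] + ["0"] * n_digits
    sign = "+" if z > 0 else "-"
    kappa = math.floor(math.log10(abs(z)))   # exponent
    m = abs(z) / 10 ** kappa                 # mantissa in [1, 10)
    # Truncate (rather than round) the mantissa to n_digits digits.
    digits = f"{m:.{n_digits + 2}f}".replace(".", "")[:n_digits]
    return [sign, f"<E{kappa}>"] + list(digits)

def tokenize_instance(metadata: dict, x: list[float], y: list[float]):
    """Build the source stream (metadata + decision variables) and the
    target stream (objective values) for one training instance."""
    src = [f"{k}={v}" for k, v in metadata.items()]
    for v in x:
        src += tokenize_scalar(v)
    tgt = []
    for v in y:
        tgt += tokenize_scalar(v)
    return src, tgt

print(tokenize_scalar(-273.15))   # ['-', '<E2>', '2', '7', '3', '1']
```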

Encoder–Decoder Model

  • Backbone: T5 Transformer (t5-small or t5-base), usually 12 encoder/decoder layers, model dimension $d_m$ of 512 or 768, and 8/12 self-attention heads
  • Input processing: Source tokens $Z=(z_1,\dots,z_{n_\mathrm{src}})$ are embedded and summed with positional encodings, then passed to the encoder; the decoder autoregressively predicts objective tokens with teacher forcing
  • Output: The probability of target sequence $O = (o_1, \ldots, o_{n_\mathrm{tgt}})$ given $Z$ factorizes as:

P_\theta(O \mid Z) = \prod_{i=1}^{n_\mathrm{tgt}} P_\theta\big( o_i \mid o_{<i},\; \mathrm{Encoder}(Z) \big)
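The teacher-forced factorization can be illustrated with a toy stand-in for the decoder's per-step logits (no real T5 model is involved; the numbers are illustrative):

```python
import numpy as np

def sequence_log_prob(step_logits: np.ndarray, target_ids: list[int]) -> float:
    """Teacher-forced log P(O | Z) = sum_i log P(o_i | o_<i, Encoder(Z)).

    step_logits[i] stands in for the decoder output distribution at step i,
    already conditioned on the encoded source and the gold prefix o_<i.
    """
    total = 0.0
    for logits, oid in zip(step_logits, target_ids):
        log_probs = logits - np.log(np.sum(np.exp(logits)))  # log-softmax
        total += log_probs[oid]
    return total

# Uniform logits over a 4-token vocabulary: each step contributes log(1/4).
print(sequence_log_prob(np.zeros((3, 4)), [0, 1, 2]))   # 3 * log(1/4)
```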

3. Two-Stage Offline Training: SFT and RLFT

Stage 1: Supervised Fine-Tuning (SFT)

  • Minimize a priority-weighted cross-entropy loss emphasizing sign and exponent tokens:

\mathcal{L}_\text{sup} = -\sum_{j=1}^{k_t} \sum_{\ell=1}^{L_j} \omega_{j,\ell} \log P_\theta\big(o_{\iota_{j,\ell}} \mid o_{<\iota_{j,\ell}},\; \mathrm{Encoder}(Z)\big)

with decay weights $\omega_{j,\ell}$.
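A sketch of this priority-weighted loss, assuming a geometric decay schedule for $\omega_{j,\ell}$ so that the leading (sign and exponent) tokens dominate; the actual weighting in the paper may differ:

```python
import numpy as np

def priority_weighted_ce(step_logits: np.ndarray, target_ids: list[int],
                         decay: float = 0.5) -> float:
    """Priority-weighted cross-entropy over one objective's token sequence:
    token l receives weight omega_l = decay**l, so sign and exponent tokens
    (emitted first) are penalized most. The geometric schedule is an
    illustrative assumption."""
    loss = 0.0
    for l, (logits, oid) in enumerate(zip(step_logits, target_ids)):
        log_probs = logits - np.log(np.sum(np.exp(logits)))  # log-softmax
        loss -= decay ** l * log_probs[oid]
    return loss

# Uniform logits over a 4-token vocabulary: each step costs log 4,
# weighted by 1, 0.5, 0.25 -> total 1.75 * log 4.
print(priority_weighted_ce(np.zeros((3, 4)), [0, 0, 0]))
```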

Stage 2: Reinforcement Learning Fine-Tuning (RLFT)

  • Treats output decoding as a POMDP; the final sequence-level reward $R(\hat{\mathbf{y}}, \mathbf{y})$ is exponential-RMSE with exponent–sign-matching bonuses
  • Data augmentation: Add noise to the ground-truth $\mathbf{y}$ to expand reward coverage
  • Attach two Q-heads and a V-head to the decoder; minimize mixed Bellman and expectile loss:

\mathcal{L}_{Q,V} = \mathbb{E}_{\tau} \left[\sum_{i=1}^L \big(R + \gamma V_\theta(h_{i+1}) - Q_\theta(h_i, a_i)\big)^2 + \mathrm{ET}\big(Q_{\hat{\theta}}(h_i, a_i) - V_\theta(h_i)\big) \right]

  • Add a conservative Q-learning regularizer:

\mathcal{L}_\text{CQL} = \mathbb{E}_{(s, a)}\, \mathrm{CE}\big(\mathrm{softmax}(Q_\theta(s, \cdot)),\, a\big)

  • Joint RLFT objective:

\mathcal{L} = \mathcal{L}_{Q,V} + \lambda_\text{CQL}\, \mathcal{L}_\text{CQL} + \mathcal{L}_\text{sup}

  • At inference, logits can be adjusted using the advantage $(Q_\theta - V_\theta)$.
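Taken together, the RLFT ingredients above admit a compact numerical sketch; all constants (the toy Q/V numbers, the `beta` scaling) are illustrative assumptions rather than the paper's settings:

```python
import numpy as np

def bellman_loss(R, gamma, V_next, Q):
    """Squared TD error (R + gamma V(h_{i+1}) - Q(h_i, a_i))^2 per decoding step."""
    return (R + gamma * V_next - Q) ** 2

def expectile_loss(diff, tau=0.7):
    """Asymmetric expectile term ET(u) = |tau - 1(u < 0)| u^2 from implicit
    Q-learning; tau > 0.5 pulls V toward an upper expectile of Q."""
    weight = np.where(diff > 0, tau, 1.0 - tau)
    return weight * diff ** 2

def cql_regularizer(q_values, action):
    """Conservative term CE(softmax(Q(s, .)), a): treat the Q-vector as
    logits and penalize low probability on the logged (in-dataset) action,
    which suppresses Q-values of out-of-distribution actions."""
    log_probs = q_values - np.log(np.sum(np.exp(q_values)))
    return -log_probs[action]

def advantage_adjusted_logits(lm_logits, q_values, v_value, beta=1.0):
    """Inference-time shift of decoder logits by the advantage Q - V;
    beta is an illustrative scaling factor."""
    return lm_logits + beta * (q_values - v_value)

# Toy per-step values (illustrative numbers only).
q = np.array([0.2, 0.5])
v_next = np.array([0.4, 0.0])
print(bellman_loss(1.0, 0.9, v_next, q).sum())           # TD part of the loss
print(expectile_loss(np.array([1.0, -1.0]), 0.7))        # [0.7, 0.3]
print(cql_regularizer(np.zeros(5), action=2))            # log 5 for uniform Q
print(advantage_adjusted_logits(np.array([2.0, 1.0, 0.5]),
                                np.array([0.0, 1.5, 0.0]), v_value=0.5))
```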

4. Integration into Evolutionary Multi-Task Optimization

Q-MetaSur is integrated in plug-and-play fashion into any MTMOO evolutionary framework by replacing expensive calls to $F_t$ with its learned surrogate:

u.\mathrm{Obj} \gets \hat{F}_\mathrm{MetaSur}(m_t, u.\mathrm{Dec})

where $\hat{F}_\mathrm{MetaSur}$ produces autoregressive, tokenized predictions. Standard selection, crossover, and transfer processes (e.g., NSGA-II tournament, crowding-distance, adaptive transfer) are performed as before, using the surrogate-predicted objectives (Zhang et al., 17 Dec 2025).
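The plug-and-play substitution can be sketched as a single evaluation pass; `surrogate_predict` and the mock objectives below are hypothetical stand-ins for Q-MetaSur's tokenized autoregressive prediction, not the paper's implementation:

```python
import numpy as np

def surrogate_assisted_generation(population, metadata, surrogate_predict):
    """One evaluation pass of an MTMOO backbone in which the expensive call
    to F_t is replaced by the meta-surrogate: u.Obj <- F_hat(m_t, u.Dec).
    Selection, crossover, and transfer then proceed unchanged on u.Obj."""
    for u in population:
        u["Obj"] = surrogate_predict(metadata, u["Dec"])
    return population

# Mock two-objective surrogate (hypothetical closed-form stand-in).
def mock_predict(meta, dec):
    dec = np.asarray(dec)
    return np.array([np.sum(dec ** 2), np.sum((dec - 1.0) ** 2)])

pop = [{"Dec": [0.0, 0.0]}, {"Dec": [1.0, 1.0]}]
pop = surrogate_assisted_generation(pop, {"task": 1}, mock_predict)
print(pop[0]["Obj"], pop[1]["Obj"])   # [0. 2.] [2. 0.]
```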

5. Empirical Benchmarking on CEC2019 MTMOO

Extensive experiments on CEC2019 MTMOO assess both surrogate accuracy and downstream optimization.

Surrogate Accuracy

  • Compared to state-of-the-art surrogates—single-task RBFN, Kolmogorov–Arnold Networks (KAN), and FTGP (ExTrEMO transfer GP)—Q-MetaSur achieves the lowest standardized MAE (sMAE) on 10/12 instance–objective pairs and the top $R^2$ in 9/12. On several instances (Inst2/Obj0, Inst3/Obj0, Inst5/Obj0), it attains sMAE $=0$ and $R^2=1.000$.
sMAE per instance and objective (excerpt; lower is better):

| Instance | RBFN Obj0 | RBFN Obj1 | KAN Obj0 | KAN Obj1 | FTGP Obj0 | FTGP Obj1 | Q-MetaSur Obj0 | Q-MetaSur Obj1 |
|---|---|---|---|---|---|---|---|---|
| Inst1 | 0.2064 | 0.2026 | 0.4582 | 0.4357 | 0.2077 | 0.2023 | 0.0693 | 0.0655 |
| Inst2 | 0.2615 | 0.1083 | 1.8741 | 1.0453 | 0.2510 | 0.1494 | 0.0000 | 0.1529 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |

Optimization Performance

  • Under a strict 200-function-evaluations (FE) regime, Q-MetaSur embedded within both MO-MaTDE and MTEA-DCK achieves the best mean standardized IGD score (MSS; lower is better) in 11/12 instance–backbone cases, significantly outperforming both the REAL-fitness (no surrogate) and other surrogate-assisted variants.
| Setting | Q-MetaSur MSS | REAL MSS | RBFN MSS | FTGP MSS | KAN MSS |
|---|---|---|---|---|---|
| ins1_MO-MaTDE | -0.777 ± 0.065 | -0.153 ± 0.041 (+) | -0.645 ± 0.064 (+) | 0.852 ± 0.105 (+) | 1.031 ± 0.265 (+) |
| ins6_MTEA-DCK | -1.034 ± 0.018 | -0.633 ± 0.052 (+) | 0.748 ± 0.032 (+) | -0.405 ± 0.063 (+) | 0.610 ± 0.080 (+) |

(+)(+)” denotes Q-MetaSur is significantly better (Wilcoxon, α=0.05\alpha=0.05).

Final Pareto fronts confirm that the surrogate-assisted optimization with Q-MetaSur closely aligns with the true Pareto front, whereas alternative surrogates and REAL-fitness lag behind.

6. Key Findings, Limitations, and Outlook

  • Approximation accuracy: Q-MetaSur reduces sMAE by up to 100% on nearly-deterministic objectives and decreases unexplained variance $(1-R^2)$ by 30–70% relative to single-task surrogates.
  • Optimization improvements: Under strict FE constraints and in fully offline mode, Q-MetaSur delivers statistically significant gains on virtually all task–algorithm pairs tested.
  • Generalization: Shows stable zero-shot performance on unseen tasks and moderately unseen dimensions, with sMAE $\leq 0.07$ and $R^2 > 0$.
  • Limitations: (i) Degraded stability for decision-space dimensions far exceeding the training range; (ii) surrogate quality depends on metadata completeness and reward function design; (iii) longer sequences incur additional latency.
  • Future directions: Pursue few-shot online calibration to extend applicability, explore structured numeric prediction heads and uncertainty-aware decoding (e.g., conformal intervals), and investigate lightweight adapters or mixture-of-experts to better capture localized objective landscapes.

Q-MetaSur demonstrates that, with an appropriately designed tokenization scheme, explicit loss–metric alignment via offline RL, and conservative Q-learning regularization, a single LLM-based surrogate can provide scalable, high-fidelity approximation over diverse and challenging offline MTMOO settings (Zhang et al., 17 Dec 2025).
