Cross-domain generalization of symbolic surrogates for LLM MLP layers

Determine how well symbolic surrogate models that replace transformer MLP layers (constructed via SymTorch, with PCA-based dimensionality reduction and PySR-fitted analytic expressions) generalize to distributions other than their training distribution, and whether domain-agnostic symbolic approximations are feasible or task- and domain-specific surrogates are necessary.

Background

The paper proposes accelerating transformer inference by replacing selected MLP layers with symbolic surrogates fitted using SymTorch and PySR, after PCA-based dimensionality reduction of inputs and outputs. Experiments are conducted on Qwen2.5-1.5B-Instruct using WikiText-2, with training and evaluation on the same distribution.
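The pipeline described above can be sketched in miniature. The code below is an illustrative toy, not the paper's implementation: a small stand-in MLP replaces the Qwen2.5 layer, numpy SVD stands in for the PCA step, and an ordinary least-squares fit in the reduced space stands in for the PySR-fitted analytic expressions. The names (`toy_mlp`, `surrogate`, `pca_fit`) and the shift of 3.0 for the out-of-domain inputs are assumptions chosen for illustration; the point is only to show how one would fit a reduced surrogate on one distribution and then measure its approximation error on a shifted one.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden, k = 16, 32, 4  # toy sizes, far smaller than a real LLM

# Fixed random weights for a stand-in transformer MLP layer.
W1 = rng.normal(scale=0.25, size=(d_model, d_hidden))
W2 = rng.normal(scale=0.25, size=(d_hidden, d_model))

def toy_mlp(x):
    """Stand-in for the MLP layer being approximated."""
    return np.tanh(x @ W1) @ W2

def pca_fit(data, k):
    """Return the mean and top-k principal directions of `data`."""
    mean = data.mean(axis=0)
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:k].T

# --- fit the surrogate on the training distribution ---------------------
x_train = rng.normal(size=(2000, d_model))
y_train = toy_mlp(x_train)
mx, Px = pca_fit(x_train, k)   # input-side PCA
my, Py = pca_fit(y_train, k)   # output-side PCA
zx = (x_train - mx) @ Px
zy = (y_train - my) @ Py
# Linear least squares in the reduced space; a real run would fit
# PySR symbolic expressions here instead.
A, *_ = np.linalg.lstsq(zx, zy, rcond=None)

def surrogate(x):
    """PCA-reduce, apply the fitted map, reconstruct."""
    return ((x - mx) @ Px @ A) @ Py.T + my

def mse(x):
    return float(np.mean((surrogate(x) - toy_mlp(x)) ** 2))

mse_in = mse(rng.normal(size=(2000, d_model)))            # same distribution
mse_out = mse(rng.normal(loc=3.0, size=(2000, d_model)))  # shifted distribution
print(f"in-domain MSE: {mse_in:.4f}, out-of-domain MSE: {mse_out:.4f}")
```

On this toy setup the shifted inputs push the tanh units into saturation, so a surrogate fitted in-domain extrapolates poorly and the out-of-domain error grows, which is the kind of degradation the open question asks about for real LLM layers and real domain shifts.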

The authors explicitly acknowledge that the current evaluation does not test out-of-domain robustness, and they raise a specific open question: do these symbolic surrogates generalize across domains, and are domain-agnostic surrogates achievable, or are task- and domain-specific surrogates required?

References

Cross-Domain Generalization: The symbolic surrogates for LLM components are trained and evaluated on the same distribution, leaving open the question of how well such surrogates generalize across domains. We hope to determine whether domain-agnostic symbolic approximations are feasible, or whether task- and domain-specific surrogates are necessary.

SymTorch: A Framework for Symbolic Distillation of Deep Neural Networks (2602.21307 - Tan et al., 24 Feb 2026), in Discussion → Limitations and Future Work, bullet "Cross-Domain Generalization"