Does matched task performance imply similar parameter-space solutions for ES vs RL-based LLM fine-tuning?
Determine whether comparable task performance between large language models fine-tuned with Evolution Strategies (ES) and those fine-tuned with reinforcement learning-based methods such as Group Relative Policy Optimization (GRPO) implies that the resulting parameter vectors are also comparable in parameter space, for example by occupying similar regions or exhibiting similar geometric properties.
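To make "comparable in parameter space" concrete, the following minimal sketch computes a few simple geometric quantities for two fine-tuned parameter vectors that start from a shared base model: the norm of each update, the cosine similarity between the two updates, and the Euclidean distance between the two solutions. The function name `compare_updates`, the choice of metrics, and the toy vectors are illustrative assumptions, not the analysis used in the cited paper.

```python
import math


def compare_updates(base, theta_a, theta_b):
    """Compare two fine-tuned parameter vectors relative to a shared base.

    All arguments are flat sequences of floats of equal length.
    Hypothetical helper for illustration only.
    """
    # Update (delta) of each method relative to the shared starting point.
    delta_a = [a - b0 for a, b0 in zip(theta_a, base)]
    delta_b = [b - b0 for b, b0 in zip(theta_b, base)]

    # L2 norm of each update: how far each method moved from the base.
    norm_a = math.sqrt(sum(x * x for x in delta_a))
    norm_b = math.sqrt(sum(x * x for x in delta_b))

    # Cosine similarity: whether the two updates point in similar directions.
    dot = sum(x * y for x, y in zip(delta_a, delta_b))
    if norm_a > 0 and norm_b > 0:
        cosine = dot / (norm_a * norm_b)
    else:
        cosine = float("nan")  # undefined if either update is zero

    # Euclidean distance between the two fine-tuned solutions.
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(theta_a, theta_b)))

    return {
        "norm_a": norm_a,
        "norm_b": norm_b,
        "cosine": cosine,
        "distance": distance,
    }


# Toy example (made-up numbers): two updates of different size
# that are orthogonal to each other.
base = [0.0, 0.0, 0.0]
theta_es = [3.0, 4.0, 0.0]
theta_grpo = [0.0, 0.0, 0.5]
print(compare_updates(base, theta_es, theta_grpo))
```

In practice the vectors would be the flattened weights of the base and fine-tuned checkpoints. Two models could score equally on the task while these quantities differ substantially, which is the situation the question asks about.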
References
Evolution Strategies (ES) have emerged as a scalable gradient-free alternative to reinforcement learning based LLM fine-tuning, but it remains unclear whether comparable task performance implies comparable solutions in parameter space.
— Matching Accuracy, Different Geometry: Evolution Strategies vs GRPO in LLM Post-Training
(2604.01499 - Hoy et al., 2 Apr 2026) in Abstract