Burn-in improvement for domain randomization with confidence-ellipsoid sampling
Establish whether domain randomization that samples uniformly over the least-squares confidence ellipsoid achieves a smaller burn-in time than certainty equivalence when learning the linear quadratic regulator. A proof would need to show that the domain-randomized controller’s cost remains suitably bounded near the true parameter θ⋆ even when the sampling distribution has large support.
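The sampling step in the problem statement, drawing parameters uniformly from a confidence ellipsoid, can be illustrated with a minimal sketch. It is not taken from the paper. The function name `sample_uniform_ellipsoid` and the arguments `center`, `shape`, and `radius` are hypothetical. The sketch assumes the ellipsoid is {θ : (θ − center)ᵀ shape (θ − center) ≤ radius²}, with `center` standing in for a vectorized least-squares estimate and `shape` for a positive definite matrix such as a regularized data covariance. How the paper actually sets these quantities is not specified here.

```python
import numpy as np


def sample_uniform_ellipsoid(center, shape, radius, n_samples, rng):
    """Draw points uniformly from {x : (x - c)^T M (x - c) <= r^2}.

    center : (d,) array, ellipsoid center c (assumed: a parameter estimate)
    shape  : (d, d) symmetric positive definite matrix M (assumed)
    radius : positive float r (assumed confidence radius)
    """
    d = center.shape[0]
    # Uniform directions on the unit sphere: normalize Gaussian vectors.
    g = rng.standard_normal((n_samples, d))
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)
    # Radii with density proportional to rho^(d-1) give a uniform unit ball.
    radii = rng.uniform(size=(n_samples, 1)) ** (1.0 / d)
    ball = directions * radii
    # With M = L L^T, the map z -> L^{-T} z sends the unit ball onto the
    # ellipsoid {y : y^T M y <= 1}; a linear map preserves uniformity.
    L = np.linalg.cholesky(shape)
    offsets = np.linalg.solve(L.T, ball.T).T
    return center + radius * offsets


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    center = np.array([0.9, 0.1, 0.5])  # illustrative values only
    shape = np.array([[4.0, 1.0, 0.0],
                      [1.0, 3.0, 0.5],
                      [0.0, 0.5, 2.0]])
    samples = sample_uniform_ellipsoid(center, shape, 0.3, 5, rng)
    print(samples)
```

Each row of the result is one sampled parameter vector. In a domain randomization scheme, a controller would then be designed to perform well on average over such samples. That design step is outside the scope of this sketch.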
References
This design raises the hope that domain randomization could reduce the burn-in time. However, we have not been able to prove this property, as we cannot exclude the possibility that for distributions with large support, the domain-randomized controller might incur very high costs near θ⋆ while performing well elsewhere.
— Domain Randomization is Sample Efficient for Linear Quadratic Control
(2502.12310 - Fujinami et al., 17 Feb 2025) in Section 3.1 Sample Efficiency of Domain Randomization