Burn-in improvement for domain randomization with confidence-ellipsoid sampling

Establish whether domain randomization (DR) that samples uniformly over the least-squares confidence ellipsoid achieves a smaller burn-in time than certainty equivalence when learning the linear quadratic regulator. Doing so would require proving that the domain-randomized controller's cost remains suitably bounded near the true parameter θ⋆, even when the sampling distribution has large support.

Background

The paper designs DR to sample over the confidence ellipsoid built from the least-squares estimate and estimated Fisher information, ensuring high-probability coverage of the true system. This suggests the possibility of reduced burn-in due to averaging over plausible models.
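
The sampling step described above can be illustrated with a minimal sketch. It assumes the ellipsoid has the form {θ : (θ − θ̂)ᵀ F (θ − θ̂) ≤ r²}, where θ̂ is the vectorized least-squares estimate, F is the estimated Fisher information, and r is a confidence radius. The function name, argument names, and numerical values are hypothetical, and the paper's exact scaling of the ellipsoid is not reproduced here.

```python
import numpy as np


def sample_confidence_ellipsoid(theta_hat, fisher, radius, n_samples, rng=None):
    """Draw points uniformly from
    {theta : (theta - theta_hat)^T F (theta - theta_hat) <= radius^2}.

    Assumed inputs (illustrative, not the paper's notation):
      theta_hat : (d,) vectorized least-squares estimate
      fisher    : (d, d) symmetric positive-definite estimated Fisher information
      radius    : confidence radius r > 0
    """
    rng = np.random.default_rng() if rng is None else rng
    theta_hat = np.asarray(theta_hat, dtype=float)
    d = theta_hat.shape[0]

    # Uniform points in the unit ball: Gaussian direction, radius distributed as U^(1/d).
    g = rng.standard_normal((n_samples, d))
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)
    radii = rng.uniform(size=(n_samples, 1)) ** (1.0 / d)
    ball = radius * radii * directions

    # With F = L L^T (Cholesky), the linear map z -> L^{-T} z sends the ball
    # of radius r onto the ellipsoid and preserves uniformity.
    L = np.linalg.cholesky(np.asarray(fisher, dtype=float))
    offsets = np.linalg.solve(L.T, ball.T).T
    return theta_hat + offsets


# Hypothetical two-parameter example.
rng = np.random.default_rng(0)
theta_hat = np.array([0.9, 0.5])
fisher = np.array([[50.0, 5.0], [5.0, 20.0]])
samples = sample_confidence_ellipsoid(theta_hat, fisher, radius=1.5, n_samples=1000, rng=rng)
```

Each row of `samples` is one plausible model that a domain-randomized controller would be evaluated on. A larger `fisher` (more data) shrinks the ellipsoid around `theta_hat`, while a larger `radius` widens the support of the sampling distribution.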

However, the authors state they were unable to prove such a reduction because large-support distributions might allow controllers that perform well on average yet incur high costs near the true parameter. Formalizing conditions under which DR guarantees improved burn-in remains open.

References

This design raises the hope that domain randomization could reduce the burn-in time. However, we have not been able to prove this property, as we cannot exclude the possibility that for distributions with large support, the domain-randomized controller might incur very high costs near θ⋆ while performing well elsewhere.

Domain Randomization is Sample Efficient for Linear Quadratic Control (2502.12310 - Fujinami et al., 17 Feb 2025) in Section 3.1 Sample Efficiency of Domain Randomization