Burn-in improvement for domain randomization with confidence-ellipsoid sampling

Establish whether domain randomization that samples uniformly over the least-squares confidence ellipsoid achieves a smaller burn-in time than certainty equivalence for learning the linear quadratic regulator, by proving that the domain-randomized controller’s cost remains suitably bounded near the true parameter even when the sampling distribution has large support.

Background

The paper designs domain randomization (DR) to sample uniformly over the confidence ellipsoid built from the least-squares estimate and the estimated Fisher information, so that the sampling distribution covers the true system with high probability. This suggests that averaging over plausible models could reduce the burn-in time.
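The sampling step can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the ellipsoid has the form {θ : (θ − θ̂)ᵀ F (θ − θ̂) ≤ r²}, and the function name `sample_confidence_ellipsoid`, the radius `0.3`, and the numerical values of `theta_hat` and `fisher` are all hypothetical.

```python
import numpy as np

def sample_confidence_ellipsoid(theta_hat, fisher, radius, n_samples, rng):
    """Draw points uniformly from {theta : (theta - theta_hat)^T F (theta - theta_hat) <= radius^2}.

    The ellipsoid form is an assumption made for this sketch.
    """
    d = theta_hat.shape[0]
    # Uniform points in the unit ball: random direction times radius U^(1/d).
    g = rng.standard_normal((n_samples, d))
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)
    radii = rng.uniform(size=(n_samples, 1)) ** (1.0 / d)
    ball = directions * radii
    # With F = L L^T, the linear map z -> L^{-T} z sends the unit ball to the
    # set {x : x^T F x <= 1}; a linear map preserves uniformity.
    L = np.linalg.cholesky(fisher)
    offsets = np.linalg.solve(L.T, ball.T).T
    return theta_hat + radius * offsets

rng = np.random.default_rng(0)
theta_hat = np.array([0.9, 0.5])               # hypothetical least-squares estimate
fisher = np.array([[4.0, 1.0], [1.0, 2.0]])    # hypothetical estimated Fisher information
samples = sample_confidence_ellipsoid(theta_hat, fisher, 0.3, 1000, rng)

# Every sample should satisfy the ellipsoid constraint.
diff = samples - theta_hat
mahalanobis_sq = np.einsum("ni,ij,nj->n", diff, fisher, diff)
print(mahalanobis_sq.max() <= 0.3**2 + 1e-9)
```

Mapping a uniform draw from the unit ball through the Cholesky factor avoids rejection sampling, which becomes wasteful as the parameter dimension grows.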

However, the authors state they were unable to prove such a reduction because large-support distributions might allow controllers that perform well on average yet incur high costs near the true parameter. Formalizing conditions under which DR guarantees improved burn-in remains open.
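The two quantities in tension here, the cost averaged over the sampling distribution and the cost at the true parameter, can be made concrete on a toy problem. The sketch below is an illustration only and says nothing about the open question: it assumes a scalar system x_{t+1} = a x_t + b u_t + w_t with unit-variance noise, a static gain u_t = −k x_t, a circular sampling region standing in for the confidence ellipsoid, and a grid search in place of the paper's optimization. All names and numbers are hypothetical.

```python
import numpy as np

def stationary_cost(k, a, b, q=1.0, r=1.0):
    """Average cost of u = -k x on x' = a x + b u + w, w ~ N(0, 1).

    Returns inf when the closed loop a - b k is unstable.
    """
    closed_loop = a - b * k
    stable = np.abs(closed_loop) < 1.0
    safe = np.where(stable, closed_loop, 0.0)   # avoid dividing by <= 0
    cost = (q + r * k**2) / (1.0 - safe**2)
    return np.where(stable, cost, np.inf)

rng = np.random.default_rng(0)
a_true, b_true = 0.95, 0.50      # hypothetical true parameter
a_hat, b_hat = 0.90, 0.55        # hypothetical least-squares estimate

# Uniform samples in a disc of radius 0.2 around the estimate
# (a stand-in for the confidence ellipsoid; it contains the true parameter).
n = 2000
angle = rng.uniform(0.0, 2.0 * np.pi, n)
rad = 0.2 * np.sqrt(rng.uniform(size=n))
a_s = a_hat + rad * np.cos(angle)
b_s = b_hat + rad * np.sin(angle)

gains = np.linspace(0.0, 3.0, 601)

# Certainty equivalence: best gain for the point estimate.
ce_gain = gains[np.argmin(stationary_cost(gains, a_hat, b_hat))]

# Domain randomization: best gain for the cost averaged over samples.
avg_costs = stationary_cost(gains[:, None], a_s, b_s).mean(axis=1)
dr_gain = gains[np.argmin(avg_costs)]

dr_avg = avg_costs.min()
ce_avg = stationary_cost(ce_gain, a_s, b_s).mean()

print("CE gain:", ce_gain, "cost at true parameter:", stationary_cost(ce_gain, a_true, b_true))
print("DR gain:", dr_gain, "cost at true parameter:", stationary_cost(dr_gain, a_true, b_true))
print("average cost over samples (DR, CE):", dr_avg, ce_avg)
```

By construction the DR gain has the lower average cost over the samples. The open question concerns the other number, the cost at the true parameter, which the averaged objective does not directly control.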

References

This design raises the hope that domain randomization could reduce the burn-in time. However, we have not been able to prove this property, as we cannot exclude the possibility that for distributions with large support, the domain-randomized controller might incur very high costs near θ⋆ while performing well elsewhere.

Domain Randomization is Sample Efficient for Linear Quadratic Control (2502.12310 - Fujinami et al., 17 Feb 2025) in Section 3.1 Sample Efficiency of Domain Randomization