Source of representation gains in C51: distributional backups vs. cross-entropy loss
Determine whether the representation improvements observed with the C51 categorical distributional reinforcement learning algorithm arise primarily from modeling the distribution of returns through distributional Bellman backups or from training value functions using a categorical cross-entropy loss.
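To make the two candidate mechanisms concrete, here is a minimal sketch that builds both kinds of training target on the same categorical support. All function names are hypothetical, and the two-hot projection used for the scalar target is an assumption chosen for simplicity; this is not the experimental setup of the cited work. In both cases the network would be trained with the same cross-entropy loss, so the only difference is whether the full next-state return distribution or just its mean is backed up.

```python
import math


def make_support(v_min, v_max, num_atoms):
    """Evenly spaced atoms z_0..z_{K-1} on [v_min, v_max]."""
    dz = (v_max - v_min) / (num_atoms - 1)
    return [v_min + i * dz for i in range(num_atoms)]


def project_scalar(value, support):
    """Two-hot projection: split unit mass between the two atoms bracketing value."""
    v_min, v_max = support[0], support[-1]
    dz = support[1] - support[0]
    value = min(max(value, v_min), v_max)  # clip to the support
    b = (value - v_min) / dz               # fractional atom index
    lo = min(int(math.floor(b)), len(support) - 1)
    hi = min(lo + 1, len(support) - 1)
    probs = [0.0] * len(support)
    probs[lo] += 1.0 - (b - lo)
    probs[hi] += b - lo
    return probs


def distributional_target(next_probs, reward, gamma, support):
    """C51-style target: project every shifted atom r + gamma * z_j onto the support."""
    target = [0.0] * len(support)
    for p_j, z_j in zip(next_probs, support):
        shifted = project_scalar(reward + gamma * z_j, support)
        for i, mass in enumerate(shifted):
            target[i] += p_j * mass
    return target


def scalar_target(next_probs, reward, gamma, support):
    """Scalar TD target: back up only the mean of Z(s', a'), then encode it categorically."""
    next_q = sum(p * z for p, z in zip(next_probs, support))
    return project_scalar(reward + gamma * next_q, support)


def expected_value(probs, support):
    return sum(p * z for p, z in zip(probs, support))


def cross_entropy(target, pred, eps=1e-8):
    """Loss shared by both variants; only the target construction differs."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))


# Toy example: a bimodal next-state return distribution on atoms 0, 1, ..., 10.
support = make_support(0.0, 10.0, 11)
next_probs = [0.5] + [0.0] * 9 + [0.5]

dist_t = distributional_target(next_probs, reward=1.0, gamma=0.5, support=support)
scal_t = scalar_target(next_probs, reward=1.0, gamma=0.5, support=support)

# dist_t keeps both modes (mass 0.5 on atoms 1 and 6);
# scal_t concentrates mass around the mean (0.5 on atoms 3 and 4).
print(dist_t, expected_value(dist_t, support))
print(scal_t, expected_value(scal_t, support))
```

In this toy case the two targets have the same expected value but different shapes, which is why the two explanations can in principle be separated: holding the loss fixed and swapping only the target construction isolates the contribution of the distributional backup.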
References
Lyle et al. (2019) showed that gains from C51 can be partially attributed to improved representations but it remains unknown whether they stem from backing up distributions of returns or the use of cross-entropy loss.
Farebrother et al. (2024), "Stop Regressing: Training Value Functions via Classification for Scalable Deep RL", arXiv:2403.03950, Section “Does Classification Learn More Expressive Representations?” (sec:repr_analysis)