Principled selection of episode time limits for fragment-based RL molecule generation

Determine a principled method for choosing the episode time limit (i.e., the number of assembling steps T) in the FREED++ fragment-based reinforcement learning framework for molecule generation. The method should account for variability in the number of fragments across target proteins and should mitigate state aliasing. Alternatives to evaluate include adding the remaining time to the state input (as in MolDQN) and learning a termination policy with an auxiliary neural network conditioned on the current extended molecular graph representation.

Background

FREED++ formulates fragment-based molecule generation as a time-limited RL task, typically using a fixed episode length of four assembling steps starting from a benzene seed. States are extended molecular graphs without an explicit notion of time, and rewards are assigned after docking at episode termination.

The authors highlight two drawbacks of this formulation: (1) selecting an appropriate episode length is nontrivial, as real molecules vary widely in their number of fragments depending on the target protein; and (2) fixed time limits cause state aliasing, since identical partial molecules can occur at different remaining times, which impairs accurate value estimation.
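The aliasing issue in (2) can be illustrated with a toy calculation. The sketch below is not FREED++ itself: the integer "size" state, the grow/keep actions, and the reward that peaks at size 3 are all hypothetical stand-ins chosen so that the same state can occur at different remaining times. It computes the optimal value of one state for every possible number of remaining steps.

```python
from functools import lru_cache

T = 4  # fixed episode length, matching the four-step setup described above


def terminal_reward(size: int) -> float:
    """Hypothetical stand-in for a docking-based reward; it peaks at size 3."""
    return float(-abs(size - 3))


@lru_cache(maxsize=None)
def optimal_value(size: int, remaining: int) -> float:
    """Best terminal reward reachable from `size` with `remaining` steps left.

    Toy actions (assumptions, not the FREED++ action space):
    'grow' moves to size + 1, 'keep' leaves the size unchanged.
    Reward is given only when the episode terminates.
    """
    if remaining == 0:
        return terminal_reward(size)
    return max(
        optimal_value(size + 1, remaining - 1),  # grow
        optimal_value(size, remaining - 1),      # keep
    )


# The same state (size 1) evaluated at every possible remaining time.
values_for_same_state = {k: optimal_value(1, k) for k in range(T + 1)}
print(values_for_same_state)
```

In this toy setting the state with size 1 has optimal value -2 with no steps left, -1 with one step left, and 0 with two or more. A value function that sees only the state must assign a single number to all of these cases, which is the estimation problem the authors describe.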

They note one possible mitigation, adding the remaining time to the agent’s input (as used in MolDQN), but emphasize that this still leaves the choice of time limit unresolved. They suggest that a more promising approach is controllable generation termination, in which an auxiliary neural network predicts when to stop based on the current state, and they explicitly defer this to future work.
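The two options above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the FREED++, MolDQN, GCPN, or Pocket2Mol implementation: the function names, the fixed-size state embedding, the normalization of remaining time to [0, 1], and the single logistic layer standing in for the auxiliary network are all hypothetical.

```python
import numpy as np


def with_remaining_time(state_embedding: np.ndarray, step: int, horizon: int) -> np.ndarray:
    """Append normalized remaining time (1.0 at the start, 0.0 at the limit).

    `state_embedding` is assumed to be a 1-D vector produced by some graph encoder.
    """
    if horizon <= 0 or not 0 <= step <= horizon:
        raise ValueError("require horizon > 0 and 0 <= step <= horizon")
    remaining = (horizon - step) / horizon
    return np.concatenate([state_embedding, [remaining]])


def stop_probability(state_embedding: np.ndarray, weights: np.ndarray, bias: float) -> float:
    """Hypothetical termination head: a logistic layer on the state embedding.

    A real auxiliary network would likely be deeper; this only shows that the
    stop signal depends on the current state and not on a fixed step count.
    """
    logit = float(state_embedding @ weights + bias)
    return float(1.0 / (1.0 + np.exp(-logit)))


embedding = np.array([0.5, -1.0])           # placeholder state embedding
time_aware = with_remaining_time(embedding, step=1, horizon=4)
p_stop = stop_probability(embedding, weights=np.zeros(2), bias=0.0)
```

The first function still requires a horizon to be chosen in advance, which is why the time-limit question stays open under that option. The second removes the fixed horizon, but it leaves open how the termination head would be trained.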

References

One possible solution is to include a notion of the remaining time as part of the agent’s input (Pardo et al., 2018), which was utilized in MolDQN; however, it is still unclear how to select a time limit. A more promising solution would be the controllable generation termination when the termination signal is predicted with an auxiliary neural network based on the current state representation, similar to the one used in GCPN and Pocket2Mol. We leave this for future work.

FREED++: Improving RL Agents for Fragment-Based Molecule Generation by Thorough Reproduction (2401.09840 - Telepov et al., 18 Jan 2024) in Appendix, Section "Time Limits"