Principled selection of episode time limits for fragment-based RL molecule generation
Determine a principled method for choosing the episode time limit (i.e., the number of assembly steps T) in the FREED++ fragment-based reinforcement learning framework for molecule generation. The method should account for variability in the number of fragments across target proteins and mitigate state aliasing. Alternatives to evaluate include adding the remaining time to the state input (as in MolDQN) and learning a termination policy with an auxiliary neural network conditioned on the current extended molecular graph representation.
References
One possible solution is to include a notion of the remaining time as part of the agent’s input~\citep{pardo2018time}, as was done in MolDQN; however, it remains unclear how to select the time limit itself. A more promising solution would be controllable termination, in which the termination signal is predicted by an auxiliary neural network from the current state representation, similar to the approach used in GCPN and Pocket2Mol. We leave this for future work.
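The two alternatives mentioned above can be illustrated with a minimal Python sketch. It is not the FREED++, MolDQN, GCPN, or Pocket2Mol implementation: the function names `time_aware_state` and `termination_probability`, the flat list-of-floats state embedding, and the single logistic layer standing in for the auxiliary network are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def time_aware_state(embedding: Sequence[float], step: int, time_limit: int) -> List[float]:
    """Append the normalized remaining time to a state embedding.

    Hypothetical helper: two identical molecular embeddings reached at
    different steps become distinguishable, which is the point of the
    time-aware input.
    """
    if time_limit <= 0:
        raise ValueError("time_limit must be positive")
    if not 0 <= step <= time_limit:
        raise ValueError("step must lie in [0, time_limit]")
    remaining = (time_limit - step) / time_limit  # 1.0 at the start, 0.0 at the limit
    return list(embedding) + [remaining]


def termination_probability(embedding: Sequence[float],
                            weights: Sequence[float],
                            bias: float) -> float:
    """Probability of stopping generation, computed from the state alone.

    Assumption: a single logistic layer stands in for the auxiliary
    termination network; a real model would be learned and deeper.
    """
    if len(embedding) != len(weights):
        raise ValueError("embedding and weights must have the same length")
    logit = sum(w * x for w, x in zip(weights, embedding)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    state = [0.5, -1.0]  # placeholder embedding of a partial molecule
    # The same embedding at steps 1 and 3 of a 4-step episode is no longer aliased.
    print(time_aware_state(state, step=1, time_limit=4))
    print(time_aware_state(state, step=3, time_limit=4))
    # The termination head needs no time limit at all; it reads only the state.
    print(termination_probability(state, weights=[0.0, 0.0], bias=0.0))
```

The first function still requires a value of T to be chosen in advance, which is the open issue noted above, whereas the second replaces the fixed limit with a state-dependent stopping decision.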