Abstract: Recent developments in artificial intelligence and automation could enable a new drug design paradigm: autonomous drug design. Under this paradigm, generative models suggest thousands of molecules with specific properties. However, since only a limited number of molecules can be synthesized and tested, a central challenge is how to select among them efficiently. We formulate this task as a contextual stochastic multi-armed bandit problem with multiple plays and volatile arms. To solve it, we extend previous work on multi-armed bandits to reflect this setting, and we compare our solution with random sampling, greedy selection, and decaying-epsilon-greedy selection. To investigate how the different selection strategies affect the cumulative reward and the diversity of the selections, we simulate the drug design process. According to the simulation results, our approach shows potential for better exploring and exploiting the chemical space for autonomous drug design.
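
To make the formulation concrete, the following is a minimal sketch, not the paper's actual algorithm, of a LinUCB-style contextual bandit with multiple plays over a volatile candidate pool: each round a fresh set of candidate molecules (contexts) arrives, a batch of k is selected by upper-confidence-bound score, and the shared reward model is updated from the observed outcomes. All names, dimensions, and hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 8            # molecular descriptor (context) dimension -- assumed value
k = 5            # batch size per round (multiple plays)
n_rounds = 50    # simulated design-make-test cycles
alpha = 1.0      # exploration weight (assumed hyperparameter)

# Shared linear-reward model in the style of LinUCB: reward ~ x @ theta + noise.
A = np.eye(d)                     # Gram matrix of selected contexts
b = np.zeros(d)                   # accumulated reward-weighted contexts
theta_true = rng.normal(size=d)   # hidden "property" model used only to simulate rewards

for t in range(n_rounds):
    # Volatile arms: a new pool of candidate molecules (feature vectors) each round.
    pool = rng.normal(size=(200, d))

    # UCB score per candidate: predicted reward plus an exploration bonus.
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b
    mean = pool @ theta_hat
    bonus = alpha * np.sqrt(np.einsum("ij,jk,ik->i", pool, A_inv, pool))
    scores = mean + bonus

    # Multiple plays: pick the top-k candidates for synthesis and testing.
    chosen = np.argsort(scores)[-k:]

    # Observe noisy rewards (simulated assay outcomes) and update the model.
    for i in chosen:
        x = pool[i]
        r = x @ theta_true + rng.normal(scale=0.1)
        A += np.outer(x, x)
        b += r * x
```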