Reinforcement learning algorithms are gaining popularity in fields where optimal scheduling is important, and oncology is no exception. The complex and uncertain dynamics of cancer limit the performance of traditional model-based scheduling strategies such as Optimal Control. Motivated by the recent success of model-free Deep Reinforcement Learning (DRL) in challenging control tasks and in medical treatment, we use Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG) to design a personalized cancer chemotherapy schedule. We show that both methods succeed in the task and outperform the Optimal Control solution in the presence of uncertainty. Furthermore, we show that DDPG can eradicate cancer more efficiently than DQN because of its continuous action space. Finally, we provide some intuition regarding the number of samples required for training.

Title: Personalized Cancer Chemotherapy Schedule: a numerical comparison of performance and robustness in model-based and model-free scheduling methodologies
