Many healthcare decisions involve navigating through a multitude of treatment options in a sequential and iterative manner to find an optimal treatment pathway with the goal of an optimal patient outcome. Such optimization problems may be amenable to reinforcement learning. A reinforcement learning agent could be trained to provide treatment recommendations for physicians, acting as a decision support tool. However, a number of difficulties arise when using RL beyond benchmark environments, such as specifying the reward function, choosing an appropriate state representation and evaluating the learned policy.