Despite the successful application of machine learning (ML) in a wide range of domains, adaptability---the very property that makes machine learning desirable---can be exploited by adversaries to contaminate training and evade classification. In this paper, we investigate the feasibility of applying a specific class of machine learning algorithms, namely, reinforcement learning (RL) algorithms, for autonomous cyber defence in software-defined networking (SDN). In particular, we focus on how an RL agent reacts towards different forms of causative attacks that poison its training process, including indiscriminate and targeted, white-box and black-box attacks. In addition, we also study the impact of the attack timing, and explore potential countermeasures such as adversarial training.