Using machine learning to personalize cancer treatment holds great promise for improving cancer patients' chances of survival. Despite recent advances in machine learning and precision oncology, this approach remains challenging because collecting data in preclinical and clinical studies to model the efficacy of multiple treatments is often expensive and time-consuming. Moreover, randomized treatment allocation is suboptimal, since some participants or samples do not receive the most appropriate treatment during the trial. To address this challenge, we formulate the drug screening study as a "contextual bandit" problem, in which an algorithm selects anticancer therapeutics based on contextual information about cancer cell lines while adapting its treatment strategy to maximize treatment response in an "online" fashion. We propose a novel deep Bayesian bandits framework that uses a functional prior to approximate the posterior for drug response prediction based on multi-modal information consisting of genomic features and drug structure. We empirically evaluate our method on three large-scale in vitro pharmacogenomic datasets and show that our approach outperforms several benchmarks in identifying the optimal treatment for a given cell line.
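To make the contextual bandit formulation concrete, the following is a minimal sketch of the online select-observe-update loop. It is not the deep Bayesian framework proposed here: it swaps in Thompson sampling with a separate Bayesian linear regression per drug, and it runs on synthetic data. The names `LinearThompsonArm` and `run_bandit`, the feature dimension, the number of drugs, and the noise level are all assumptions made for illustration.

```python
import numpy as np


class LinearThompsonArm:
    """Bayesian linear regression posterior for one drug (arm).

    Illustrative stand-in for a deep Bayesian response model.
    """

    def __init__(self, dim, prior_precision=1.0, noise_var=0.25):
        self.precision = prior_precision * np.eye(dim)  # posterior precision matrix
        self.b = np.zeros(dim)                          # precision-weighted mean
        self.noise_var = noise_var                      # assumed observation noise

    def sample_weights(self, rng):
        # Draw one plausible weight vector from the Gaussian posterior.
        cov = np.linalg.inv(self.precision)
        mean = cov @ self.b
        return rng.multivariate_normal(mean, cov)

    def update(self, x, reward):
        # Conjugate Gaussian update after observing one (context, response) pair.
        self.precision += np.outer(x, x) / self.noise_var
        self.b += x * reward / self.noise_var


def run_bandit(n_rounds=500, n_drugs=4, dim=5, seed=0):
    """Run Thompson sampling on synthetic data and return per-round regret."""
    rng = np.random.default_rng(seed)
    true_w = rng.normal(size=(n_drugs, dim))  # synthetic ground-truth response model
    arms = [LinearThompsonArm(dim) for _ in range(n_drugs)]
    regrets = []
    for _ in range(n_rounds):
        x = rng.normal(size=dim)  # synthetic "cell line" context features
        # Sample a response estimate per drug and pick the best sampled one.
        sampled = [arm.sample_weights(rng) @ x for arm in arms]
        chosen = int(np.argmax(sampled))
        # Observe a noisy response only for the chosen drug (noise std 0.5).
        reward = true_w[chosen] @ x + rng.normal(scale=0.5)
        arms[chosen].update(x, reward)
        # Regret: gap between the best expected response and the chosen one.
        expected = true_w @ x
        regrets.append(expected.max() - expected[chosen])
    return np.array(regrets)


if __name__ == "__main__":
    regrets = run_bandit()
    print("mean regret, first 100 rounds:", regrets[:100].mean())
    print("mean regret, last 100 rounds:", regrets[-100:].mean())
```

In this sketch the regret per round should shrink as the posteriors concentrate, which is the behavior the "online" adaptation aims for. A per-drug linear model ignores drug structure entirely; a model that shares information across drugs would take drug features as part of the input.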