In this paper we consider a remote state estimation problem in which a sensor can, at each discrete time instant, transmit on one of M different communication channels. A key difficulty is that the channel statistics are unknown. We study the case where learning of the channel reception probabilities and state estimation are carried out simultaneously. Methods for choosing the channels based on multi-armed bandit techniques are presented and shown to provide stability of the remote estimator. Furthermore, we define the performance notion of estimation regret and derive bounds on how it scales with time for the considered algorithms.
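
As a rough illustration of bandit-based channel selection with unknown reception probabilities, the sketch below applies the standard UCB1 index rule to M channels whose reception outcomes are modeled as independent Bernoulli variables. It is a generic textbook rule, not necessarily the algorithm analyzed in this paper, and it leaves out the state estimator entirely. The function name, the reception probabilities, the horizon, and the seed are hypothetical choices made only for the example.

```python
import math
import random


def ucb1_channel_selection(success_probs, horizon, seed=0):
    """Select one channel per time step with UCB1 (illustrative sketch).

    success_probs: hypothetical true reception probabilities, used only to
                   simulate outcomes; the selection rule never reads them.
    horizon:       number of time steps (assumed >= number of channels).
    Returns (counts, means): per-channel usage counts and empirical
    reception rates.
    """
    rng = random.Random(seed)
    m = len(success_probs)
    counts = [0] * m      # times each channel was used
    successes = [0] * m   # successful receptions per channel

    for t in range(1, horizon + 1):
        if t <= m:
            # Initialization: use every channel once.
            k = t - 1
        else:
            # UCB1 index: empirical mean plus an exploration bonus.
            k = max(
                range(m),
                key=lambda i: successes[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Simulate whether the transmission on channel k was received.
        received = rng.random() < success_probs[k]
        counts[k] += 1
        successes[k] += int(received)

    means = [successes[i] / counts[i] for i in range(m)]
    return counts, means


if __name__ == "__main__":
    counts, means = ucb1_channel_selection([0.2, 0.5, 0.9], horizon=5000)
    print("usage counts:", counts)
    print("estimated reception rates:", [round(p, 3) for p in means])
```

Under these assumptions, the channel with the highest reception probability should come to dominate the usage counts, while the other channels are still sampled occasionally so that their estimates keep improving.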