In the classic stochastic multi-armed bandit problem, it is well known that the sample mean of a chosen arm is a biased estimator of its true mean. In this paper, we characterize the effect of four sources of this selection bias: adaptively \emph{sampling} an arm at each step, adaptively \emph{stopping} the data collection, adaptively \emph{choosing} which arm to target for mean estimation, and adaptively \emph{rewinding} the clock to focus on the sample mean of the chosen arm at some past time. We qualitatively characterize the data-collecting strategies under which the bias induced by adaptive sampling and stopping is negative, and those under which it is positive. For general parametric and nonparametric classes of distributions with varying tail decays, we provide bounds on the risk (the expected Bregman divergence between the sample mean and the true mean) that hold for arbitrary rules for sampling, stopping, choosing, and rewinding. These risk bounds are minimax optimal up to logarithmic factors, and they imply tight bounds on the selection bias as well as sufficient conditions for the consistency of the sample mean.
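To fix ideas, here is a schematic rendering of the two quantities at stake, with illustrative notation that is ours rather than the paper's: writing $\widehat{\mu}$ for the sample mean of the adaptively chosen arm at the adaptively rewound time and $\mu$ for that arm's true mean, the selection bias and the risk are
\[
  \mathbb{E}\big[\widehat{\mu} - \mu\big]
  \qquad \text{and} \qquad
  \mathbb{E}\big[D_F(\widehat{\mu}, \mu)\big],
  \quad \text{where} \quad
  D_F(x, y) = F(x) - F(y) - F'(y)\,(x - y)
\]
is the Bregman divergence generated by a differentiable convex function $F$; for example, taking $F(x) = x^2$ gives $D_F(x, y) = (x - y)^2$, so the risk specializes to the mean squared error $\mathbb{E}\big[(\widehat{\mu} - \mu)^2\big]$.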