In this paper, we study how the mean shift algorithm can be used to denoise a dataset. We introduce a new framework for analyzing mean shift as a denoising approach by viewing the algorithm as an operator on a distribution function. We investigate how this operator changes the distribution and show that the shifted data points concentrate around high-density regions of the underlying density function. By using mean shift as a denoising method, we enhance the performance of several clustering techniques, improve the power of two-sample tests, and obtain a new method for anomaly detection.
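To make the denoising idea concrete, the following is a minimal sketch of how a sample might be shifted toward high-density regions. It is an illustration under stated assumptions, not the procedure analyzed in the paper: it assumes a Gaussian kernel, a fixed bandwidth, and that each step averages over the original (unshifted) sample. The function name `mean_shift_denoise` and the parameters `bandwidth` and `n_steps` are hypothetical.

```python
import numpy as np

def mean_shift_denoise(points, bandwidth, n_steps=1):
    """Move each point to a kernel-weighted mean of the original sample.

    Assumptions (not taken from the paper): Gaussian kernel, fixed bandwidth,
    and weights computed against the original sample at every step.
    """
    data = np.asarray(points, dtype=float)
    shifted = data.copy()
    for _ in range(n_steps):
        # Squared distances between current positions and the original sample.
        diff = shifted[:, None, :] - data[None, :, :]
        sq_dist = np.sum(diff ** 2, axis=-1)
        # Gaussian kernel weights; nearby sample points dominate the average.
        weights = np.exp(-0.5 * sq_dist / bandwidth ** 2)
        # Weighted local mean: the mean shift update for every point at once.
        shifted = weights @ data / weights.sum(axis=1, keepdims=True)
    return shifted

# Synthetic example: two well-separated Gaussian clusters in the plane.
rng = np.random.default_rng(0)
sample = np.vstack([
    rng.normal(-3.0, 1.0, size=(100, 2)),
    rng.normal(3.0, 1.0, size=(100, 2)),
])
denoised = mean_shift_denoise(sample, bandwidth=1.0, n_steps=3)

def spread(x):
    """Average distance of the rows of x from their centroid."""
    return np.mean(np.linalg.norm(x - x.mean(axis=0), axis=1))
```

Comparing `spread(sample[:100])` with `spread(denoised[:100])` gives a rough sense of how much the first cluster contracts toward its mode. The denoised sample could then be passed to a clustering routine or a test statistic in place of the raw data.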