We propose a two-stage unsupervised approach for parsing videos into phases. In the first stage, motion cues divide the video into coarse segments. In the second stage, the noisy labels of these segments weakly supervise an appearance-based classifier. We demonstrate the method on phase detection in colonoscopy videos.
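
To make the two-stage structure concrete, the following is a minimal sketch under stated assumptions and should not be read as the implementation used in this work. It assumes a per-frame motion score and a per-frame appearance feature vector are already available, and that there are only two phases. The median threshold in stage one and the nearest-centroid classifier in stage two are simple stand-ins chosen for illustration, and all function names (`coarse_segments`, `fit_centroids`, `predict`, `parse_video`) are hypothetical.

```python
import numpy as np


def coarse_segments(motion, window=5):
    """Stage 1 (assumed form): smooth the per-frame motion score with a
    moving average and threshold it at its median, giving one noisy
    binary label per frame."""
    kernel = np.ones(window) / window
    smoothed = np.convolve(motion, kernel, mode="same")
    return (smoothed > np.median(smoothed)).astype(int)


def fit_centroids(features, noisy_labels):
    """Stage 2 (assumed form): fit a nearest-centroid classifier on
    appearance features, using the noisy stage-1 labels as weak supervision."""
    classes = np.unique(noisy_labels)
    centroids = np.stack(
        [features[noisy_labels == c].mean(axis=0) for c in classes]
    )
    return classes, centroids


def predict(features, classes, centroids):
    """Assign each frame to the class of its closest centroid."""
    dists = np.linalg.norm(
        features[:, None, :] - centroids[None, :, :], axis=2
    )
    return classes[dists.argmin(axis=1)]


def parse_video(motion, features):
    """Run both stages: motion-based coarse labels, then appearance-based
    relabeling of every frame."""
    noisy = coarse_segments(motion)
    classes, centroids = fit_centroids(features, noisy)
    return predict(features, classes, centroids)
```

The point of the second stage in this sketch is that a handful of frames mislabeled by the motion cue only shift a class centroid slightly, so the appearance classifier can still assign those frames to the phase they resemble.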