Conventional sleep monitoring is time-consuming, expensive and uncomfortable, requiring a large number of contact sensors to be attached to the patient. Video data is commonly recorded as part of a sleep laboratory assessment. If accurate sleep staging could be achieved solely from video, this would overcome many of the problems of traditional methods. In this work we use heart rate, breathing rate and activity measures, all derived from a near-infrared video camera, to perform sleep stage classification. We adopt a deep transfer learning approach to overcome data scarcity, using an existing contact-sensor dataset to learn effective representations from the heart rate and breathing rate time series. On a dataset of 50 healthy volunteers, we achieve an accuracy of 73.4\% and a Cohen's kappa of 0.61 in four-class sleep stage classification, establishing a new state of the art for video-based sleep staging.