Abstract:Heart rate estimation based on remote photoplethysmography plays an important role in several specific scenarios, such as health monitoring and fatigue detection. Existing well-established methods are committed to taking the average of the predicted HRs of multiple overlapping video clips as the final results for the 30-second facial video. Although these methods with hundreds of layers and thousands of channels are highly accurate and robust, they require enormous computational budget and a 30-second wait time, which greatly limits the application of the algorithms to scale. Under these cicumstacnces, We propose a lightweight fast pulse simulation network (LFPS-Net), pursuing the best accuracy within a very limited computational and time budget, focusing on common mobile platforms, such as smart phones. In order to suppress the noise component and get stable pulse in a short time, we design a multi-frequency modal signal fusion mechanism, which exploits the theory of time-frequency domain analysis to separate multi-modal information from complex signals. It helps proceeding network learn the effective fetures more easily without adding any parameter. In addition, we design a oversampling training strategy to solve the problem caused by the unbalanced distribution of dataset. For the 30-second facial videos, our proposed method achieves the best results on most of the evaluation metrics for estimating heart rate or heart rate variability compared to the best available papers. The proposed method can still obtain very competitive results by using a short-time (~15-second) facail video.
Abstract:Blood pressure indicates cardiac function and peripheral vascular resistance and is critical for disease diagnosis. Traditionally, blood pressure data are mainly acquired through contact sensors, which require high maintenance and may be inconvenient and unfriendly to some people (e.g., burn patients). In this paper, an efficient non-contact blood pressure measurement network based on face videos is proposed for the first time. An innovative oversampling training strategy is proposed to handle the unbalanced data distribution. The input video sequences are first normalized and converted to our proposed YUVT color space. Then, the Spatio-temporal slicer encodes it into a multi-domain Spatio-temporal mapping. Finally, the neural network computation module, used for high-dimensional feature extraction of the multi-domain spatial feature mapping, after which the extracted high-dimensional features are used to enhance the time-domain feature association using LSTM, is computed by the blood pressure classifier to obtain the blood pressure measurement intervals. Combining the output of feature extraction and the result after classification, the blood pressure calculator, calculates the blood pressure measurement values. The solution uses a blood pressure classifier to calculate blood pressure intervals, which can help the neural network distinguish between the high-dimensional features of different blood pressure intervals and alleviate the overfitting phenomenon. It can also locate the blood pressure intervals, correct the final blood pressure values and improve the network performance. Experimental results on two datasets show that the network outperforms existing state-of-the-art methods.