Abstract: Unsupervised monocular depth estimation has received widespread attention because it can be trained without ground truth. In real-world scenarios, images may be blurry or noisy due to weather conditions and inherent limitations of the camera, so a robust depth estimation model is particularly important. Benefiting from the training strategies of generative networks, generative-based methods often exhibit enhanced robustness. In light of this, we employ a diffusion model, a generative network with favorable convergence properties, for unsupervised monocular depth estimation. Additionally, we propose a hierarchical feature-guided denoising module, which fully leverages image features to guide the denoising process and thereby enriches the model's capacity to learn and interpret the depth distribution. Furthermore, we explore the implicit depth within reprojection and design an implicit depth consistency loss, which improves the performance of the model and ensures scale-consistent depth within a video sequence. We conduct experiments on the KITTI, Make3D, and our self-collected SIMIT datasets. The results indicate that our approach stands out among generative-based models while also showing remarkable robustness.
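For context, the sketch below illustrates the standard reprojection machinery that unsupervised monocular depth training (and hence an implicit depth consistency loss derived from reprojection) builds on: a predicted depth map and relative camera pose warp a source frame into the target view, and the photometric difference supervises depth. This is a minimal PyTorch sketch under assumed tensor shapes; the function names and the plain L1 photometric term are illustrative, not the paper's exact formulation.

```python
# Minimal sketch of depth/pose-based view reprojection (assumed shapes, not the paper's API).
import torch
import torch.nn.functional as F

def backproject(depth, K_inv):
    """Lift target pixels to 3D camera coordinates using the predicted depth (B, 1, H, W)."""
    b, _, h, w = depth.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float()        # (3, H, W)
    pix = pix.view(1, 3, -1).expand(b, -1, -1).to(depth.device)            # (B, 3, HW)
    return (K_inv @ pix) * depth.view(b, 1, -1)                            # (B, 3, HW)

def warp_source_to_target(src_img, depth_tgt, T_tgt_to_src, K, K_inv):
    """Synthesize the target view from a source frame via predicted depth and pose."""
    b, _, h, w = src_img.shape
    cam = backproject(depth_tgt, K_inv)                                    # (B, 3, HW)
    ones = torch.ones(b, 1, cam.shape[-1], device=cam.device)
    proj = K @ (T_tgt_to_src @ torch.cat([cam, ones], dim=1))[:, :3]       # (B, 3, HW)
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)                         # pixel coordinates
    u = uv[:, 0] / (w - 1) * 2 - 1                                         # normalize to [-1, 1]
    v = uv[:, 1] / (h - 1) * 2 - 1
    grid = torch.stack([u, v], dim=-1).view(b, h, w, 2)
    return F.grid_sample(src_img, grid, padding_mode="border", align_corners=True)

def photometric_loss(tgt_img, warped_src):
    """Photometric difference between the real and reprojected target frame (L1 for brevity)."""
    return (tgt_img - warped_src).abs().mean()
```

A consistency term between the depth implied by this reprojection and the network's prediction, as the abstract describes, would be added on top of such a photometric objective.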
Abstract: Most smart systems, such as smart home and smart health systems, respond to people's locations and activities. However, traditional solutions either require wearable sensors or risk leaking privacy. This work proposes an ambient radar solution: a real-time, privacy-preserving system that is robust to dark surroundings. In this solution, we use a low-power Frequency-Modulated Continuous Wave (FMCW) radar array to capture the reflected signals and then construct 3D image frames from them. The solution comprises 1) a data preprocessing mechanism that removes static background reflections, 2) a signal processing mechanism that transforms the received complex radar signals into a matrix containing spatial information, and 3) a deep learning scheme that filters out broken frames caused by the rough surface of the human body. The solution has been extensively evaluated in a research area and captures real-time human images that are recognizable for specific activities. Our results show that the indoor captures are recognizable frame by frame when compared with camera-recorded video.
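The sketch below illustrates the first two stages the abstract lists for a generic FMCW pipeline: static background removal and conversion of raw chirp samples into a range profile (a matrix carrying spatial information). It is a minimal NumPy sketch; the array shapes, windowing, and mean-subtraction style background filter are assumptions for illustration, not the authors' exact processing chain.

```python
# Minimal sketch of FMCW preprocessing: range FFT plus static-clutter removal (assumed shapes).
import numpy as np

def range_profiles(raw_iq, n_fft=256):
    """raw_iq: complex array of shape (n_frames, n_chirps, n_samples) of ADC samples."""
    window = np.hanning(raw_iq.shape[-1])                     # reduce spectral leakage
    return np.fft.fft(raw_iq * window, n=n_fft, axis=-1)      # (frames, chirps, range bins)

def remove_static_background(profiles):
    """Suppress stationary reflections by subtracting the per-range-bin mean over frames."""
    background = profiles.mean(axis=0, keepdims=True)          # static component
    return profiles - background

# Example with synthetic data: 64 frames, 128 chirps per frame, 256 samples per chirp.
raw = (np.random.randn(64, 128, 256) + 1j * np.random.randn(64, 128, 256)).astype(np.complex64)
moving_only = remove_static_background(range_profiles(raw))
print(moving_only.shape)   # (64, 128, 256): range profiles with static clutter removed
```

Angle estimation across the radar array and the deep-learning frame filter described in the abstract would operate on the output of such a stage.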