Abstract:Autonomous mobile robots like self-flying drones and industrial robots heavily depend on depth images to perform tasks such as 3D reconstruction and visual SLAM. However, the presence of inaccuracies in these depth images can greatly hinder the effectiveness of these applications, resulting in sub-optimal results. Depth images produced by commercially available cameras frequently exhibit noise, which manifests as flickering pixels and erroneous patches. ML-based methods to rectify these images are unsuitable for edge devices that have very limited computational resources. Non-ML methods are much faster but have limited accuracy, especially for correcting errors that are a result of occlusion and camera movement. We propose a scheme called VoxDepth that is fast, accurate, and runs very well on edge devices. It relies on a host of novel techniques: 3D point cloud construction and fusion, and using it to create a template that can fix erroneous depth images. VoxDepth shows superior results on both synthetic and real-world datasets. We demonstrate a 31% improvement in quality as compared to state-of-the-art methods on real-world depth datasets, while maintaining a competitive framerate of 27 FPS (frames per second).
Abstract:High-frequency displays are gaining immense popularity because of their increasing use in video games and virtual reality applications. However, the issue is that the underlying GPUs cannot continuously generate frames at this high rate -- this results in a less smooth and responsive experience. Furthermore, if the frame rate is not synchronized with the refresh rate, the user may experience screen tearing and stuttering. Previous works propose increasing the frame rate to provide a smooth experience on modern displays by predicting new frames based on past or future frames. Interpolation and extrapolation are two widely used algorithms that predict new frames. Interpolation requires waiting for the future frame to make a prediction, which adds additional latency. On the other hand, extrapolation provides a better quality of experience because it relies solely on past frames -- it does not incur any additional latency. The simplest method to extrapolate a frame is to warp the previous frame using motion vectors; however, the warped frame may contain improperly rendered visual artifacts due to dynamic objects -- this makes it very challenging to design such a scheme. Past work has used DNNs to get good accuracy, however, these approaches are slow. This paper proposes Exwarp -- an approach based on reinforcement learning (RL) to intelligently choose between the slower DNN-based extrapolation and faster warping-based methods to increase the frame rate by 4x with an almost negligible reduction in the perceived image quality.