Abstract:Light field data has been demonstrated to facilitate the depth estimation task. Most learning-based methods estimate the depth infor-mation from EPI or sub-aperture images, while less methods pay attention to the focal stack. Existing learning-based depth estimation methods from the focal stack lead to suboptimal performance because of the defocus blur. In this paper, we propose a multi-modal learning method for robust light field depth estimation. We first excavate the internal spatial correlation by designing a context reasoning unit which separately extracts comprehensive contextual information from the focal stack and RGB images. Then we integrate the contextual information by exploiting a attention-guide cross-modal fusion module. Extensive experiments demonstrate that our method achieves superior performance than existing representative methods on two light field datasets. Moreover, visual results on a mobile phone dataset show that our method can be widely used in daily life.
Abstract:Focus based methods have shown promising results for the task of depth estimation. However, most existing focus based depth estimation approaches depend on maximal sharpness of the focal stack. Out of focus information in the focal stack poses challenges for this task. In this paper, we propose a dynamically multi modal learning strategy which incorporates RGB data and the focal stack in our framework. Our goal is to deeply excavate the spatial correlation in the focal stack by designing the spatial correlation perception module and dynamically fuse multi modal information between RGB data and the focal stack in a adaptive way by designing the multi modal dynamic fusion module. The success of our method is demonstrated by achieving the state of the art performance on two datasets. Furthermore, we test our network on a set of different focused images generated by a smart phone camera to prove that the proposed method not only broke the limitation of only using light field data, but also open a path toward practical applications of depth estimation on common consumer level cameras data.