Abstract:The rapid development of collaborative robotics has provided a new possibility of helping the elderly who has difficulties in daily life, allowing robots to operate according to specific intentions. However, efficient human-robot cooperation requires natural, accurate and reliable intention recognition in shared environments. The current paramount challenge for this is reducing the uncertainty of multimodal fused intention to be recognized and reasoning adaptively a more reliable result despite current interactive condition. In this work we propose a novel learning-based multimodal fusion framework Batch Multimodal Confidence Learning for Opinion Pool (BMCLOP). Our approach combines Bayesian multimodal fusion method and batch confidence learning algorithm to improve accuracy, uncertainty reduction and success rate given the interactive condition. In particular, the generic and practical multimodal intention recognition framework can be easily extended further. Our desired assistive scenarios consider three modalities gestures, speech and gaze, all of which produce categorical distributions over all the finite intentions. The proposed method is validated with a six-DoF robot through extensive experiments and exhibits high performance compared to baselines.
Abstract:High-resolution depth map can be inferred from a low-resolution one with the guidance of an additional high-resolution texture map of the same scene. Recently, deep neural networks with large receptive fields are shown to benefit applications such as image completion. Our insight is that super resolution is similar to image completion, where only parts of the depth values are precisely known. In this paper, we present a joint convolutional neural pyramid model with large receptive fields for joint depth map super-resolution. Our model consists of three sub-networks, two convolutional neural pyramids concatenated by a normal convolutional neural network. The convolutional neural pyramids extract information from large receptive fields of the depth map and guidance map, while the convolutional neural network effectively transfers useful structures of the guidance image to the depth image. Experimental results show that our model outperforms existing state-of-the-art algorithms not only on data pairs of RGB/depth images, but also on other data pairs like color/saliency and color-scribbles/colorized images.