Shingo Kitagawa

Semantic Scene Difference Detection in Daily Life Patroling by Mobile Robots using Pre-Trained Large-Scale Vision-Language Model

Sep 28, 2023

Yoshiki Obinata, Kento Kawaharazuka, Naoaki Kanazawa, Naoya Yamaguchi, Naoto Tsukamoto, Iori Yanokura, Shingo Kitagawa, Koki Shinjo, Kei Okada, Masayuki Inaba

Abstract: Daily life support robots need to detect changes in their environment in order to perform tasks. In anomaly detection for computer vision, probabilistic and deep learning methods have been used to compute distances between images, and these distances are computed at the level of image pixels. In contrast, this study aims to detect semantic changes in the daily life environment by leveraging recent large-scale vision-language models. Using the Visual Question Answering (VQA) capability of such a model, we propose a method that poses multiple questions about a reference image and a current image, obtains the answers as sentences, and compares them to detect semantic changes. Unlike deep learning-based anomaly detection methods, this method requires no training or fine-tuning, is not affected by noise, and is sensitive to semantic state changes in the real world. In our experiments, we demonstrated the effectiveness of this method by applying it to a patrol task in a real-life environment using a mobile robot, the Fetch Mobile Manipulator. In the future, it may become possible to explain changes in the daily life environment through spoken language.

* Accepted to 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023)
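The abstract describes the question-and-compare loop only at a high level. The following is a minimal sketch of that loop, assuming a generic VQA callable `vqa(image, question) -> str` (hypothetical; no specific model or library API is implied). The answer comparison uses Python's `difflib.SequenceMatcher` as a simple stand-in, and the 0.8 threshold is an assumed value; neither is taken from the paper.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple

# Hypothetical VQA interface: (image, question) -> free-form answer sentence.
VQAFn = Callable[[object, str], str]


def normalize(answer: str) -> str:
    """Lower-case, drop a trailing period, and collapse whitespace."""
    return " ".join(answer.lower().strip().rstrip(".").split())


def detect_semantic_changes(
    vqa: VQAFn,
    reference_image: object,
    current_image: object,
    questions: List[str],
    threshold: float = 0.8,  # assumed cut-off, not from the paper
) -> Dict[str, Tuple[str, str, float]]:
    """Ask every question about both images and report questions whose answers diverge."""
    changes: Dict[str, Tuple[str, str, float]] = {}
    for question in questions:
        ref = normalize(vqa(reference_image, question))
        cur = normalize(vqa(current_image, question))
        # Character-level similarity in [0, 1]; a stand-in for whatever
        # sentence comparison a real system would use.
        score = SequenceMatcher(None, ref, cur).ratio()
        if score < threshold:
            changes[question] = (ref, cur, score)
    return changes


if __name__ == "__main__":
    # Toy stand-in for a VQA model: each "image" is a dict of canned answers.
    def toy_vqa(image, question):
        return image.get(question, "unknown")

    reference = {"Is the door open?": "No, the door is closed.",
                 "What is on the table?": "A cup."}
    current = {"Is the door open?": "Yes, the door is open.",
               "What is on the table?": "A cup."}
    found = detect_semantic_changes(toy_vqa, reference, current, list(reference))
    for q, (ref, cur, score) in found.items():
        print(f"{q}: '{ref}' -> '{cur}' (similarity {score:.2f})")
```

Character-level similarity is cheap but blind to short negations ("the door is open" vs. "the door is not open" scores about 0.89 and would pass unnoticed), so a sentence-embedding or normalized exact-match comparison may be a better choice in practice.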

Instance Segmentation of Visible and Occluded Regions for Finding and Picking Target from a Pile of Objects

Jan 21, 2020

Kentaro Wada, Shingo Kitagawa, Kei Okada, Masayuki Inaba

Abstract: We present a robotic system for picking a target from a pile of objects that finds and grasps the target object by removing obstacles in the appropriate order. The fundamental idea is to segment instances with both visible and occluded masks, which we call "instance occlusion segmentation". To achieve this, we extend an existing instance segmentation model with a novel "relook" architecture, in which the model explicitly learns the inter-instance relationship. In addition, by using image synthesis, we make the system capable of handling new objects without human annotations. The experimental results show the effectiveness of the relook architecture compared with a conventional model, and of the image synthesis compared with a human-annotated dataset. We also demonstrate that our system can pick a target in a cluttered environment with a real robot.

* 8 pages, 11 figures, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2018
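As a rough illustration of why visible and occluded masks are useful, the sketch below (a) generates both mask types by compositing full object masks back-to-front, the kind of annotation-free labels image synthesis can produce, and (b) derives a removal order for a target. The occlusion rule (instance j blocks instance i when j's visible region covers part of i's occluded region), the helper names `composite`, `plan_picking`, and `band`, and the re-observation loop are assumptions for illustration, not the authors' planner; in a real system the masks would come from the segmentation model rather than from a known stacking order.

```python
from typing import List, Tuple

import numpy as np


def composite(full_masks: List[np.ndarray]) -> Tuple[List[np.ndarray], List[np.ndarray]]:
    """Stack full masks back-to-front (index 0 at the bottom); split each into visible/occluded parts."""
    visible, occluded = [], []
    for i, mask in enumerate(full_masks):
        above = np.zeros_like(mask, dtype=bool)
        for later in full_masks[i + 1:]:
            above |= later  # pixels covered by any object stacked on top of i
        visible.append(mask & ~above)
        occluded.append(mask & above)
    return visible, occluded


def plan_picking(full_masks: List[np.ndarray], target: int) -> List[int]:
    """Return object indices in removal order, ending with the target (hypothetical planner)."""
    remaining = list(range(len(full_masks)))  # objects still in the pile, back-to-front
    removed: List[int] = []

    def blockers(obj: int) -> List[int]:
        # "Re-observe" the pile: recompute masks for the objects still present.
        visible, occluded = composite([full_masks[k] for k in remaining])
        pos = remaining.index(obj)
        return [
            remaining[k]
            for k in range(len(remaining))
            if k != pos and np.any(occluded[pos] & visible[k])
        ]

    def clear(obj: int) -> None:
        # Keep removing whatever is visibly on top of obj until nothing is.
        while True:
            above = blockers(obj)
            if not above:
                return
            top = above[0]
            clear(top)  # free the blocker itself before lifting it
            remaining.remove(top)
            removed.append(top)

    clear(target)
    return removed + [target]


def band(lo: int, hi: int, width: int = 8) -> np.ndarray:
    """Toy 1 x width mask covering columns lo..hi inclusive."""
    mask = np.zeros((1, width), dtype=bool)
    mask[0, lo:hi + 1] = True
    return mask


if __name__ == "__main__":
    # Object 0 is the target; object 2 sits on object 1 exactly where 1 overlaps the target.
    masks = [band(0, 3), band(2, 5), band(2, 3), band(6, 7)]
    print(plan_picking(masks, target=0))
```

Recomputing the masks after every removal mimics looking at the pile again: in the toy scene, object 1 is hidden under object 2 wherever it overlaps the target, so it only appears as a blocker once object 2 is gone. The planner therefore returns `[2, 1, 0]`, while object 3, which never touches the target, stays in place.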