Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Dayou Li

Meta-reasoning Using Attention Maps and Its Applications in Cloud Robotics

May 06, 2025

Adrian Lendinez, Renxi Qiu, Lanfranco Zanzi, Dayou Li

Abstract:Metareasoning, a branch of AI, focuses on reasoning about reasons. It has the potential to enhance robots' decision-making processes in unexpected situations. However, the concept has largely been confined to theoretical discussions and case-by-case investigations, lacking general and practical solutions when the Value of Computation (VoC) is undefined, which is common in unexpected situations. In this work, we propose a revised meta-reasoning framework that significantly improves the scalability of the original approach in unexpected situations. This is achieved by incorporating semantic attention maps and unsupervised 'attention' updates into the metareasoning processes. To accommodate environmental dynamics, 'lines of thought' are used to bridge context-specific objects with abstracted attentions, while meta-information is monitored and controlled at the meta-level for effective reasoning. The practicality of the proposed approach is demonstrated through cloud robots deployed in real-world scenarios, showing improved performance and robustness.

Via

Access Paper or Ask Questions

Coarse-to-Fine Detection of Multiple Seams for Robotic Welding

Aug 20, 2024

Pengkun Wei, Shuo Cheng, Dayou Li, Ran Song, Yipeng Zhang, Wei Zhang

Figure 1 for Coarse-to-Fine Detection of Multiple Seams for Robotic Welding

Figure 2 for Coarse-to-Fine Detection of Multiple Seams for Robotic Welding

Figure 3 for Coarse-to-Fine Detection of Multiple Seams for Robotic Welding

Figure 4 for Coarse-to-Fine Detection of Multiple Seams for Robotic Welding

Abstract:Efficiently detecting target weld seams while ensuring sub-millimeter accuracy has always been an important challenge in autonomous welding, which has significant application in industrial practice. Previous works mostly focused on recognizing and localizing welding seams one by one, leading to inferior efficiency in modeling the workpiece. This paper proposes a novel framework capable of multiple weld seams extraction using both RGB images and 3D point clouds. The RGB image is used to obtain the region of interest by approximately localizing the weld seams, and the point cloud is used to achieve the fine-edge extraction of the weld seams within the region of interest using region growth. Our method is further accelerated by using a pre-trained deep learning model to ensure both efficiency and generalization ability. The performance of the proposed method has been comprehensively tested on various workpieces featuring both linear and curved weld seams and in physical experiment systems. The results showcase considerable potential for real-world industrial applications, emphasizing the method's efficiency and effectiveness. Videos of the real-world experiments can be found at https://youtu.be/pq162HSP2D4.

Via

Access Paper or Ask Questions

MPGNet: Learning Move-Push-Grasping Synergy for Target-Oriented Grasping in Occluded Scenes

Aug 20, 2024

Dayou Li, Chenkun Zhao, Shuo Yang, Ran Song, Xiaolei Li, Wei Zhang

Figure 1 for MPGNet: Learning Move-Push-Grasping Synergy for Target-Oriented Grasping in Occluded Scenes

Figure 2 for MPGNet: Learning Move-Push-Grasping Synergy for Target-Oriented Grasping in Occluded Scenes

Figure 3 for MPGNet: Learning Move-Push-Grasping Synergy for Target-Oriented Grasping in Occluded Scenes

Figure 4 for MPGNet: Learning Move-Push-Grasping Synergy for Target-Oriented Grasping in Occluded Scenes

Abstract:This paper focuses on target-oriented grasping in occluded scenes, where the target object is specified by a binary mask and the goal is to grasp the target object with as few robotic manipulations as possible. Most existing methods rely on a push-grasping synergy to complete this task. To deliver a more powerful target-oriented grasping pipeline, we present MPGNet, a three-branch network for learning a synergy between moving, pushing, and grasping actions. We also propose a multi-stage training strategy to train the MPGNet which contains three policy networks corresponding to the three actions. The effectiveness of our method is demonstrated via both simulated and real-world experiments.

* Accepted to IROS 2024

Via

Access Paper or Ask Questions

Learning Instruction-Guided Manipulation Affordance via Large Models for Embodied Robotic Tasks

Aug 20, 2024

Dayou Li, Chenkun Zhao, Shuo Yang, Lin Ma, Yibin Li, Wei Zhang

Figure 1 for Learning Instruction-Guided Manipulation Affordance via Large Models for Embodied Robotic Tasks

Figure 2 for Learning Instruction-Guided Manipulation Affordance via Large Models for Embodied Robotic Tasks

Figure 3 for Learning Instruction-Guided Manipulation Affordance via Large Models for Embodied Robotic Tasks

Figure 4 for Learning Instruction-Guided Manipulation Affordance via Large Models for Embodied Robotic Tasks

Abstract:We study the task of language instruction-guided robotic manipulation, in which an embodied robot is supposed to manipulate the target objects based on the language instructions. In previous studies, the predicted manipulation regions of the target object typically do not change with specification from the language instructions, which means that the language perception and manipulation prediction are separate. However, in human behavioral patterns, the manipulation regions of the same object will change for different language instructions. In this paper, we propose Instruction-Guided Affordance Net (IGANet) for predicting affordance maps of instruction-guided robotic manipulation tasks by utilizing powerful priors from vision and language encoders pre-trained on large-scale datasets. We develop a Vison-Language-Models(VLMs)-based data augmentation pipeline, which can generate a large amount of data automatically for model training. Besides, with the help of Large-Language-Models(LLMs), actions can be effectively executed to finish the tasks defined by instructions. A series of real-world experiments revealed that our method can achieve better performance with generated data. Moreover, our model can generalize better to scenarios with unseen objects and language instructions.

* Accepted to ICARM 2024

Via

Access Paper or Ask Questions

Where to Fetch: Extracting Visual Scene Representation from Large Pre-Trained Models for Robotic Goal Navigation

Aug 20, 2024

Yu Li, Dayou Li, Chenkun Zhao, Ruifeng Wang, Ran Song, Wei Zhang

Abstract:To complete a complex task where a robot navigates to a goal object and fetches it, the robot needs to have a good understanding of the instructions and the surrounding environment. Large pre-trained models have shown capabilities to interpret tasks defined via language descriptions. However, previous methods attempting to integrate large pre-trained models with daily tasks are not competent in many robotic goal navigation tasks due to poor understanding of the environment. In this work, we present a visual scene representation built with large-scale visual language models to form a feature representation of the environment capable of handling natural language queries. Combined with large language models, this method can parse language instructions into action sequences for a robot to follow, and accomplish goal navigation with querying the scene representation. Experiments demonstrate that our method enables the robot to follow a wide range of instructions and complete complex goal navigation tasks.

Via

Access Paper or Ask Questions

An Evolutionary-Based Approach to Learning Multiple Decision Models from Underrepresented Data

May 24, 2008

Vitaly Schetinin, Dayou Li, Carsten Maple

Figure 1 for An Evolutionary-Based Approach to Learning Multiple Decision Models from Underrepresented Data

Figure 2 for An Evolutionary-Based Approach to Learning Multiple Decision Models from Underrepresented Data

Figure 3 for An Evolutionary-Based Approach to Learning Multiple Decision Models from Underrepresented Data

Figure 4 for An Evolutionary-Based Approach to Learning Multiple Decision Models from Underrepresented Data

Abstract:The use of multiple Decision Models (DMs) enables to enhance the accuracy in decisions and at the same time allows users to evaluate the confidence in decision making. In this paper we explore the ability of multiple DMs to learn from a small amount of verified data. This becomes important when data samples are difficult to collect and verify. We propose an evolutionary-based approach to solving this problem. The proposed technique is examined on a few clinical problems presented by a small amount of data.

* 5 pages, 3 figures, 2 tables, The 4 th International Conference on Natural Computation (ICNC'08)

Via

Access Paper or Ask Questions