Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Emanuele Menegatti

Disentangled Iterative Surface Fitting for Contact-stable Grasp Planning

Feb 17, 2025

Tomoya Yamanokuchi, Alberto Bacchin, Emilio Olivastri, Takamitsu Matsubara, Emanuele Menegatti

Abstract:In this work, we address the limitation of surface fitting-based grasp planning algorithm, which primarily focuses on geometric alignment between the gripper and object surface while overlooking the stability of contact point distribution, often resulting in unstable grasps due to inadequate contact configurations. To overcome this limitation, we propose a novel surface fitting algorithm that integrates contact stability while preserving geometric compatibility. Inspired by human grasping behavior, our method disentangles the grasp pose optimization into three sequential steps: (1) rotation optimization to align contact normals, (2) translation refinement to improve Center of Mass (CoM) alignment, and (3) gripper aperture adjustment to optimize contact point distribution. We validate our approach through simulations on ten YCB dataset objects, demonstrating an 80% improvement in grasp success over conventional surface fitting methods that disregard contact stability. Further details can be found on our project page: https://tomoya-yamanokuchi.github.io/disf-project-page/.

* 18 pages, 6 figures, 1 table

Via

Access Paper or Ask Questions

Exploiting Local Features and Range Images for Small Data Real-Time Point Cloud Semantic Segmentation

Oct 14, 2024

Daniel Fusaro, Simone Mosco, Emanuele Menegatti, Alberto Pretto

Abstract:Semantic segmentation of point clouds is an essential task for understanding the environment in autonomous driving and robotics. Recent range-based works achieve real-time efficiency, while point- and voxel-based methods produce better results but are affected by high computational complexity. Moreover, highly complex deep learning models are often not suited to efficiently learn from small datasets. Their generalization capabilities can easily be driven by the abundance of data rather than the architecture design. In this paper, we harness the information from the three-dimensional representation to proficiently capture local features, while introducing the range image representation to incorporate additional information and facilitate fast computation. A GPU-based KDTree allows for rapid building, querying, and enhancing projection with straightforward operations. Extensive experiments on SemanticKITTI and nuScenes datasets demonstrate the benefits of our modification in a ``small data'' setup, in which only one sequence of the dataset is used to train the models, but also in the conventional setup, where all sequences except one are used for training. We show that a reduced version of our model not only demonstrates strong competitiveness against full-scale state-of-the-art models but also operates in real-time, making it a viable choice for real-world case applications. The code of our method is available at https://github.com/Bender97/WaffleAndRange.

* This paper has been accepted for publication at the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Via

Access Paper or Ask Questions

WasteGAN: Data Augmentation for Robotic Waste Sorting through Generative Adversarial Networks

Sep 25, 2024

Alberto Bacchin, Leonardo Barcellona, Matteo Terreran, Stefano Ghidoni, Emanuele Menegatti, Takuya Kiyokawa

Abstract:Robotic waste sorting poses significant challenges in both perception and manipulation, given the extreme variability of objects that should be recognized on a cluttered conveyor belt. While deep learning has proven effective in solving complex tasks, the necessity for extensive data collection and labeling limits its applicability in real-world scenarios like waste sorting. To tackle this issue, we introduce a data augmentation method based on a novel GAN architecture called wasteGAN. The proposed method allows to increase the performance of semantic segmentation models, starting from a very limited bunch of labeled examples, such as few as 100. The key innovations of wasteGAN include a novel loss function, a novel activation function, and a larger generator block. Overall, such innovations helps the network to learn from limited number of examples and synthesize data that better mirrors real-world distributions. We then leverage the higher-quality segmentation masks predicted from models trained on the wasteGAN synthetic data to compute semantic-aware grasp poses, enabling a robotic arm to effectively recognizing contaminants and separating waste in a real-world scenario. Through comprehensive evaluation encompassing dataset-based assessments and real-world experiments, our methodology demonstrated promising potential for robotic waste sorting, yielding performance gains of up to 5.8\% in picking contaminants. The project page is available at https://github.com/bach05/wasteGAN.git

* Accepted at 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

Via

Access Paper or Ask Questions

SOOD-ImageNet: a Large-Scale Dataset for Semantic Out-Of-Distribution Image Classification and Semantic Segmentation

Sep 02, 2024

Alberto Bacchin, Davide Allegro, Stefano Ghidoni, Emanuele Menegatti

Figure 1 for SOOD-ImageNet: a Large-Scale Dataset for Semantic Out-Of-Distribution Image Classification and Semantic Segmentation

Figure 2 for SOOD-ImageNet: a Large-Scale Dataset for Semantic Out-Of-Distribution Image Classification and Semantic Segmentation

Figure 3 for SOOD-ImageNet: a Large-Scale Dataset for Semantic Out-Of-Distribution Image Classification and Semantic Segmentation

Figure 4 for SOOD-ImageNet: a Large-Scale Dataset for Semantic Out-Of-Distribution Image Classification and Semantic Segmentation

Abstract:Out-of-Distribution (OOD) detection in computer vision is a crucial research area, with related benchmarks playing a vital role in assessing the generalizability of models and their applicability in real-world scenarios. However, existing OOD benchmarks in the literature suffer from two main limitations: (1) they often overlook semantic shift as a potential challenge, and (2) their scale is limited compared to the large datasets used to train modern models. To address these gaps, we introduce SOOD-ImageNet, a novel dataset comprising around 1.6M images across 56 classes, designed for common computer vision tasks such as image classification and semantic segmentation under OOD conditions, with a particular focus on the issue of semantic shift. We ensured the necessary scalability and quality by developing an innovative data engine that leverages the capabilities of modern vision-language models, complemented by accurate human checks. Through extensive training and evaluation of various models on SOOD-ImageNet, we showcase its potential to significantly advance OOD research in computer vision. The project page is available at https://github.com/bach05/SOODImageNet.git.

* Accepeted as long paper at "The 3rd Workshop for Out-of-Distribution Generalization in Computer Vision Foundation Models", ECCV 2024

Via

Access Paper or Ask Questions

Show and Grasp: Few-shot Semantic Segmentation for Robot Grasping through Zero-shot Foundation Models

Apr 19, 2024

Leonardo Barcellona, Alberto Bacchin, Matteo Terreran, Emanuele Menegatti, Stefano Ghidoni

Figure 1 for Show and Grasp: Few-shot Semantic Segmentation for Robot Grasping through Zero-shot Foundation Models

Figure 2 for Show and Grasp: Few-shot Semantic Segmentation for Robot Grasping through Zero-shot Foundation Models

Figure 3 for Show and Grasp: Few-shot Semantic Segmentation for Robot Grasping through Zero-shot Foundation Models

Figure 4 for Show and Grasp: Few-shot Semantic Segmentation for Robot Grasping through Zero-shot Foundation Models

Abstract:The ability of a robot to pick an object, known as robot grasping, is crucial for several applications, such as assembly or sorting. In such tasks, selecting the right target to pick is as essential as inferring a correct configuration of the gripper. A common solution to this problem relies on semantic segmentation models, which often show poor generalization to unseen objects and require considerable time and massive data to be trained. To reduce the need for large datasets, some grasping pipelines exploit few-shot semantic segmentation models, which are capable of recognizing new classes given a few examples. However, this often comes at the cost of limited performance and fine-tuning is required to be effective in robot grasping scenarios. In this work, we propose to overcome all these limitations by combining the impressive generalization capability reached by foundation models with a high-performing few-shot classifier, working as a score function to select the segmentation that is closer to the support set. The proposed model is designed to be embedded in a grasp synthesis pipeline. The extensive experiments using one or five examples show that our novel approach overcomes existing performance limitations, improving the state of the art both in few-shot semantic segmentation on the Graspnet-1B (+10.5% mIoU) and Ocid-grasp (+1.6% AP) datasets, and real-world few-shot grasp synthesis (+21.7% grasp accuracy). The project page is available at: https://leobarcellona.github.io/showandgrasp.github.io/

Via

Access Paper or Ask Questions

A Graph-based Optimization Framework for Hand-Eye Calibration for Multi-Camera Setups

Mar 08, 2023

Daniele Evangelista, Emilio Olivastri, Davide Allegro, Emanuele Menegatti, Alberto Pretto

Figure 1 for A Graph-based Optimization Framework for Hand-Eye Calibration for Multi-Camera Setups

Figure 2 for A Graph-based Optimization Framework for Hand-Eye Calibration for Multi-Camera Setups

Figure 3 for A Graph-based Optimization Framework for Hand-Eye Calibration for Multi-Camera Setups

Figure 4 for A Graph-based Optimization Framework for Hand-Eye Calibration for Multi-Camera Setups

Abstract:Hand-eye calibration is the problem of estimating the spatial transformation between a reference frame, usually the base of a robot arm or its gripper, and the reference frame of one or multiple cameras. Generally, this calibration is solved as a non-linear optimization problem, what instead is rarely done is to exploit the underlying graph structure of the problem itself. Actually, the problem of hand-eye calibration can be seen as an instance of the Simultaneous Localization and Mapping (SLAM) problem. Inspired by this fact, in this work we present a pose-graph approach to the hand-eye calibration problem that extends a recent state-of-the-art solution in two different ways: i) by formulating the solution to eye-on-base setups with one camera; ii) by covering multi-camera robotic setups. The proposed approach has been validated in simulation against standard hand-eye calibration methods. Moreover, a real application is shown. In both scenarios, the proposed approach overcomes all alternative methods. We release with this paper an open-source implementation of our graph-based optimization framework for multi-camera setups.

* This paper has been accepted for publication at the 2023 IEEE International Conference on Robotics and Automation (ICRA)

Via

Access Paper or Ask Questions

Pushing the Limits of Learning-based Traversability Analysis for Autonomous Driving on CPU

Jun 07, 2022

Daniel Fusaro, Emilio Olivastri, Daniele Evangelista, Marco Imperoli, Emanuele Menegatti, Alberto Pretto

Figure 1 for Pushing the Limits of Learning-based Traversability Analysis for Autonomous Driving on CPU

Figure 2 for Pushing the Limits of Learning-based Traversability Analysis for Autonomous Driving on CPU

Figure 3 for Pushing the Limits of Learning-based Traversability Analysis for Autonomous Driving on CPU

Figure 4 for Pushing the Limits of Learning-based Traversability Analysis for Autonomous Driving on CPU

Abstract:Self-driving vehicles and autonomous ground robots require a reliable and accurate method to analyze the traversability of the surrounding environment for safe navigation. This paper proposes and evaluates a real-time machine learning-based Traversability Analysis method that combines geometric features with appearance-based features in a hybrid approach based on a SVM classifier. In particular, we show that integrating a new set of geometric and visual features and focusing on important implementation details enables a noticeable boost in performance and reliability. The proposed approach has been compared with state-of-the-art Deep Learning approaches on a public dataset of outdoor driving scenarios. It reaches an accuracy of 89.2% in scenarios of varying complexity, demonstrating its effectiveness and robustness. The method runs fully on CPU and reaches comparable results with respect to the other methods, operates faster, and requires fewer hardware resources.

* Accepted to 17th International Conference on Intelligent Autonomous Systems (IAS-17)

Via

Access Paper or Ask Questions

People Tracking in Panoramic Video for Guiding Robots

Jun 06, 2022

Alberto Bacchin, Filippo Berno, Emanuele Menegatti, Alberto Pretto

Figure 1 for People Tracking in Panoramic Video for Guiding Robots

Figure 2 for People Tracking in Panoramic Video for Guiding Robots

Figure 3 for People Tracking in Panoramic Video for Guiding Robots

Figure 4 for People Tracking in Panoramic Video for Guiding Robots

Abstract:A guiding robot aims to effectively bring people to and from specific places within environments that are possibly unknown to them. During this operation the robot should be able to detect and track the accompanied person, trying never to lose sight of her/him. A solution to minimize this event is to use an omnidirectional camera: its 360{\deg} Field of View (FoV) guarantees that any framed object cannot leave the FoV if not occluded or very far from the sensor. However, the acquired panoramic videos introduce new challenges in perception tasks such as people detection and tracking, including the large size of the images to be processed, the distortion effects introduced by the cylindrical projection and the periodic nature of panoramic images. In this paper, we propose a set of targeted methods that allow to effectively adapt to panoramic videos a standard people detection and tracking pipeline originally designed for perspective cameras. Our methods have been implemented and tested inside a deep learning-based people detection and tracking framework with a commercial 360{\deg} camera. Experiments performed on datasets specifically acquired for guiding robot applications and on a real service robot show the effectiveness of the proposed approach over other state-of-the-art systems. We release with this paper the acquired and annotated datasets and the open-source implementation of our method.

* Accepted to 17th International Conference on Intelligent Autonomous Systems (IAS-17)

Via

Access Paper or Ask Questions

Learning to Segment Human Body Parts with Synthetically Trained Deep Convolutional Networks

Feb 02, 2021

Alessandro Saviolo, Matteo Bonotto, Daniele Evangelista, Marco Imperoli, Emanuele Menegatti, Alberto Pretto

Figure 1 for Learning to Segment Human Body Parts with Synthetically Trained Deep Convolutional Networks

Figure 2 for Learning to Segment Human Body Parts with Synthetically Trained Deep Convolutional Networks

Figure 3 for Learning to Segment Human Body Parts with Synthetically Trained Deep Convolutional Networks

Figure 4 for Learning to Segment Human Body Parts with Synthetically Trained Deep Convolutional Networks

Abstract:This paper presents a new framework for human body part segmentation based on Deep Convolutional Neural Networks trained using only synthetic data. The proposed approach achieves cutting-edge results without the need of training the models with real annotated data of human body parts. Our contributions include a data generation pipeline, that exploits a game engine for the creation of the synthetic data used for training the network, and a novel pre-processing module, that combines edge response map and adaptive histogram equalization to guide the network to learn the shape of the human body parts ensuring robustness to changes in the illumination conditions. For selecting the best candidate architecture, we performed exhaustive tests on manually-annotated images of real human body limbs. We further present an ablation study to validate our pre-processing module. The results show that our method outperforms several state-of-the-art semantic segmentation networks by a large margin. We release an implementation of the proposed approach along with the acquired datasets with this paper.

* Submitted to the 16th International Conference on Intelligent Autonomous System (IAS)

Via

Access Paper or Ask Questions

Receding Horizon Task and Motion Planning in Dynamic Environments

Sep 07, 2020

Nicola Castaman, Enrico Pagello, Emanuele Menegatti, Alberto Pretto

Figure 1 for Receding Horizon Task and Motion Planning in Dynamic Environments

Figure 2 for Receding Horizon Task and Motion Planning in Dynamic Environments

Figure 3 for Receding Horizon Task and Motion Planning in Dynamic Environments

Figure 4 for Receding Horizon Task and Motion Planning in Dynamic Environments

Abstract:Complex manipulation tasks in crowded environments require careful integration of symbolic reasoning and motion planning. This problem, commonly defined as Task and Motion Planning (TAMP), is even more challenging if the working space is dynamic and perceived with noisy, non-ideal sensors. In this work, we propose an online, approximated TAMP method that combines a geometric reasoning module and a motion planner with a standard task planner in a receding horizon fashion. Our approach iteratively solves a reduced planning problem over a receding window of a limited number of future actions during the implementation of the actions. At each iteration, only the first action of the horizon is actually implemented, then the window is moved forward and the problem is solved again. This procedure allows to naturally take into account the dynamic changes on the scene while ensuring good runtime performance. We validate our approach within extensive simulated experiments that show that our approach is able to deal with unexpected random changes in the environment configuration while ensuring comparable performance with respect to other recent TAMP approaches in solving traditional, static problems. We release with this paper the open-source implementation of our method.

* Submitted to Robotics and Autonomous Systems

Via

Access Paper or Ask Questions