Domenico D. Bloisi

Real-Time Multimodal Signal Processing for HRI in RoboCup: Understanding a Human Referee

Nov 26, 2024

Filippo Ansalone, Flavio Maiorana, Daniele Affinita, Flavio Volpi, Eugenio Bugli, Francesco Petri, Michele Brienza, Valerio Spagnoli, Vincenzo Suriani, Daniele Nardi (+1 more)

Abstract: Advancing human-robot communication is crucial for autonomous systems operating in dynamic environments, where accurate real-time interpretation of human signals is essential. RoboCup provides a compelling scenario for testing these capabilities, requiring robots to understand referee gestures and whistle with minimal network reliance. Using the NAO robot platform, this study implements a two-stage pipeline for gesture recognition through keypoint extraction and classification, alongside continuous convolutional neural networks (CCNNs) for efficient whistle detection. The proposed approach enhances real-time human-robot interaction in a competitive setting like RoboCup, offering some tools to advance the development of autonomous systems capable of cooperating with humans.

* 11th Italian Workshop on Artificial Intelligence and Robotics (AIRO 2024), Published in CEUR Workshop Proceedings AI*IA Series

EMPOWER: Embodied Multi-role Open-vocabulary Planning with Online Grounding and Execution

Aug 30, 2024

Francesco Argenziano, Michele Brienza, Vincenzo Suriani, Daniele Nardi, Domenico D. Bloisi

Abstract: Task planning for robots in real-life settings presents significant challenges. These challenges stem from three primary issues: the difficulty in identifying grounded sequences of steps to achieve a goal; the lack of a standardized mapping between high-level actions and low-level commands; and the challenge of maintaining low computational overhead given the limited resources of robotic hardware. We introduce EMPOWER, a framework designed for open-vocabulary online grounding and planning for embodied agents aimed at addressing these issues. By leveraging efficient pre-trained foundation models and a multi-role mechanism, EMPOWER demonstrates notable improvements in grounded planning and execution. Quantitative results highlight the effectiveness of our approach, achieving an average success rate of 0.73 across six different real-life scenarios using a TIAGo robot.

* Accepted at IROS 2024

Multi-agent Planning using Visual Language Models

Aug 10, 2024

Michele Brienza, Francesco Argenziano, Vincenzo Suriani, Domenico D. Bloisi, Daniele Nardi

Abstract: Large Language Models (LLMs) and Visual Language Models (VLMs) are attracting increasing interest due to their improving performance and applications across various domains and tasks. However, LLMs and VLMs can produce erroneous results, especially when a deep understanding of the problem domain is required. For instance, when planning and perception are needed simultaneously, these models often struggle because of difficulties in merging multi-modal information. To address this issue, fine-tuned models are typically employed and trained on specialized data structures representing the environment. This approach has limited effectiveness, as it can overly complicate the context for processing. In this paper, we propose a multi-agent architecture for embodied task planning that operates without the need for specific data structures as input. Instead, it uses a single image of the environment, handling free-form domains by leveraging commonsense knowledge. We also introduce a novel, fully automatic evaluation procedure, PG2S, designed to better assess the quality of a plan. We validated our approach using the widely recognized ALFRED dataset, comparing PG2S to the existing KAS metric to further evaluate the quality of the generated plans.

Multi-Agent Coordination for a Partially Observable and Dynamic Robot Soccer Environment with Limited Communication

Jan 26, 2024

Daniele Affinita, Flavio Volpi, Valerio Spagnoli, Vincenzo Suriani, Daniele Nardi, Domenico D. Bloisi

Abstract: RoboCup represents an international testbed for advancing research in AI and robotics, focusing on a definite goal: developing a robot team that can win against the human world soccer champion team by the year 2050. To achieve this goal, autonomous humanoid robots' coordination is crucial. This paper explores novel solutions within the RoboCup Standard Platform League (SPL), where a reduction in WiFi communication is imperative, leading to the development of new coordination paradigms. The SPL has experienced a substantial decrease in network packet rate, compelling the need for advanced coordination architectures to maintain optimal team functionality in dynamic environments. Inspired by market-based task assignment, we introduce a novel distributed coordination system to orchestrate autonomous robots' actions efficiently in low communication scenarios. This approach has been tested with NAO robots during official RoboCup competitions and in the SimRobot simulator, demonstrating a notable reduction in task overlaps in limited communication settings.

* International Conference of the Italian Association for Artificial Intelligence (AIxIA 2023) - Italian Workshop on Artificial Intelligence and Robotics (AIRO) Rome, 6 - 9 November, 2023
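
The market-based task assignment that inspires this coordination system can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a generic greedy single-item auction, and the role names, positions, and the use of Euclidean distance as the bid are all assumptions made for illustration. Each robot could run the same function locally on its last known teammate positions, which is one way to keep roles from overlapping without exchanging many packets.

```python
import math

def assign_tasks(robots, tasks):
    """Greedy single-item auction (illustrative, not the paper's method).

    robots: dict mapping robot name -> (x, y) position
    tasks:  dict mapping task name  -> (x, y) target position
    Returns a dict mapping robot name -> task name.
    """
    # Every robot "bids" on every task; the bid is the Euclidean
    # distance to the task target (an assumed cost function).
    bids = sorted(
        (math.dist(pos, target), robot, task)
        for robot, pos in robots.items()
        for task, target in tasks.items()
    )
    assignment = {}
    taken = set()
    # Cheapest bids win first; each robot gets one task, each task one robot.
    for _cost, robot, task in bids:
        if robot not in assignment and task not in taken:
            assignment[robot] = task
            taken.add(task)
    return assignment

# Hypothetical team state: names and coordinates are made up.
robots = {"r1": (0.0, 0.0), "r2": (4.0, 0.0), "r3": (2.0, 3.0)}
tasks = {"striker": (3.5, 0.5), "defender": (0.5, 0.0), "supporter": (2.0, 2.0)}
assignment = assign_tasks(robots, tasks)
print(assignment)
```

Because the bids are sorted deterministically, robots that share the same inputs reach the same assignment independently, so no task is claimed twice in this simplified setting.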

MARIO: Modular and Extensible Architecture for Computing Visual Statistics in RoboCup SPL

Sep 20, 2022

Domenico D. Bloisi, Andrea Pennisi, Cristian Zampino, Flavio Biancospino, Francesco Laus, Gianluca Di Stefano, Michele Brienza, Rocchina Romano

Abstract: This technical report describes a modular and extensible architecture for computing visual statistics in RoboCup SPL (MARIO), presented during the SPL Open Research Challenge at RoboCup 2022, held in Bangkok (Thailand). MARIO is an open-source, ready-to-use software application whose final goal is to contribute to the growth of the RoboCup SPL community. MARIO comes with a GUI that integrates multiple machine-learning and computer-vision-based functions, including automatic camera calibration, background subtraction, homography computation, player and ball tracking and localization, NAO robot pose estimation, and fall detection. MARIO ranked no. 1 in the Open Research Challenge.

Multi-Spectral Image Synthesis for Crop/Weed Segmentation in Precision Farming

Sep 12, 2020

Mulham Fawakherji, Ciro Potena, Alberto Pretto, Domenico D. Bloisi, Daniele Nardi

Abstract: An effective perception system is a fundamental component for farming robots, as it enables them to properly perceive the surrounding environment and to carry out targeted operations. The most recent approaches make use of state-of-the-art machine learning techniques to learn an effective model for the target task. However, those methods need a large amount of labelled data for training. A recent approach to deal with this issue is data augmentation through Generative Adversarial Networks (GANs), where entire synthetic scenes are added to the training data, thus enlarging and diversifying their informative content. In this work, we propose an alternative solution with respect to the common data augmentation techniques, applying it to the fundamental problem of crop/weed segmentation in precision farming. Starting from real images, we create semi-artificial samples by replacing the most relevant object classes (i.e., crop and weeds) with their synthesized counterparts. To do that, we employ a conditional GAN (cGAN), where the generative model is trained by conditioning the shape of the generated object. Moreover, in addition to RGB data, we also take into account near-infrared (NIR) information, generating four-channel multi-spectral synthetic images. Quantitative experiments, carried out on three publicly available datasets, show that (i) our model is capable of generating realistic multi-spectral images of plants and (ii) the usage of such synthetic images in the training process improves the segmentation performance of state-of-the-art semantic segmentation Convolutional Networks.

* Submitted to Robotics and Autonomous Systems
