Francesco Petri

Real-Time Multimodal Signal Processing for HRI in RoboCup: Understanding a Human Referee

Nov 26, 2024

Filippo Ansalone, Flavio Maiorana, Daniele Affinita, Flavio Volpi, Eugenio Bugli, Francesco Petri, Michele Brienza, Valerio Spagnoli, Vincenzo Suriani, Daniele Nardi (+1 more)

Abstract: Advancing human-robot communication is crucial for autonomous systems operating in dynamic environments, where accurate real-time interpretation of human signals is essential. RoboCup provides a compelling scenario for testing these capabilities, requiring robots to understand referee gestures and whistles with minimal network reliance. Using the NAO robot platform, this study implements a two-stage pipeline for gesture recognition through keypoint extraction and classification, alongside continuous convolutional neural networks (CCNNs) for efficient whistle detection. The proposed approach enhances real-time human-robot interaction in a competitive setting like RoboCup, offering some tools to advance the development of autonomous systems capable of cooperating with humans.

* 11th Italian Workshop on Artificial Intelligence and Robotics (AIRO 2024), Published in CEUR Workshop Proceedings AI*IA Series

Transformers and Slot Encoding for Sample Efficient Physical World Modelling

May 30, 2024

Francesco Petri, Luigi Asprino, Aldo Gangemi

Abstract: World modelling, i.e., building a representation of the rules that govern the world so as to predict its evolution, is an essential ability for any agent interacting with the physical world. Recent applications of the Transformer architecture to the problem of world modelling from video input show notable improvements in sample efficiency. However, existing approaches tend to work only at the image level, thus disregarding that the environment is composed of objects interacting with each other. In this paper, we propose an architecture combining Transformers for world modelling with the slot-attention paradigm, an approach for learning representations of objects appearing in a scene. We describe the resulting neural architecture and report experimental results showing an improvement over the existing solutions in terms of sample efficiency and a reduction of the variation of the performance over the training examples. The code for our architecture and experiments is available at https://github.com/torchipeppo/transformers-and-slot-encoding-for-wm
