Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Daniele De Martini

MinkOcc: Towards real-time label-efficient semantic occupancy prediction

Apr 03, 2025

Samuel Sze, Daniele De Martini, Lars Kunze

Abstract:Developing 3D semantic occupancy prediction models often relies on dense 3D annotations for supervised learning, a process that is both labor and resource-intensive, underscoring the need for label-efficient or even label-free approaches. To address this, we introduce MinkOcc, a multi-modal 3D semantic occupancy prediction framework for cameras and LiDARs that proposes a two-step semi-supervised training procedure. Here, a small dataset of explicitly 3D annotations warm-starts the training process; then, the supervision is continued by simpler-to-annotate accumulated LiDAR sweeps and images -- semantically labelled through vision foundational models. MinkOcc effectively utilizes these sensor-rich supervisory cues and reduces reliance on manual labeling by 90\% while maintaining competitive accuracy. In addition, the proposed model incorporates information from LiDAR and camera data through early fusion and leverages sparse convolution networks for real-time prediction. With its efficiency in both supervision and computation, we aim to extend MinkOcc beyond curated datasets, enabling broader real-world deployment of 3D semantic occupancy prediction in autonomous driving.

* 8 pages

Via

Access Paper or Ask Questions

Task-Oriented Co-Design of Communication, Computing, and Control for Edge-Enabled Industrial Cyber-Physical Systems

Mar 11, 2025

Yufeng Diao, Yichi Zhang, Daniele De Martini, Philip Guodong Zhao, Emma Liying Li

Abstract:This paper proposes a task-oriented co-design framework that integrates communication, computing, and control to address the key challenges of bandwidth limitations, noise interference, and latency in mission-critical industrial Cyber-Physical Systems (CPS). To improve communication efficiency and robustness, we design a task-oriented Joint Source-Channel Coding (JSCC) using Information Bottleneck (IB) to enhance data transmission efficiency by prioritizing task-specific information. To mitigate the perceived End-to-End (E2E) delays, we develop a Delay-Aware Trajectory-Guided Control Prediction (DTCP) strategy that integrates trajectory planning with control prediction, predicting commands based on E2E delay. Moreover, the DTCP is co-designed with task-oriented JSCC, focusing on transmitting task-specific information for timely and reliable autonomous driving. Experimental results in the CARLA simulator demonstrate that, under an E2E delay of 1 second (20 time slots), the proposed framework achieves a driving score of 48.12, which is 31.59 points higher than using Better Portable Graphics (BPG) while reducing bandwidth usage by 99.19%.

* This paper has been accepted for publication in IEEE Journal on Selected Areas in Communications (JSAC), with publication expected in 2025

Via

Access Paper or Ask Questions

Tiny Lidars for Manipulator Self-Awareness: Sensor Characterization and Initial Localization Experiments

Mar 05, 2025

Giammarco Caroleo, Alessandro Albini, Daniele De Martini, Timothy D. Barfoot, Perla Maiolino

Abstract:For several tasks, ranging from manipulation to inspection, it is beneficial for robots to localize a target object in their surroundings. In this paper, we propose an approach that utilizes coarse point clouds obtained from miniaturized VL53L5CX Time-of-Flight (ToF) sensors (tiny lidars) to localize a target object in the robot's workspace. We first conduct an experimental campaign to calibrate the dependency of sensor readings on relative range and orientation to targets. We then propose a probabilistic sensor model that is validated in an object pose estimation task using a Particle Filter (PF). The results show that the proposed sensor model improves the performance of the localization of the target object with respect to two baselines: one that assumes measurements are free from uncertainty and one in which the confidence is provided by the sensor datasheet.

* 7 pages, 6 figures, 3 tables, conference submission

Via

Access Paper or Ask Questions

Select2Plan: Training-Free ICL-Based Planning through VQA and Memory Retrieval

Nov 06, 2024

Davide Buoso, Luke Robinson, Giuseppe Averta, Philip Torr, Tim Franzmeyer, Daniele De Martini

Abstract:This study explores the potential of off-the-shelf Vision-Language Models (VLMs) for high-level robot planning in the context of autonomous navigation. Indeed, while most of existing learning-based approaches for path planning require extensive task-specific training/fine-tuning, we demonstrate how such training can be avoided for most practical cases. To do this, we introduce Select2Plan (S2P), a novel training-free framework for high-level robot planning which completely eliminates the need for fine-tuning or specialised training. By leveraging structured Visual Question-Answering (VQA) and In-Context Learning (ICL), our approach drastically reduces the need for data collection, requiring a fraction of the task-specific data typically used by trained models, or even relying only on online data. Our method facilitates the effective use of a generally trained VLM in a flexible and cost-efficient way, and does not require additional sensing except for a simple monocular camera. We demonstrate its adaptability across various scene types, context sources, and sensing setups. We evaluate our approach in two distinct scenarios: traditional First-Person View (FPV) and infrastructure-driven Third-Person View (TPV) navigation, demonstrating the flexibility and simplicity of our method. Our technique significantly enhances the navigational capabilities of a baseline VLM of approximately 50% in TPV scenario, and is comparable to trained models in the FPV one, with as few as 20 demonstrations.

Via

Access Paper or Ask Questions

CERES: Critical-Event Reconstruction via Temporal Scene Graph Completion

Oct 17, 2024

Efimia Panagiotaki, Georgi Pramatarov, Lars Kunze, Daniele De Martini

Figure 1 for CERES: Critical-Event Reconstruction via Temporal Scene Graph Completion

Figure 2 for CERES: Critical-Event Reconstruction via Temporal Scene Graph Completion

Figure 3 for CERES: Critical-Event Reconstruction via Temporal Scene Graph Completion

Figure 4 for CERES: Critical-Event Reconstruction via Temporal Scene Graph Completion

Abstract:This paper proposes a method for on-demand scenario generation in simulation, grounded on real-world data. Evaluating the behaviour of Autonomous Vehicles (AVs) in both safety-critical and regular scenarios is essential for assessing their robustness before real-world deployment. By integrating scenarios derived from real-world datasets into the simulation, we enhance the plausibility and validity of testing sets. This work introduces a novel approach that employs temporal scene graphs to capture evolving spatiotemporal relationships among scene entities from a real-world dataset, enabling the generation of dynamic scenarios in simulation through Graph Neural Networks (GNNs). User-defined action and criticality conditioning are used to ensure flexible, tailored scenario creation. Our model significantly outperforms the benchmarks in accurately predicting links corresponding to the requested scenarios. We further evaluate the validity and compatibility of our generated scenarios in an off-the-shelf simulator.

* 7 pages, 8 figures

Via

Access Paper or Ask Questions

**NAR-*ICP: Neural Execution of Classical ICP-based Pointcloud Registration Algorithms**

Oct 14, 2024

Efimia Panagiotaki, Daniele De Martini, Lars Kunze, Petar Veličković

Figure 1 for NAR-*ICP: Neural Execution of Classical ICP-based Pointcloud Registration Algorithms

Figure 2 for NAR-*ICP: Neural Execution of Classical ICP-based Pointcloud Registration Algorithms

Figure 3 for NAR-*ICP: Neural Execution of Classical ICP-based Pointcloud Registration Algorithms

Figure 4 for NAR-*ICP: Neural Execution of Classical ICP-based Pointcloud Registration Algorithms

Abstract:This study explores the intersection of neural networks and classical robotics algorithms through the Neural Algorithmic Reasoning (NAR) framework, allowing to train neural networks to effectively reason like classical robotics algorithms by learning to execute them. Algorithms are integral to robotics and safety-critical applications due to their predictable and consistent performance through logical and mathematical principles. In contrast, while neural networks are highly adaptable, handling complex, high-dimensional data and generalising across tasks, they often lack interpretability and transparency in their internal computations. We propose a Graph Neural Network (GNN)-based learning framework, NAR-*ICP, which learns the intermediate algorithmic steps of classical ICP-based pointcloud registration algorithms, and extend the CLRS Algorithmic Reasoning Benchmark with classical robotics perception algorithms. We evaluate our approach across diverse datasets, from real-world to synthetic, demonstrating its flexibility in handling complex and noisy inputs, along with its potential to be used as part of a larger learning system. Our results indicate that our method achieves superior performance across all benchmarks and datasets, consistently surpassing even the algorithms it has been trained on, further demonstrating its ability to generalise beyond the capabilities of traditional algorithms.

* 17 pages, 9 figures

Via

Access Paper or Ask Questions

NeRFoot: Robot-Footprint Estimation for Image-Based Visual Servoing

Aug 02, 2024

Daoxin Zhong, Luke Robinson, Daniele De Martini

Abstract:This paper investigates the utility of Neural Radiance Fields (NeRF) models in extending the regions of operation of a mobile robot, controlled by Image-Based Visual Servoing (IBVS) via static CCTV cameras. Using NeRF as a 3D-representation prior, the robot's footprint may be extrapolated geometrically and used to train a CNN-based network to extract it online from the robot's appearance alone. The resulting footprint results in a tighter bound than a robot-wide bounding box, allowing the robot's controller to prescribe more optimal trajectories and expand its safe operational floor area.

Via

Access Paper or Ask Questions

Watching Grass Grow: Long-term Visual Navigation and Mission Planning for Autonomous Biodiversity Monitoring

Apr 16, 2024

Matthew Gadd, Daniele De Martini, Luke Pitt, Wayne Tubby, Matthew Towlson, Chris Prahacs, Oliver Bartlett, John Jackson, Man Qi, Paul Newman(+3 more)

Abstract:We describe a challenging robotics deployment in a complex ecosystem to monitor a rich plant community. The study site is dominated by dynamic grassland vegetation and is thus visually ambiguous and liable to drastic appearance change over the course of a day and especially through the growing season. This dynamism and complexity in appearance seriously impact the stability of the robotics platform, as localisation is a foundational part of that control loop, and so routes must be carefully taught and retaught until autonomy is robust and repeatable. Our system is demonstrated over a 6-week period monitoring the response of grass species to experimental climate change manipulations. We also discuss the applicability of our pipeline to monitor biodiversity in other complex natural settings.

* to be presented at the Workshop on Field Robotics - ICRA 2024

Via

Access Paper or Ask Questions

VDNA-PR: Using General Dataset Representations for Robust Sequential Visual Place Recognition

Mar 14, 2024

Benjamin Ramtoula, Daniele De Martini, Matthew Gadd, Paul Newman

Abstract:This paper adapts a general dataset representation technique to produce robust Visual Place Recognition (VPR) descriptors, crucial to enable real-world mobile robot localisation. Two parallel lines of work on VPR have shown, on one side, that general-purpose off-the-shelf feature representations can provide robustness to domain shifts, and, on the other, that fused information from sequences of images improves performance. In our recent work on measuring domain gaps between image datasets, we proposed a Visual Distribution of Neuron Activations (VDNA) representation to represent datasets of images. This representation can naturally handle image sequences and provides a general and granular feature representation derived from a general-purpose model. Moreover, our representation is based on tracking neuron activation values over the list of images to represent and is not limited to a particular neural network layer, therefore having access to high- and low-level concepts. This work shows how VDNAs can be used for VPR by learning a very lightweight and simple encoder to generate task-specific descriptors. Our experiments show that our representation can allow for better robustness than current solutions to serious domain shifts away from the training data distribution, such as to indoor environments and aerial imagery.

* Published at ICRA 2024

Via

Access Paper or Ask Questions

RobotCycle: Assessing Cycling Safety in Urban Environments

Mar 12, 2024

Efimia Panagiotaki, Tyler Reinmund, Brian Liu, Stephan Mouton, Luke Pitt, Arundathi Shaji Shanthini, Matthew Towlson, Wayne Tubby, Chris Prahacs, Daniele De Martini(+1 more)

Figure 1 for RobotCycle: Assessing Cycling Safety in Urban Environments

Figure 2 for RobotCycle: Assessing Cycling Safety in Urban Environments

Figure 3 for RobotCycle: Assessing Cycling Safety in Urban Environments

Figure 4 for RobotCycle: Assessing Cycling Safety in Urban Environments

Abstract:This paper introduces RobotCycle, a novel ongoing project that leverages Autonomous Vehicle (AV) research to investigate how cycling infrastructure influences cyclist behaviour and safety during real-world journeys. The project's requirements were defined in collaboration with key stakeholders (i.e. city planners, cyclists, and policymakers), informing the design of risk and safety metrics and the data collection criteria. We propose a data-driven approach relying on a novel, rich dataset of diverse traffic scenes captured through a custom-designed wearable sensing unit. We extract road-user trajectories and analyse deviations suggesting risk or potentially hazardous interactions in correlation with infrastructural elements in the environment. Driving profiles and trajectory patterns are associated with local road segments, driving conditions, and road-user interactions to predict traffic behaviour and identify critical scenarios. Moreover, leveraging advancements in AV research, the project extracts detailed 3D maps, traffic flow patterns, and trajectory models to provide an in-depth assessment and analysis of the behaviour of all traffic agents. This data can then inform the design of cyclist-friendly road infrastructure, improving road safety and cyclability, as it provides valuable insights for enhancing cyclist protection and promoting sustainable urban mobility.

* 6 pages, 7 figures

Via

Access Paper or Ask Questions