Brendan Tidd

Scalable Multi-Objective Robot Reinforcement Learning through Gradient Conflict Resolution

Sep 18, 2025

Humphrey Munn, Brendan Tidd, Peter Böhm, Marcus Gallagher, David Howard

Abstract: Reinforcement Learning (RL) robot controllers usually aggregate many task objectives into one scalar reward. While large-scale proximal policy optimisation (PPO) has enabled impressive results such as robust robot locomotion in the real world, many tasks still require careful reward tuning and are brittle to local optima. Tuning cost and sub-optimality grow with the number of objectives, limiting scalability. Modelling reward vectors and their trade-offs can address these issues; however, multi-objective methods remain underused in RL for robotics because of computational cost and optimisation difficulty. In this work, we investigate the conflicts between the per-objective gradient contributions that emerge from scalarising the task objectives. In particular, we explicitly address the conflict between task-based rewards and terms that regularise the policy towards realistic behaviour. We propose GCR-PPO, a modification to actor-critic optimisation that decomposes the actor update into objective-wise gradients using a multi-headed critic and resolves conflicts based on the objective priority. GCR-PPO is evaluated on the well-known IsaacLab manipulation and locomotion benchmarks and on additional multi-objective modifications of two related tasks. We show superior scalability compared to parallel PPO (p = 0.04), without significant computational overhead. We also show higher performance with more conflicting tasks. GCR-PPO improves on large-scale PPO with an average improvement of 9.5%, with high-conflict tasks observing a greater improvement. The code is available at https://github.com/humphreymunn/GCR-PPO.
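
The abstract does not spell out how conflicts are resolved, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a projection rule in the style of gradient surgery (PCGrad): when a lower-priority gradient has a negative dot product with a higher-priority one, the conflicting component is projected away before the gradients are summed. The function name `resolve_by_priority` and the use of plain Python lists in place of per-head actor gradients are illustrative assumptions.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def resolve_by_priority(grads: Sequence[Sequence[float]]) -> Vector:
    """Combine per-objective gradients, ordered from highest to lowest priority.

    ASSUMPTION: PCGrad-style projection, which the abstract does not confirm.
    A lower-priority gradient that conflicts with a higher-priority one
    (negative dot product) loses its component along that gradient.
    The highest-priority gradient is never modified.
    """
    resolved = []
    for i, grad in enumerate(grads):
        g = list(grad)
        for higher in grads[:i]:  # original higher-priority gradients
            d = dot(g, higher)
            norm_sq = dot(higher, higher)
            if d < 0 and norm_sq > 0:
                # Remove the component of g that opposes `higher`.
                g = [gi - (d / norm_sq) * hi for gi, hi in zip(g, higher)]
        resolved.append(g)
    # Sum the resolved gradients coordinate by coordinate.
    return [sum(components) for components in zip(*resolved)]


# Hypothetical two-objective case: a task gradient (high priority)
# and a regularisation gradient (low priority) that partly opposes it.
task_grad = [1.0, 0.0]
reg_grad = [-1.0, 1.0]
combined = resolve_by_priority([task_grad, reg_grad])
```

In this toy case a plain scalarised sum would give `[0.0, 1.0]`, cancelling the task direction entirely, whereas the projected update keeps the full task component.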

Improving Generalization Ability of Robotic Imitation Learning by Resolving Causal Confusion in Observations

Jul 30, 2025

Yifei Chen, Yuzhe Zhang, Giovanni D'urso, Nicholas Lawrance, Brendan Tidd

Abstract: Recent developments in imitation learning have considerably advanced robotic manipulation. However, current techniques in imitation learning can suffer from poor generalization, limiting performance even under relatively minor domain shifts. In this work, we aim to enhance the generalization capabilities of complex imitation learning algorithms to handle unpredictable changes from the training environments to deployment environments. To avoid confusion caused by observations that are not relevant to the target task, we propose to explicitly learn the causal relationship between observation components and expert actions, employing a framework similar to [6], where a causal structural function is learned by intervention on the imitation learning policy. The requirement in [6] that the feature representation of the image input be disentangled is hard to satisfy in complex imitation learning for robotic manipulation; we show theoretically that this requirement is not necessary for causal relationship learning. Therefore, we propose a simple causal structure learning framework that can be easily embedded in recent imitation learning architectures, such as the Action Chunking Transformer [31]. We demonstrate our approach using a simulation of the ALOHA [31] bimanual robot arms in MuJoCo, and show that the method can considerably mitigate the generalization problem of existing complex imitation learning algorithms.

* 13 pages

Whole-Body Dynamic Throwing with Legged Manipulators

Oct 08, 2024

Humphrey Munn, Brendan Tidd, David Howard, Marcus Gallagher

Abstract: Most robotic behaviours focus on either manipulation or locomotion, while tasks that require the integration of both, such as full-body throwing, remain under-explored. Throwing with a robot involves complex coordination between object manipulation and legged locomotion, which is crucial for advanced real-world interactions. This work investigates the challenge of full-body throwing in robotic systems and highlights the advantages of utilising the robot's entire body. We propose a deep reinforcement learning (RL) approach that leverages the robot's body to enhance throwing performance, using a curriculum designed to avoid local optima and sparse but informative reward functions that improve policy flexibility. The robot's body learns to generate additional momentum and fine-tune the projectile release velocity. Our full-body method achieves on average 47% greater throwing distance and 34% greater throwing accuracy than the arm alone, across two robot morphologies: an armed quadruped and a humanoid. We also extend our method to optimise robot stability during throws. The learned policy effectively generalises throwing to targets at any 3D point in space within a specified range, which has not previously been achieved, and does so with human-level throwing accuracy. We successfully transferred this approach from simulation to a real robot using sim2real techniques, demonstrating its practical viability.

Alternative Interfaces for Human-initiated Natural Language Communication and Robot-initiated Haptic Feedback: Towards Better Situational Awareness in Human-Robot Collaboration

Jan 25, 2024

Callum Bennie, Bridget Casey, Cecile Paris, Dana Kulic, Brendan Tidd, Nicholas Lawrance, Alex Pitt, Fletcher Talbot, Jason Williams, David Howard (+2 more)

Abstract: This article presents an implementation of a natural-language speech interface and a haptic feedback interface that enables a human supervisor to provide guidance to, request information from, and receive status updates from a Spot robot. We provide insights gained during preliminary user testing of the interface in a realistic robot exploration scenario.

* Peer reviewed and published at "Empowering People in Human-Robot Collaboration: Why, How, When, and for Whom" workshop at OzCHI 2023 conference

Robotic Vision for Human-Robot Interaction and Collaboration: A Survey and Systematic Review

Jul 28, 2023

Nicole Robinson, Brendan Tidd, Dylan Campbell, Dana Kulić, Peter Corke

Abstract: Robotic vision for human-robot interaction and collaboration is a critical process for robots to collect and interpret detailed information related to human actions, goals, and preferences, enabling robots to provide more useful services to people. This survey and systematic review presents a comprehensive analysis of robotic vision in human-robot interaction and collaboration over the last 10 years. From a detailed search of 3850 articles, systematic extraction and evaluation were used to identify and explore 310 papers in depth. These papers described robots with some level of autonomy using robotic vision for locomotion, manipulation and/or visual communication to collaborate or interact with people. This paper provides an in-depth analysis of current trends, common domains, methods and procedures, technical processes, data sets and models, experimental testing, sample populations, performance metrics and future challenges. This manuscript found that robotic vision was often used in action and gesture recognition, robot movement in human spaces, object handover and collaborative actions, social communication and learning from demonstration. Few high-impact and novel techniques from the computer vision field had been translated into human-robot interaction and collaboration. Overall, notable advancements have been made on how to develop and deploy robots to assist people.

* ACM Transactions on Human-Robot Interaction (2023) Volume 12 Issue 1 Article No 12 pp 1-66

Learning Visuo-Motor Behaviours for Robot Locomotion Over Difficult Terrain

Mar 02, 2023

Brendan Tidd

Abstract: As mobile robots become useful for performing everyday tasks in complex real-world environments, they must be able to traverse a range of difficult terrain types such as stairs, stepping stones, gaps, jumps and narrow passages. This work investigated traversing these types of environments with a bipedal robot (simulation experiments), and a tracked robot (real world). Developing a traditional monolithic controller for traversing all terrain types is challenging, and for large physical robots realistic test facilities are required and safety must be ensured. An alternative is a suite of simple behaviour controllers that can be composed to achieve complex tasks. This work efficiently trained complex behaviours to enable mobile robots to traverse difficult terrain. By minimising retraining as new behaviours became available, robots were able to traverse increasingly complex terrain sets, leading toward the development of scalable behaviour libraries.

* PhD thesis

Heterogeneous robot teams with unified perception and autonomy: How Team CSIRO Data61 tied for the top score at the DARPA Subterranean Challenge

Feb 26, 2023

Navinda Kottege, Jason Williams, Brendan Tidd, Fletcher Talbot, Ryan Steindl, Mark Cox, Dennis Frousheger, Thomas Hines, Alex Pitt, Benjamin Tam (+20 more)

Abstract: The DARPA Subterranean Challenge was designed for competitors to develop and deploy teams of autonomous robots to explore difficult unknown underground environments. The environments were categorised into three subdomains: human-made tunnels, underground urban infrastructure and natural caves. Each of these subdomains had many challenging elements for robot perception, locomotion, navigation and autonomy. These included degraded wireless communication, poor visibility due to smoke, narrow passages and doorways, clutter, uneven ground, slippery and loose terrain, stairs, ledges, overhangs, dripping water, and dynamic obstacles that move to block paths, among others. In the Final Event of this challenge held in September 2021, the course consisted of all three subdomains. The task was for the robot team to perform a scavenger hunt for a number of pre-defined artefacts within a limited time frame. Only one human supervisor was allowed to communicate with the robots once they were in the course. Points were scored when accurate detections and their locations were communicated back to the scoring server. A total of 8 teams competed in the finals held at the Mega Cavern in Louisville, KY, USA. This article describes the systems deployed by Team CSIRO Data61 that tied for the top score and won second place at the event.

Human-Robot Team Performance Compared to Full Robot Autonomy in 16 Real-World Search and Rescue Missions: Adaptation of the DARPA Subterranean Challenge

Dec 11, 2022

Nicole Robinson, Jason Williams, David Howard, Brendan Tidd, Fletcher Talbot, Brett Wood, Alex Pitt, Navinda Kottege, Dana Kulić

Abstract: Human operators in human-robot teams are commonly perceived to be critical for mission success. To explore the direct and perceived impact of operator input on task success and team performance, 16 real-world missions (10 hrs) were conducted based on the DARPA Subterranean Challenge. These missions were to deploy a heterogeneous team of robots for a search task to locate and identify artifacts such as climbing rope, drills and mannequins representing human survivors. Two conditions were evaluated: human operators who could control the robot team with state-of-the-art autonomy (Human-Robot Team) compared to autonomous missions without human operator input (Robot-Autonomy). Human-Robot Teams were often in directed autonomy mode (70% of mission time), found more items, traversed more distance, covered more unique ground, and had a higher time between safety-related events. Human-Robot Teams were faster at finding the first artifact, but slower to respond to information from the robot team. In routine conditions, scores were comparable for artifacts, distance, and coverage. Reasons for intervention included creating waypoints to prioritise high-yield areas, and to navigate through error-prone spaces. After observing robot autonomy, operators reported increases in robot competency and trust, but also that robot behaviour was not always transparent and understandable, even after high mission performance.

* Submitted to Transactions on Human-Robot Interaction

Residual Skill Policies: Learning an Adaptable Skill-based Action Space for Reinforcement Learning for Robotics

Nov 04, 2022

Krishan Rana, Ming Xu, Brendan Tidd, Michael Milford, Niko Sünderhauf

Abstract: Skill-based reinforcement learning (RL) has emerged as a promising strategy to leverage prior knowledge for accelerated robot learning. Skills are typically extracted from expert demonstrations and are embedded into a latent space from which they can be sampled as actions by a high-level RL agent. However, this skill space is expansive, and not all skills are relevant for a given robot state, making exploration difficult. Furthermore, the downstream RL agent is limited to learning structurally similar tasks to those used to construct the skill space. We first propose accelerating exploration in the skill space using state-conditioned generative models to directly bias the high-level agent towards only sampling skills relevant to a given state based on prior experience. Next, we propose a low-level residual policy for fine-grained skill adaptation enabling downstream RL agents to adapt to unseen task variations. Finally, we validate our approach across four challenging manipulation tasks that differ from those used to build the skill space, demonstrating our ability to learn across task variations while significantly accelerating exploration, outperforming prior works. Code and videos are available on our project website: https://krishanrana.github.io/reskill.

* 6th Conference on Robot Learning (CoRL), 2022
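
The residual idea in the abstract above can be illustrated with a small sketch. It is not taken from the paper: it assumes the action proposed by a decoded skill is adjusted by adding a scaled correction from a residual policy and then clipped to the actuator limits. The function `compose_action`, the `scale` factor and the `[-1, 1]` bounds are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def compose_action(
    skill_action: Sequence[float],
    residual: Sequence[float],
    scale: float = 0.1,   # ASSUMPTION: residual is kept small relative to the skill
    low: float = -1.0,    # ASSUMPTION: normalised action bounds
    high: float = 1.0,
) -> List[float]:
    """Add a scaled residual correction to a skill's action, then clip.

    The skill supplies the coarse behaviour; the residual supplies a
    fine-grained adjustment for task variations the skill space did not cover.
    """
    if len(skill_action) != len(residual):
        raise ValueError("skill_action and residual must have the same length")
    return [
        min(high, max(low, a + scale * r))
        for a, r in zip(skill_action, residual)
    ]


# Hypothetical two-dimensional action: the second component would leave
# the valid range after correction, so it is clipped to the lower bound.
adapted = compose_action([0.5, -0.95], [1.0, -1.0])
```

Keeping the residual scaled down is one common way to let the skill prior dominate early in training while still allowing fine adaptation; whether the paper does this is not stated in the abstract.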

Heterogeneous Ground and Air Platforms, Homogeneous Sensing: Team CSIRO Data61's Approach to the DARPA Subterranean Challenge

Apr 19, 2021

Nicolas Hudson, Fletcher Talbot, Mark Cox, Jason Williams, Thomas Hines, Alex Pitt, Brett Wood, Dennis Frousheger, Katrina Lo Surdo, Thomas Molnar (+24 more)

Abstract: Heterogeneous teams of robots, leveraging a balance between autonomy and human interaction, bring powerful capabilities to the problem of exploring dangerous, unstructured subterranean environments. Here we describe the solution developed by Team CSIRO Data61, consisting of CSIRO, Emesent and Georgia Tech, during the DARPA Subterranean Challenge. The presented systems were fielded in the Tunnel Circuit in August 2019, the Urban Circuit in February 2020, and in our own Cave event, conducted in September 2020. A unique capability of the fielded team is the homogeneous sensing of the platforms utilised, which is leveraged to obtain a decentralised multi-agent SLAM solution on each platform (both ground agents and UAVs) using peer-to-peer communications. This enabled a shift in focus from constructing a pervasive communications network to relying on multi-agent autonomy, motivated by experiences in early circuit events. These experiences also showed the surprising capability of rugged tracked platforms for challenging terrain, which in turn led to the heterogeneous team structure based on a BIA5 OzBot Titan ground robot and an Emesent Hovermap UAV, supplemented by smaller tracked or legged ground robots. The ground agents use a common CatPack perception module, which allowed reuse of the perception and autonomy stack across all ground agents with minimal adaptation.
