Title: Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation

Jun 20, 2024

Jiaming Zhou, Teli Ma, Kun-Yu Lin, Ronghe Qiu, Zifan Wang, Junwei Liang

Abstract: Learning generalizable visual dynamic representations across different embodied environments is crucial for real-world robotic manipulation. Because the scale and diversity of robot demonstration data are limited, recent works have turned to large-scale pre-training on human data. However, the morphological differences between humans and robots introduce a significant human-robot domain discrepancy, which hinders the generalization of these human-data pre-trained models to downstream manipulation tasks. To address this, we propose a novel adaptation paradigm that uses readily available paired human-robot video data to bridge the discrepancy. Following this paradigm, our method exploits a human-robot contrastive alignment loss to align the semantics of human and robot videos, adapting pre-trained models to the robotic domain in a parameter-efficient manner. Experiments demonstrate significant improvements on 25 tasks across three different benchmarks, covering both single-task and language-conditioned multi-task settings and evaluating two different pre-trained models. On the large RLBench benchmark, our adaptation method achieves an average improvement of $8.9\%$ in success rate over the pre-trained R3M model across multiple tasks. We will release the code and models upon acceptance.
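The abstract does not give the exact form of the human-robot contrastive alignment loss. As a rough illustration only, the sketch below assumes a symmetric InfoNCE-style objective over a batch of paired embeddings, where row i of the human batch and row i of the robot batch come from the same paired clip. The function names, the temperature value, and the toy 2-D embeddings are all hypothetical and are not taken from the paper.

```python
import math

def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit length (eps guards against a zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def contrastive_alignment_loss(human_emb, robot_emb, temperature=0.1):
    """Symmetric InfoNCE over paired embeddings (an assumed loss form).

    human_emb[i] and robot_emb[i] are treated as a positive pair;
    every other combination in the batch is treated as a negative.
    """
    h = [l2_normalize(v) for v in human_emb]
    r = [l2_normalize(v) for v in robot_emb]
    # Cosine similarity between every human and robot embedding.
    logits = [[sum(a * b for a, b in zip(hv, rv)) / temperature for rv in r]
              for hv in h]
    logits_t = [list(col) for col in zip(*logits)]  # robot-to-human view
    return 0.5 * (cross_entropy_diagonal(logits)
                  + cross_entropy_diagonal(logits_t))

# Toy example: two paired clips with hand-written 2-D embeddings.
human = [[1.0, 0.0], [0.0, 1.0]]
aligned_robot = [[0.9, 0.1], [0.1, 0.9]]   # pairs point the same way
swapped_robot = [[0.1, 0.9], [0.9, 0.1]]   # pairs are mismatched

loss_aligned = contrastive_alignment_loss(human, aligned_robot)
loss_swapped = contrastive_alignment_loss(human, swapped_robot)
print(loss_aligned < loss_swapped)
```

In a parameter-efficient setup such as the one the abstract describes, the embeddings would come from the pre-trained encoder and only a small set of added parameters would receive gradients from a loss like this; that training loop is outside the scope of this sketch.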
