Gregory D. Hager

Adapting Image-based RL Policies via Predicted Rewards

Jul 23, 2024

Weiyao Wang, Xinyuan Fang, Gregory D. Hager

Abstract: Image-based reinforcement learning (RL) faces significant challenges in generalization when the visual environment undergoes substantial changes between training and deployment. Under such circumstances, learned policies may not perform well, leading to degraded results. Previous approaches to this problem have largely focused on broadening the training observation distribution, employing techniques like data augmentation and domain randomization. However, given the sequential nature of the RL decision-making problem, it is often the case that residual errors are propagated by the learned policy model and accumulate throughout the trajectory, resulting in highly degraded performance. In this paper, we leverage the observation that predicted rewards under domain shift, even though imperfect, can still be a useful signal to guide fine-tuning. We exploit this property to fine-tune a policy using reward prediction in the target domain. We have found that, even under significant domain shift, the predicted reward can still provide a meaningful signal, and fine-tuning substantially improves the original policy. Our approach, termed Predicted Reward Fine-tuning (PRFT), improves performance across diverse tasks in both simulated benchmarks and real-world experiments. More information is available at the project web page: https://sites.google.com/view/prft.

* L4DC 2024

Domain Adaptation of Visual Policies with a Single Demonstration

Jul 23, 2024

Weiyao Wang, Gregory D. Hager

Abstract: Deploying machine learning algorithms for robot tasks in real-world applications presents a core challenge: overcoming the domain gap between the training and the deployment environment. This is particularly difficult for visuomotor policies that utilize high-dimensional images as input, especially when those images are generated via simulation. A common method to tackle this issue is domain randomization, which aims to broaden the span of the training distribution to cover the test-time distribution. However, this approach is only effective when the domain randomization encompasses the actual shifts in the test-time distribution. We take a different approach, where we make use of a single demonstration (a prompt) to learn a policy that adapts to the testing target environment. Our proposed framework, PromptAdapt, leverages the Transformer architecture's capacity to model sequential data to learn demonstration-conditioned visual policies, allowing for in-context adaptation to a target domain that is distinct from training. Our experiments in both simulation and real-world settings show that PromptAdapt is a strong domain-adapting policy that outperforms baseline methods by a large margin under a range of domain shifts, including variations in lighting, color, texture, and camera pose. Videos and more information can be viewed at the project webpage: https://sites.google.com/view/promptadapt.

* ICRA 2024

VIHE: Virtual In-Hand Eye Transformer for 3D Robotic Manipulation

Mar 19, 2024

Weiyao Wang, Yutian Lei, Shiyu Jin, Gregory D. Hager, Liangjun Zhang

Abstract: In this work, we introduce the Virtual In-Hand Eye Transformer (VIHE), a novel method designed to enhance 3D manipulation capabilities through action-aware view rendering. VIHE autoregressively refines actions in multiple stages by conditioning on rendered views posed from action predictions in the earlier stages. These virtual in-hand views provide a strong inductive bias for effectively recognizing the correct pose for the hand, especially for challenging high-precision tasks such as peg insertion. On 18 manipulation tasks in RLBench simulated environments, VIHE achieves a new state of the art, with a 12% absolute improvement (from 65% to 77%) over the existing state-of-the-art model using 100 demonstrations per task. In real-world scenarios, VIHE can learn manipulation tasks with just a handful of demonstrations, highlighting its practical utility. Videos and code implementation can be found at our project site: https://vihe-3d.github.io.

The Quiet Eye Phenomenon in Minimally Invasive Surgery

Sep 06, 2023

Alaa Eldin Abdelaal, Rachelle Van Rumpt, Sayem Nazmuz Zaman, Irene Tong, Anthony Jarc, Gary L. Gallia, Masaru Ishii, Gregory D. Hager, Septimiu E. Salcudean

Abstract: In this paper, we report our discovery of a gaze behavior called Quiet Eye (QE) in minimally invasive surgery. The QE behavior has been extensively studied in sports training and has been associated with a higher level of expertise in multiple sports. We investigated the QE behavior in two independently collected data sets of surgeons performing tasks in a sinus surgery setting and a robotic surgery setting, respectively. Our results show that the QE behavior is more likely to occur in successful task executions and in the performances of surgeons with a high level of expertise. These results open the door to using the QE behavior in both training and skill assessment in minimally invasive surgery.

SAGE: SLAM with Appearance and Geometry Prior for Endoscopy

Feb 22, 2022

Xingtong Liu, Zhaoshuo Li, Masaru Ishii, Gregory D. Hager, Russell H. Taylor, Mathias Unberath

Abstract: In endoscopy, many applications (e.g., surgical navigation) would benefit from a real-time method that can simultaneously track the endoscope and reconstruct the dense 3D geometry of the observed anatomy from a monocular endoscopic video. To this end, we develop a Simultaneous Localization and Mapping (SLAM) system by combining learning-based appearance and optimizable geometry priors with factor graph optimization. The appearance and geometry priors are explicitly learned in an end-to-end differentiable training pipeline to master the task of pair-wise image alignment, one of the core components of the SLAM system. In our experiments, the proposed SLAM system is shown to robustly handle the challenges of texture scarceness and illumination variation that are commonly seen in endoscopy. The system generalizes well to unseen endoscopes and subjects and performs favorably compared with a state-of-the-art feature-based SLAM system. The code repository is available at https://github.com/lppllppl920/SAGE-SLAM.git.

* Accepted to ICRA 2022

Learn Proportional Derivative Controllable Latent Space from Pixels

Oct 15, 2021

Weiyao Wang, Marin Kobilarov, Gregory D. Hager

Abstract: Recent advances in latent space dynamics models from pixels show promising progress in vision-based model predictive control (MPC). However, executing MPC in real time can be challenging due to its intensive computational cost at each timestep. We propose to introduce additional learning objectives to enforce that the learned latent space is proportional-derivative (PD) controllable. At execution time, a simple PD controller can be applied directly to the latent space encoded from pixels to produce simple and effective control for systems with visual observations. We show that our method outperforms baseline methods, producing robust goal reaching and trajectory tracking in various environments.
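
As a rough illustration of what applying a PD controller to a latent vector can look like, here is a minimal sketch. It is not the authors' implementation: the function names `pd_control` and `simulate`, the gains, and the toy double-integrator dynamics standing in for a learned latent space are all assumptions made for this example, and no image encoder is included.

```python
from typing import List, Sequence


def pd_control(z: Sequence[float], z_prev: Sequence[float], z_goal: Sequence[float],
               kp: float, kd: float, dt: float) -> List[float]:
    """Generic PD law on a latent vector: u = kp * (z_goal - z) - kd * dz/dt.

    dz/dt is approximated by a backward finite difference of the latent.
    """
    return [kp * (g - c) - kd * (c - p) / dt for c, p, g in zip(z, z_prev, z_goal)]


def simulate(z0: Sequence[float], z_goal: Sequence[float], kp: float = 4.0,
             kd: float = 4.0, dt: float = 0.01, steps: int = 2000) -> List[float]:
    """Roll out the PD law on a toy double integrator (hypothetical latent dynamics)."""
    z = list(z0)
    z_prev = list(z0)
    v = [0.0] * len(z)
    for _ in range(steps):
        u = pd_control(z, z_prev, z_goal, kp, kd, dt)
        z_prev = list(z)
        # Semi-implicit Euler: update velocity first, then position.
        v = [vi + ui * dt for vi, ui in zip(v, u)]
        z = [zi + vi * dt for zi, vi in zip(z, v)]
    return z


if __name__ == "__main__":
    final = simulate([0.0, 1.0], [1.0, -1.0])
    print(final)  # should end close to the goal latent
```

In a full system, `z` and `z_goal` would come from encoding the current and goal images; the per-step cost is a handful of multiplications, which is the contrast with MPC that the abstract draws.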

Localization and Control of Magnetic Suture Needles in Cluttered Surgical Site with Blood and Tissue

May 20, 2021

Will Pryor, Yotam Barnoy, Suraj Raval, Xiaolong Liu, Lamar Mair, Daniel Lerner, Onder Erin, Gregory D. Hager, Yancy Diaz-Mercado, Axel Krieger

Abstract: Real-time visual localization of needles is necessary for various surgical applications, including surgical automation and visual feedback. In this study, we investigate localization and autonomous robotic control of needles in the context of our magneto-suturing system. Our system holds the potential for surgical manipulation with the benefit of minimal invasiveness and reduced patient side effects. However, the non-linear magnetic fields produce unintuitive forces and demand delicate position-based control that exceeds the capabilities of direct human manipulation. This makes automatic needle localization a necessity. Our localization method combines neural network-based segmentation and classical techniques, and we are able to consistently locate our needle with 0.73 mm RMS error in clean environments and 2.72 mm RMS error in challenging environments with blood and occlusion. The average localization RMS error is 2.16 mm for all environments we used in the experiments. We combine this localization method with our closed-loop feedback control system to demonstrate the further applicability of localization to autonomous control. Our needle is able to follow a running suture path in (1) no blood, no tissue; (2) heavy blood, no tissue; (3) no blood, with tissue; and (4) heavy blood, with tissue environments. The tip position tracking error ranges from 2.6 mm to 3.7 mm RMS, opening the door towards autonomous suturing tasks.

Single View Geocentric Pose in the Wild

May 18, 2021

Gordon Christie, Kevin Foster, Shea Hagstrom, Gregory D. Hager, Myron Z. Brown

Abstract: Current methods for Earth observation tasks such as semantic mapping, map alignment, and change detection rely on near-nadir images; however, often the first available images in response to dynamic world events such as natural disasters are oblique. These tasks are much more difficult for oblique images due to observed object parallax. There has been recent success in learning to regress geocentric pose, defined as height above ground and orientation with respect to gravity, by training with airborne lidar registered to satellite images. We present a model for this novel task that exploits affine invariance properties to outperform the state of the art by a wide margin. We also address practical issues required to deploy this method in the wild for real-world applications. Our data and code are publicly available.

* To be published in the proceedings of the CVPR 2021 EarthVision Workshop

Out-of-Distribution Robustness with Deep Recursive Filters

Apr 06, 2021

Kapil D. Katyal, I-Jeng Wang, Gregory D. Hager

Abstract: Accurate state and uncertainty estimation is imperative for mobile robots and self-driving vehicles to achieve safe navigation in pedestrian-rich environments. A critical component of state and uncertainty estimation for robot navigation is to perform robustly under out-of-distribution noise. Traditional methods of state estimation decouple perception and state estimation, making it difficult to operate on noisy, high-dimensional data. Here, we describe an approach that combines the expressiveness of deep neural networks with principled approaches to uncertainty estimation found in recursive filters. We particularly focus on techniques that provide better robustness to out-of-distribution noise and demonstrate the applicability of our approach in two scenarios: a simple noisy pendulum state estimation problem and real-world pedestrian localization using the nuScenes dataset. We show that our approach improves state and uncertainty estimation compared to baselines while achieving approximately 3x improvement in computational efficiency.

Motion Guided Attention Fusion to Recognize Interactions from Videos

Apr 01, 2021

Tae Soo Kim, Jonathan Jones, Gregory D. Hager

Abstract: We present a dual-pathway approach for recognizing fine-grained interactions from videos. We build on the success of prior dual-stream approaches, but make the distinction between the static and dynamic representations of objects and their interactions explicit by introducing separate motion and object detection pathways. Then, using our new Motion-Guided Attention Fusion module, we fuse the bottom-up features in the motion pathway with features captured from object detections to learn the temporal aspects of an action. We show that our approach can generalize across appearance effectively and recognize actions where an actor interacts with previously unseen objects. We validate our approach using the compositional action recognition task from the Something-Something-v2 dataset, where we outperform existing state-of-the-art methods. We also show that our method can generalize well to real-world tasks by showing state-of-the-art performance on recognizing humans assembling various IKEA furniture on the IKEA-ASM dataset.
