Jianing Qian

Task-Oriented Hierarchical Object Decomposition for Visuomotor Control

Nov 02, 2024

Jianing Qian, Yunshuang Li, Bernadette Bucher, Dinesh Jayaraman

Abstract:Good pre-trained visual representations could enable robots to learn visuomotor policy efficiently. Still, existing representations take a one-size-fits-all-tasks approach that comes with two important drawbacks: (1) Being completely task-agnostic, these representations cannot effectively ignore any task-irrelevant information in the scene, and (2) They often lack the representational capacity to handle unconstrained/complex real-world scenes. Instead, we propose to train a large combinatorial family of representations organized by scene entities: objects and object parts. This hierarchical object decomposition for task-oriented representations (HODOR) permits selectively assembling different representations specific to each task while scaling in representational capacity with the complexity of the scene and the task. In our experiments, we find that HODOR outperforms prior pre-trained representations, both scene vector representations and object-centric representations, for sample-efficient imitation learning across 5 simulated and 5 real-world manipulation tasks. We further find that the invariances captured in HODOR are inherited into downstream policies, which can robustly generalize to out-of-distribution test conditions, permitting zero-shot skill chaining. Appendix, code, and videos: https://sites.google.com/view/hodor-corl24.

* CoRL 2024

Recasting Generic Pretrained Vision Transformers As Object-Centric Scene Encoders For Manipulation Policies

May 24, 2024

Jianing Qian, Anastasios Panagopoulos, Dinesh Jayaraman

Abstract:Generic re-usable pre-trained image representation encoders have become a standard component of methods for many computer vision tasks. As visual representations for robots however, their utility has been limited, leading to a recent wave of efforts to pre-train robotics-specific image encoders that are better suited to robotic tasks than their generic counterparts. We propose Scene Objects From Transformers, abbreviated as SOFT, a wrapper around pre-trained vision transformer (PVT) models that bridges this gap without any further training. Rather than construct representations out of only the final layer activations, SOFT individuates and locates object-like entities from PVT attentions, and describes them with PVT activations, producing an object-centric embedding. Across standard choices of generic pre-trained vision transformers PVT, we demonstrate in each case that policies trained on SOFT(PVT) far outstrip standard PVT representations for manipulation tasks in simulated and real settings, approaching the state-of-the-art robotics-aware representations. Code, appendix and videos: https://sites.google.com/view/robot-soft/

* Accepted to International Conference on Robotics and Automation (ICRA) 2024

Composing Pre-Trained Object-Centric Representations for Robotics From "What" and "Where" Foundation Models

Apr 20, 2024

Junyao Shi, Jianing Qian, Yecheng Jason Ma, Dinesh Jayaraman

Abstract:There have recently been large advances both in pre-training visual representations for robotic control and in segmenting unknown-category objects in general images. To leverage these for improved robot learning, we propose POCR, a new framework for building pre-trained object-centric representations for robotic control. Building on theories of "what-where" representations in psychology and computer vision, we use segmentations from a pre-trained model to stably locate various entities in the scene across timesteps, capturing "where" information. To each such segmented entity, we apply other pre-trained models that build vector descriptions suitable for robotic control tasks, thus capturing "what" the entity is. Thus, our pre-trained object-centric representations for control are constructed by appropriately combining the outputs of off-the-shelf pre-trained models, with no new training. On various simulated and real robotic tasks, we show that imitation policies for robotic manipulators trained on POCR achieve better performance and systematic generalization than state-of-the-art pre-trained representations for robotics, as well as prior object-centric representations that are typically trained from scratch.

* ICRA 2024. Project website: https://sites.google.com/view/pocr

Fighting Fire with Fire: Avoiding DNN Shortcuts through Priming

Jun 22, 2022

Chuan Wen, Jianing Qian, Jierui Lin, Jiaye Teng, Dinesh Jayaraman, Yang Gao

Abstract:Across applications spanning supervised classification and sequential control, deep learning has been reported to find "shortcut" solutions that fail catastrophically under minor changes in the data distribution. In this paper, we show empirically that DNNs can be coaxed to avoid poor shortcuts by providing an additional "priming" feature computed from key input features, usually a coarse output estimate. Priming relies on approximate domain knowledge of these task-relevant key input features, which is often easy to obtain in practical settings. For example, one might prioritize recent frames over past frames in a video input for visual imitation learning, or salient foreground over background pixels for image classification. On NICO image classification, MuJoCo continuous control, and CARLA autonomous driving, our priming strategy works significantly better than several popular state-of-the-art approaches for feature selection and data augmentation. We connect these empirical findings to recent theoretical results on DNN optimization, and argue theoretically that priming distracts the optimizer away from poor shortcuts by creating better, simpler shortcuts.

* 28 pages, 13 figures, ICML 2022

Keyframe-Focused Visual Imitation Learning

Jun 11, 2021

Chuan Wen, Jierui Lin, Jianing Qian, Yang Gao, Dinesh Jayaraman

Abstract:Imitation learning trains control policies by mimicking pre-recorded expert demonstrations. In partially observable settings, imitation policies must rely on observation histories, but many seemingly paradoxical results show better performance for policies that only access the most recent observation. Recent solutions ranging from causal graph learning to deep information bottlenecks have shown promising results, but failed to scale to realistic settings such as visual imitation. We propose a solution that outperforms these prior approaches by upweighting demonstration keyframes corresponding to expert action changepoints. This simple approach easily scales to complex visual imitation settings. Our experimental results demonstrate consistent performance improvements over all baselines on image-based Gym MuJoCo continuous control tasks. Finally, on the CARLA photorealistic vision-based urban driving simulator, we resolve a long-standing issue in behavioral cloning for driving by demonstrating effective imitation from observation histories. Supplementary materials and code at: https://tinyurl.com/imitation-keyframes.

* 14 pages, 7 figures, ICML 2021
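
The idea of upweighting keyframes at expert action changepoints can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `keyframe_weights` and `weighted_bc_loss` are hypothetical, changepoints are detected with a simple threshold on the change between consecutive scalar actions, and the `threshold` and `upweight` values are arbitrary. The paper's actual changepoint detection may differ.

```python
from typing import List, Sequence


def keyframe_weights(
    actions: Sequence[float],
    threshold: float = 0.5,   # assumed value, for illustration only
    upweight: float = 5.0,    # assumed value, for illustration only
) -> List[float]:
    """Return one loss weight per timestep.

    A timestep is treated as a keyframe when the expert action differs from
    the previous action by more than `threshold` (a simple stand-in for a
    changepoint detector). Keyframes get weight `upweight`, others get 1.0.
    """
    weights = [1.0] * len(actions)
    for t in range(1, len(actions)):
        if abs(actions[t] - actions[t - 1]) > threshold:
            weights[t] = upweight
    return weights


def weighted_bc_loss(
    predicted: Sequence[float],
    expert: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Weighted mean squared error between predicted and expert actions."""
    total = sum(w * (p - e) ** 2 for p, e, w in zip(predicted, expert, weights))
    return total / sum(weights)


# Toy demonstration: the expert switches action at t=2 and again at t=4.
expert_actions = [0.0, 0.0, 1.0, 1.0, 0.0]
weights = keyframe_weights(expert_actions)
loss = weighted_bc_loss([0.0] * 5, expert_actions, weights)
```

In this toy run, a policy that always outputs the previous default action is penalized most heavily at the two timesteps where the expert changes its action, which is the effect the weighting is meant to have.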

Robust Instance Tracking via Uncertainty Flow

Oct 09, 2020

Jianing Qian, Junyu Nan, Siddharth Ancha, Brian Okorn, David Held

Abstract:Current state-of-the-art trackers often fail due to distractors and large object appearance changes. In this work, we explore the use of dense optical flow to improve tracking robustness. Our main insight is that, because flow estimation can also have errors, we need to incorporate an estimate of flow uncertainty for robust tracking. We present a novel tracking framework which combines appearance and flow uncertainty information to track objects in challenging scenarios. We experimentally verify that our framework improves tracking robustness, leading to new state-of-the-art results. Further, our experimental ablations show the importance of flow uncertainty for robust tracking.

Cloth Region Segmentation for Robust Grasp Selection

Aug 13, 2020

Jianing Qian, Thomas Weng, Luxin Zhang, Brian Okorn, David Held

Abstract:Cloth detection and manipulation is a common task in domestic and industrial settings, yet such tasks remain a challenge for robots due to cloth deformability. Furthermore, in many cloth-related tasks like laundry folding and bed making, it is crucial to manipulate specific regions like edges and corners, as opposed to folds. In this work, we focus on the problem of segmenting and grasping these key regions. Our approach trains a network to segment the edges and corners of a cloth from a depth image, distinguishing such regions from wrinkles or folds. We also provide a novel algorithm for estimating the grasp location, direction, and directional uncertainty from the segmentation. We demonstrate our method on a real robot system and show that it outperforms baseline methods on grasping success. Video and other supplementary materials are available at: https://sites.google.com/view/cloth-segmentation.

* Accepted at IROS 2020. The first two authors contributed equally and are listed in alphabetical order

FusionMapping: Learning Depth Prediction with Monocular Images and 2D Laser Scans

Nov 29, 2019

Peng Yin, Jianing Qian, Yibo Cao, David Held, Howie Choset

Abstract:Acquiring accurate three-dimensional depth information conventionally requires expensive multibeam LiDAR devices. Recently, researchers have developed a less expensive option by predicting depth information from two-dimensional color imagery. However, there still exists a substantial gap in accuracy between depth information estimated from two-dimensional images and real LiDAR point clouds. In this paper, we introduce a fusion-based depth prediction method, called FusionMapping. This is the first method that fuses color imagery and two-dimensional laser scans to estimate depth information. More specifically, we propose an autoencoder-based depth prediction network and a novel point-cloud refinement network for depth estimation. We analyze the performance of our FusionMapping approach on the KITTI LiDAR odometry dataset and an indoor mobile robot system. The results show that our introduced approach estimates depth with better accuracy when compared to existing methods.

Depth-wise Decomposition for Accelerating Separable Convolutions in Efficient Convolutional Neural Networks

Oct 21, 2019

Yihui He, Jianing Qian, Jianren Wang

Abstract:Very deep convolutional neural networks (CNNs) have been firmly established as the primary methods for many computer vision tasks. However, most state-of-the-art CNNs are large, which results in high inference latency. Recently, depth-wise separable convolution has been proposed for image recognition tasks on computationally limited platforms such as robotics and self-driving cars. Though it is much faster than its counterpart, regular convolution, accuracy is sacrificed. In this paper, we propose a novel decomposition approach based on SVD, namely depth-wise decomposition, for expanding regular convolutions into depth-wise separable convolutions while maintaining high accuracy. We show our approach can be further generalized to the multi-channel and multi-layer cases, based on Generalized Singular Value Decomposition (GSVD) [59]. We conduct thorough experiments with the latest ShuffleNet V2 model [47] on both a randomly synthesized dataset and a large-scale image recognition dataset, ImageNet [10]. Our approach outperforms channel decomposition [73] on all datasets. More importantly, our approach improves the Top-1 accuracy of ShuffleNet V2 by ~2%.

* CVPR 2019 workshop, Efficient Deep Learning for Computer Vision
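
One way to picture an SVD-based mapping from a regular convolution to a depth-wise separable one is the sketch below. It is an illustrative reading, not the authors' procedure: it assumes a single depth-wise kernel per input channel followed by a 1x1 point-wise convolution, and approximates each input channel's slice of the regular kernel by its best rank-1 factor. The function names `depthwise_decompose` and `recompose` are hypothetical, and the multi-channel GSVD generalization mentioned in the abstract is not covered.

```python
import numpy as np


def depthwise_decompose(weight: np.ndarray):
    """Approximate a regular conv kernel by depth-wise and point-wise factors.

    weight: array of shape (c_out, c_in, k_h, k_w).
    Returns (depthwise, pointwise) with shapes (c_in, k_h, k_w) and
    (c_out, c_in). For each input channel i, the (c_out, k_h * k_w) slice
    is replaced by its best rank-1 approximation from the SVD.
    """
    c_out, c_in, k_h, k_w = weight.shape
    depthwise = np.zeros((c_in, k_h, k_w))
    pointwise = np.zeros((c_out, c_in))
    for i in range(c_in):
        slice_i = weight[:, i].reshape(c_out, k_h * k_w)
        u, s, vt = np.linalg.svd(slice_i, full_matrices=False)
        # Leading singular triplet: column of u scaled by s goes to the
        # point-wise weights, the row of vt becomes the spatial kernel.
        pointwise[:, i] = u[:, 0] * s[0]
        depthwise[i] = vt[0].reshape(k_h, k_w)
    return depthwise, pointwise


def recompose(depthwise: np.ndarray, pointwise: np.ndarray) -> np.ndarray:
    """Effective regular kernel implemented by depth-wise then point-wise conv."""
    return pointwise[:, :, None, None] * depthwise[None, :, :, :]


# A kernel that is exactly separable is recovered exactly (up to rounding).
rng = np.random.default_rng(0)
true_depthwise = rng.standard_normal((3, 3, 3))
true_pointwise = rng.standard_normal((4, 3))
separable_weight = recompose(true_depthwise, true_pointwise)
dw, pw = depthwise_decompose(separable_weight)
reconstruction_error = np.abs(recompose(dw, pw) - separable_weight).max()
```

For a general kernel the reconstruction is only approximate; the per-channel error equals the energy in the discarded singular values, which is why a rank-1 factorization of this kind trades some accuracy for speed.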

A surgical system for automatic registration, stiffness mapping and dynamic image overlay

Nov 23, 2017

Nicolas Zevallos, Rangaprasad Arun Srivatsan, Hadi Salman, Lu Li, Jianing Qian, Saumya Saxena, Mengyun Xu, Kartik Patath, Howie Choset

Abstract:In this paper we develop a surgical system using the da Vinci research kit (dVRK) that is capable of autonomously searching for tumors and dynamically displaying the tumor location using augmented reality. Such a system has the potential to quickly reveal the location and shape of tumors and visually overlay that information to reduce the cognitive overload of the surgeon. We believe that our approach is one of the first to incorporate state-of-the-art methods in registration, force sensing and tumor localization into a unified surgical system. First, the preoperative model is registered to the intra-operative scene using a Bingham distribution-based filtering approach. An active level set estimation is then used to find the location and the shape of the tumors. We use a recently developed miniature force sensor to perform the palpation. The estimated stiffness map is then dynamically overlaid onto the registered preoperative model of the organ. We demonstrate the efficacy of our system by performing experiments on phantom prostate models with embedded stiff inclusions.

* International Symposium on Medical Robotics (ISMR 2018)
