Liming Zheng

DataPlatter: Boosting Robotic Manipulation Generalization with Minimal Costly Data

Mar 25, 2025

Liming Zheng, Feng Yan, Fanfan Liu, Chengjian Feng, Yufeng Zhong, Yiyang Huang, Lin Ma

Abstract: The growing adoption of Vision-Language-Action (VLA) models in embodied AI intensifies the demand for diverse manipulation demonstrations. However, the high cost of data collection often results in insufficient data coverage across scenarios, which limits model performance. We observe that the spatial reasoning phase (SRP) in large workspaces dominates the failure cases. Fortunately, this data can be collected at low cost, underscoring the potential of leveraging inexpensive data to improve model performance. In this paper, we introduce DataPlatter, a framework that decouples training trajectories into distinct task stages and leverages abundant, easily collectible SRP data to enhance the generalization of VLA models. Through analysis, we demonstrate that sub-task-specific training with additional SRP data in a proper proportion can act as a performance catalyst for robot manipulation, maximizing the utilization of costly physical interaction phase (PIP) data. Experiments show that by introducing a large proportion of cost-effective SRP trajectories into a limited set of PIP data, we can achieve a maximum improvement of 41% in success rate in zero-shot scenes, while retaining the ability to transfer manipulation skills to novel targets.
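
The idea of adding cheap SRP trajectories to a small PIP set at a chosen proportion can be illustrated with a minimal sketch. This is not the authors' pipeline: the helper name mix_trajectories, the srp_ratio parameter, and the tuple-based trajectory records are all assumptions made for illustration.

```python
import random


def mix_trajectories(pip, srp, srp_ratio, seed=0):
    """Return a shuffled training set whose SRP share is about `srp_ratio`.

    All PIP trajectories are kept (they are the scarce, costly data), and
    SRP trajectories are subsampled to reach the requested share.
    Hypothetical helper, not part of any released DataPlatter code.
    """
    if not 0.0 <= srp_ratio < 1.0:
        raise ValueError("srp_ratio must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_srp / (n_srp + n_pip) = srp_ratio for n_srp.
    n_srp = round(len(pip) * srp_ratio / (1.0 - srp_ratio))
    n_srp = min(n_srp, len(srp))  # cannot use more SRP data than exists
    mixed = list(pip) + rng.sample(list(srp), n_srp)
    rng.shuffle(mixed)
    return mixed


# Toy records standing in for real trajectories (assumed format).
pip_data = [("pip", i) for i in range(10)]
srp_data = [("srp", i) for i in range(100)]
train_set = mix_trajectories(pip_data, srp_data, srp_ratio=0.8)
srp_share = sum(kind == "srp" for kind, _ in train_set) / len(train_set)
print(len(train_set), srp_share)
```

With 10 PIP records and a requested SRP share of 0.8, the sketch draws 40 SRP records, so the mixed set has 50 entries. Which proportion works best is an empirical question that the paper studies; the value 0.8 here is arbitrary.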

RoboMM: All-in-One Multimodal Large Model for Robotic Manipulation

Dec 10, 2024

Feng Yan, Fanfan Liu, Liming Zheng, Yufeng Zhong, Yiyang Huang, Zechao Guan, Chengjian Feng, Lin Ma

Abstract: In recent years, robotics has advanced significantly through the integration of larger models and large-scale datasets. However, challenges remain in applying these models to 3D spatial interactions and in managing data collection costs. To address these issues, we propose a multimodal robotic manipulation model, RoboMM, along with a comprehensive dataset, RoboData. RoboMM enhances 3D perception through camera parameters and occupancy supervision. Building on OpenFlamingo, it incorporates a Modality-Isolation-Mask and multimodal decoder blocks, improving modality fusion and fine-grained perception. RoboData offers a complete evaluation system by integrating several well-known datasets, achieving the first fusion of multi-view images, camera parameters, depth maps, and actions; its space alignment facilitates comprehensive learning from diverse robotic datasets. Equipped with RoboData and the unified physical space, RoboMM is a generalist policy that enables simultaneous evaluation across all tasks within multiple datasets, rather than focusing on a limited selection of data or tasks. Its design significantly enhances robotic manipulation performance, increasing the average sequence length on CALVIN from 1.7 to 3.3 and ensuring cross-embodiment capabilities, achieving state-of-the-art results across multiple datasets.

RoboCAS: A Benchmark for Robotic Manipulation in Complex Object Arrangement Scenarios

Jul 09, 2024

Liming Zheng, Feng Yan, Fanfan Liu, Chengjian Feng, Zhuoliang Kang, Lin Ma

Abstract: Foundation models hold significant potential for enabling robots to perform long-horizon general manipulation tasks. However, the simplicity of tasks and the uniformity of environments in existing benchmarks restrict their effective deployment in complex scenarios. To address this limitation, this paper introduces the RoboCAS benchmark, the first benchmark specifically designed for complex object arrangement scenarios in robotic manipulation. This benchmark employs flexible and concise scripted policies to efficiently collect a diverse array of demonstrations, showcasing scattered, orderly, and stacked object arrangements within a highly realistic physical simulation environment. It includes complex processes such as target retrieval, obstacle clearance, and robot manipulation, testing agents' abilities to perform long-horizon planning for spatial reasoning and predicting chain reactions under ambiguous instructions. Extensive experiments on multiple baseline models reveal their limitations in managing complex object arrangement scenarios, underscoring the urgent need for intelligent agents capable of performing long-horizon operations in practical deployments and providing valuable insights for future research directions. Project website: https://github.com/notFoundThisPerson/RoboCAS-v0.

RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation

Jun 27, 2024

Fanfan Liu, Feng Yan, Liming Zheng, Chengjian Feng, Yiyang Huang, Lin Ma

Abstract: Utilizing Vision-Language Models (VLMs) for robotic manipulation represents a novel paradigm, aiming to enhance the model's ability to generalize to new objects and instructions. However, due to variations in camera specifications and mounting positions, existing methods exhibit significant performance disparities across different robotic platforms. To address this challenge, we propose RoboUniView in this paper, an innovative approach that decouples visual feature extraction from action learning. We first learn a unified view representation from multi-perspective views by pre-training on readily accessible data, and then derive actions from this unified view representation to control robotic manipulation. This unified view representation more accurately mirrors the physical world and is not constrained by the robotic platform's camera parameters. Thanks to this methodology, we achieve state-of-the-art performance on the demanding CALVIN benchmark, enhancing the success rate in the D → D setting from 88.7% to 96.2%, and in the ABC → D setting from 82.4% to 94.2%. Moreover, our model exhibits outstanding adaptability and flexibility: it maintains high performance under unseen camera parameters, can utilize multiple datasets with varying camera parameters, and is capable of joint cross-task learning across datasets. Code is provided for re-implementation: https://github.com/liufanfanlff/RoboUniview

Aerial Imagery Pile burn detection using Deep Learning: the FLAME dataset

Dec 28, 2020

Alireza Shamsoshoara, Fatemeh Afghah, Abolfazl Razi, Liming Zheng, Peter Z Fulé, Erik Blasch

Abstract: Wildfires are among the costliest and deadliest natural disasters in the US, damaging millions of hectares of forest resources and threatening the lives of people and animals. Of particular importance are the risks to firefighters and operational forces, which highlights the need to leverage technology to minimize danger to people and property. FLAME (Fire Luminosity Airborne-based Machine learning Evaluation) offers a dataset of aerial images of fires along with methods for fire detection and segmentation, which can help firefighters and researchers develop optimal fire management strategies. This paper provides a fire image dataset collected by drones during a prescribed burn of piled detritus in an Arizona pine forest. The dataset includes video recordings and thermal heatmaps captured by infrared cameras. The captured videos and images are annotated and labeled frame-wise to help researchers easily apply their fire detection and modeling algorithms. The paper also highlights solutions to two machine learning problems: (1) Binary classification of video frames based on the presence [and absence] of fire flames. An Artificial Neural Network (ANN) method is developed that achieves a 76% classification accuracy. (2) Fire detection using segmentation methods to precisely determine fire borders. A deep learning method based on the U-Net up-sampling and down-sampling approach is designed to extract a fire mask from the video frames. Our FLAME method achieves a precision of 92% and a recall of 84%. Future research will extend the technique to free-burning broadcast fires using thermal images.

* 27 Pages, 7 Figures, 4 Tables
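
For readers unfamiliar with the segmentation metrics quoted in the abstract, precision is TP / (TP + FP) and recall is TP / (TP + FN), counted over pixels. The sketch below computes both for a pair of flattened binary masks. The function name precision_recall and the toy masks are assumptions for illustration and do not come from the FLAME code or data.

```python
def precision_recall(pred_mask, true_mask):
    """Compute pixel-wise precision and recall for two binary masks.

    Both arguments are flat sequences of 0/1 values of equal length.
    Returns 0.0 for a metric whose denominator would be zero.
    """
    tp = fp = fn = 0
    for pred, truth in zip(pred_mask, true_mask):
        if pred and truth:
            tp += 1  # fire predicted and fire present
        elif pred and not truth:
            fp += 1  # fire predicted but absent
        elif truth and not pred:
            fn += 1  # fire present but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy six-pixel masks (made up for this example).
predicted = [1, 1, 1, 0, 0, 1]
ground_truth = [1, 1, 0, 0, 1, 1]
precision, recall = precision_recall(predicted, ground_truth)
print(precision, recall)
```

In the toy example there are three true positives, one false positive, and one false negative, so both metrics come out to 0.75.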

A novel control mode of bionic morphing tail based on deep reinforcement learning

Oct 08, 2020

Liming Zheng, Zhou Zhou, Pengbo Sun, Zhilin Zhang, Rui Wang

Abstract: In the field of fixed-wing aircraft, many morphing technologies have been applied to the wing, such as adaptive airfoils, variable-span aircraft, and variable-sweep aircraft, but few target the tail. The traditional fixed-wing tail comprises a horizontal and a vertical tail. Inspired by the bird tail, this paper introduces a new bionic tail with a novel control mode that has multiple control variables. Compared with the traditional fixed-wing tail, it adds area control and rotation control around the longitudinal symmetry axis, so it can control the pitch and yaw of the aircraft at the same time. Changing the tail area changes the maneuverability and stability of the aircraft and can also improve its aerodynamic efficiency. It is often difficult to establish an accurate mathematical model of a morphing aircraft because its dynamics are strongly nonlinear, and model-based control methods struggle with such strong nonlinearity. In recent years, with the rapid development of artificial intelligence, learning-based control methods have also performed well; in particular, deep reinforcement learning algorithms are a good fit for plants that are difficult to model. In this paper, the model-free control algorithm PPO is used to control the tail, and a traditional PID controller is used to control the aileron and throttle. After training in simulation, the tail shows excellent attitude control ability.

* PDF, 8 pages with 10 figures, IEEE Robotics and Automation Letters and ICRA (under review)
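
The abstract pairs a learned PPO policy for the tail with a traditional PID loop for the aileron and throttle. As a reminder of what the PID half looks like, here is a minimal discrete-time sketch. The class name, gains, time step, and the roll-angle example are assumptions for illustration; the abstract does not specify the paper's controller structure or tuning.

```python
class PID:
    """Textbook discrete PID controller (illustrative, not the paper's)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the first step

    def step(self, setpoint, measurement):
        """Return the control output for one time step."""
        error = setpoint - measurement
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical roll-angle loop driving an aileron command (made-up gains).
roll_pid = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.1)
first_cmd = roll_pid.step(setpoint=1.0, measurement=0.0)
second_cmd = roll_pid.step(setpoint=1.0, measurement=0.5)
print(first_cmd, second_cmd)
```

On the first step the derivative term is zero, so the command is the proportional term plus a small integral contribution. On the second step the shrinking error makes the derivative term negative, which damps the command.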
