Ying Peng

GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation

Feb 13, 2025

Hongyin Zhang, Pengxiang Ding, Shangke Lyu, Ying Peng, Donglin Wang

Abstract: With the rapid development of embodied artificial intelligence, significant progress has been made in vision-language-action (VLA) models for general robot decision-making. However, the majority of existing VLAs fail to account for the inevitable external perturbations encountered during deployment. These perturbations introduce unforeseen state information to the VLA, resulting in inaccurate actions and, consequently, a significant decline in generalization performance. The classic internal model control (IMC) principle demonstrates that a closed-loop system with an internal model that includes external input signals can accurately track the reference input and effectively offset the disturbance. We propose a novel closed-loop VLA method, GEVRM, that integrates the IMC principle to enhance the robustness of robot visual manipulation. The text-guided video generation model in GEVRM can generate highly expressive future visual planning goals. Simultaneously, we evaluate perturbations by simulating responses, which are called internal embeddings and are optimized through prototype contrastive learning. This allows the model to implicitly infer and distinguish perturbations from the external environment. The proposed GEVRM achieves state-of-the-art performance on both standard and perturbed CALVIN benchmarks and shows significant improvements in realistic robot tasks.

* Published as a conference paper at ICLR 2025

Masked Vision-Language Transformers for Scene Text Recognition

Nov 09, 2022

Jie Wu, Ying Peng, Shengming Zhang, Weigang Qi, Jian Zhang

Abstract: Scene text recognition (STR) enables computers to recognize and read the text in various real-world scenes. Recent STR models benefit from taking linguistic information into consideration in addition to visual cues. We propose novel Masked Vision-Language Transformers (MVLT) to capture both explicit and implicit linguistic information. Our encoder is a Vision Transformer, and our decoder is a multi-modal Transformer. MVLT is trained in two stages: in the first stage, we design an STR-tailored pretraining method based on a masking strategy; in the second stage, we fine-tune our model and adopt an iterative correction method to improve performance. MVLT attains superior results compared to state-of-the-art STR models on several benchmarks. Our code and model are available at https://github.com/onealwj/MVLT.

* Accepted at the 33rd British Machine Vision Conference (BMVC 2022)

Fetal Brain Tissue Annotation and Segmentation Challenge Results

Apr 20, 2022

Kelly Payette, Hongwei Li, Priscille de Dumast, Roxane Licandro, Hui Ji, Md Mahfuzur Rahman Siddiquee, Daguang Xu, Andriy Myronenko, Hao Liu, Yuchen Pei (+48 more)

Abstract: In-utero fetal MRI is emerging as an important tool in the diagnosis and analysis of the developing human brain. Automatic segmentation of the developing fetal brain is a vital step in the quantitative analysis of prenatal neurodevelopment in both the research and clinical context. However, manual segmentation of cerebral structures is time-consuming and prone to error and inter-observer variability. Therefore, we organized the Fetal Tissue Annotation (FeTA) Challenge in 2021 to encourage the development of automatic segmentation algorithms on an international level. The challenge utilized the FeTA Dataset, an open dataset of fetal brain MRI reconstructions segmented into seven different tissues (external cerebrospinal fluid, grey matter, white matter, ventricles, cerebellum, brainstem, deep grey matter). Twenty international teams participated in this challenge, submitting a total of 21 algorithms for evaluation. In this paper, we provide a detailed analysis of the results from both a technical and clinical perspective. All participants relied on deep learning methods, mainly U-Nets, with some variability present in the network architecture, optimization, and image pre- and post-processing. The majority of teams used existing medical imaging deep learning frameworks. The main differences between the submissions were the fine-tuning done during training and the specific pre- and post-processing steps performed. The challenge results showed that almost all submissions performed similarly. Four of the top five teams used ensemble learning methods. However, one team's algorithm, which consisted of an asymmetrical U-Net network architecture, performed significantly better than the other submissions. This paper provides a first-of-its-kind benchmark for future automatic multi-tissue segmentation algorithms for the developing human brain in utero.

* Results from FeTA Challenge 2021, held at MICCAI; Manuscript submitted

A deep learning method for solving stochastic optimal control problems driven by fully-coupled FBSDEs

Apr 12, 2022

Shaolin Ji, Shige Peng, Ying Peng, Xichuan Zhang

Abstract: In this paper, we mainly focus on the numerical solution of high-dimensional stochastic optimal control problems driven by fully-coupled forward-backward stochastic differential equations (FBSDEs in short) through deep learning. We first transform the problem into a stochastic Stackelberg differential game (leader-follower problem); then a cross-optimization method (CO method) is developed in which the leader's cost functional and the follower's cost functional are optimized alternately via deep neural networks. As for the numerical results, we compute two examples of the investment-consumption problem solved through stochastic recursive utility models, and the results of both examples demonstrate the effectiveness of our proposed algorithm.

A control method for solving high-dimensional Hamiltonian systems through deep neural networks

Nov 04, 2021

Shaolin Ji, Shige Peng, Ying Peng, Xichuan Zhang

Abstract: In this paper, we mainly focus on solving high-dimensional stochastic Hamiltonian systems with boundary conditions, and propose a novel method from the viewpoint of stochastic control. To obtain an approximate solution of the Hamiltonian system, we first introduce a corresponding stochastic optimal control problem such that the Hamiltonian system of the control problem is exactly the one we need to solve. We then develop two different algorithms suitable for different cases of the control problem and approximate the stochastic control via deep neural networks. In the numerical results, compared with the Deep FBSDE method, which was developed previously from the viewpoint of solving FBSDEs, the novel algorithms converge faster, meaning that they require fewer training steps, and demonstrate more stable convergence for different Hamiltonian systems.

Signal Transformer: Complex-valued Attention and Meta-Learning for Signal Recognition

Jun 12, 2021

Yihong Dong, Ying Peng, Muqiao Yang, Songtao Lu, Qingjiang Shi

Abstract: Deep neural networks have been shown to be useful tools for addressing signal recognition problems in recent years, especially for identifying the nonlinear feature structures of signals. However, the power of most deep learning techniques relies heavily on an abundant amount of training data, so the performance of classic neural nets decreases sharply when the number of training data samples is small or unseen data are presented in the testing phase. This calls for an advanced strategy, i.e., model-agnostic meta-learning (MAML), which is able to capture the invariant representation of the data samples or signals. In this paper, inspired by the special structure of the signal, i.e., the real and imaginary parts contained in practical time-series signals, we propose a Complex-valued Attentional MEta Learner (CAMEL) for the problem of few-shot signal recognition by leveraging attention and meta-learning in the complex domain. To the best of our knowledge, this is also the first complex-valued MAML that can find the first-order stationary points of general nonconvex problems with theoretical convergence guarantees. Extensive experimental results showcase the superiority of the proposed CAMEL over state-of-the-art methods.
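
As a rough illustration of what attention "in the complex domain" can look like, the sketch below computes single-query attention over complex-valued vectors in plain Python. It is not CAMEL's architecture: the function name `complex_attention`, the toy inputs, and the choice to score with the real part of the Hermitian inner product are all assumptions made for this example, since the abstract does not specify the formulation.

```python
import math

def complex_attention(query, keys, values):
    """Single-query attention over complex vectors (illustrative only).

    Assumption: scores are the real part of the Hermitian inner product
    <q, k>, scaled by sqrt(d). Other complex scoring rules are possible.
    """
    d = len(query)
    scores = [
        sum(q * k.conjugate() for q, k in zip(query, key)).real / math.sqrt(d)
        for key in keys
    ]
    # Numerically stable softmax over the real-valued scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Real attention weights mix the complex value vectors.
    out = [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return weights, out

# Toy samples: each entry is (real part) + 1j * (imaginary part).
query = [1 + 1j, 0.5 - 0.5j]
keys = [[1 + 1j, 0.5 - 0.5j], [-1 - 1j, -0.5 + 0.5j]]
values = [[1 + 0j, 0 + 1j], [0 + 0j, 0 + 0j]]
weights, out = complex_attention(query, keys, values)
```

Here the first key matches the query and the second is its negation, so the first weight dominates and the output stays close to the first value vector.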

Deep learning method for solving stochastic optimal control problem via stochastic maximum principle

Jul 05, 2020

Shaolin Ji, Shige Peng, Ying Peng, Xichuan Zhang

Abstract: In this paper, we aim to solve the stochastic optimal control problem via deep learning. Through the stochastic maximum principle and its corresponding Hamiltonian system, we propose a framework in which the original control problem is reformulated as a new one. This new stochastic optimal control problem has a quadratic loss function at the terminal time, which provides an easier way to build a neural network structure. The cost, however, is that we must deal with an additional maximum condition. Numerical examples such as the linear quadratic (LQ) stochastic optimal control problem and the calculation of G-expectation are studied.
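
For readers unfamiliar with the LQ problem class mentioned above, the following sketch estimates the cost of a scalar linear quadratic stochastic control problem by Euler-Maruyama Monte Carlo simulation under a fixed linear feedback. It does not implement the paper's deep learning method or the stochastic maximum principle; the dynamics, the cost weights, the feedback form u = -k X, and the name `lq_cost` are all assumptions chosen for illustration.

```python
import math
import random

def lq_cost(k, a=1.0, b=1.0, sigma=0.5, q=1.0, r=0.1, x0=1.0,
            T=1.0, steps=100, paths=2000, seed=0):
    """Monte Carlo estimate of E[ int_0^T (q X^2 + r u^2) dt + X_T^2 ]
    for dX = (a X + b u) dt + sigma dW under the feedback u = -k X.

    All coefficients are hypothetical values for this sketch.
    """
    rng = random.Random(seed)  # fixed seed: same noise for every k
    dt = T / steps
    total = 0.0
    for _ in range(paths):
        x, cost = x0, 0.0
        for _ in range(steps):
            u = -k * x
            cost += (q * x * x + r * u * u) * dt      # running cost
            dw = rng.gauss(0.0, math.sqrt(dt))        # Brownian increment
            x += (a * x + b * u) * dt + sigma * dw    # Euler-Maruyama step
        total += cost + x * x                         # add terminal cost
    return total / paths

uncontrolled = lq_cost(k=0.0)  # no control: the unstable state grows
stabilized = lq_cost(k=2.0)    # feedback strong enough to stabilize the state
```

Comparing `stabilized` with `uncontrolled` shows how a feedback gain trades control effort against state cost; a learning-based method would search over a much richer class of controls than a single gain.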

PMC-GANs: Generating Multi-Scale High-Quality Pedestrian with Multimodal Cascaded GANs

Dec 30, 2019

Jie Wu, Ying Peng, Chenghao Zheng, Zongbo Hao, Jian Zhang

Abstract: Recently, generative adversarial networks (GANs) have shown great advantages in synthesizing images, leading to a surge of work exploring the use of fake images to augment data. This paper proposes multimodal cascaded generative adversarial networks (PMC-GANs) to generate realistic and diversified pedestrian images and augment pedestrian detection data. The generator of our model applies a residual U-net structure, with multi-scale residual blocks to encode features and attention residual blocks to help decode and rebuild pedestrian images. The model is constructed in a coarse-to-fine fashion and adopts a cascade structure, which helps produce high-resolution pedestrians. PMC-GANs outperform baselines and, when used for data augmentation, improve pedestrian detection results.

* Accepted by the British Machine Vision Conference (BMVC 2019)

Three algorithms for solving high-dimensional fully-coupled FBSDEs through deep learning

Jul 17, 2019

Shaolin Ji, Shige Peng, Ying Peng, Xichuan Zhang

Abstract: Recently, deep learning methods have been used for solving forward-backward stochastic differential equations (FBSDEs) and parabolic partial differential equations (PDEs), with good accuracy and performance for high-dimensional problems. In this paper, we mainly solve fully coupled FBSDEs through deep learning and provide three algorithms. Several numerical results show remarkable performance, especially for high-dimensional cases.
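
For orientation, a fully coupled FBSDE is commonly written in the following textbook form, where "fully coupled" means that the forward coefficients b and σ depend on the backward components (Y, Z) as well as on X. The notation here is a standard convention assumed for illustration and may differ from the paper's.

```latex
\begin{aligned}
dX_t  &= b(t, X_t, Y_t, Z_t)\,dt + \sigma(t, X_t, Y_t, Z_t)\,dW_t, & X_0 &= x_0,\\
-dY_t &= f(t, X_t, Y_t, Z_t)\,dt - Z_t\,dW_t,                      & Y_T &= g(X_T).
\end{aligned}
```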

* 25 pages, 6 figures
