Abstract: In this paper, we present VideoGen, a text-to-video generation approach that can generate a high-definition video with high frame fidelity and strong temporal consistency using reference-guided latent diffusion. We leverage an off-the-shelf text-to-image generation model, e.g., Stable Diffusion, to generate an image with high content quality from the text prompt, and use it as a reference image to guide video generation. We then introduce an efficient cascaded latent diffusion module, conditioned on both the reference image and the text prompt, for generating latent video representations, followed by a flow-based temporal upsampling step to improve the temporal resolution. Finally, we map the latent video representations to a high-definition video through an enhanced video decoder. During training, we use the first frame of a ground-truth video as the reference image for training the cascaded latent diffusion module. The main characteristics of our approach are: the reference image generated by the text-to-image model improves visual fidelity; using it as the condition lets the diffusion model focus more on learning the video dynamics; and the video decoder is trained on unlabeled video data, thus benefiting from high-quality, easily available videos. VideoGen sets a new state of the art in text-to-video generation in terms of both qualitative and quantitative evaluation. See \url{https://videogen.github.io/VideoGen/} for more samples.
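The abstract describes a staged pipeline; the following is a minimal, hypothetical sketch of how the stages compose, assuming callables for each component (the names t2i_model, cascaded_latent_diffusion, temporal_upsample, and video_decoder are illustrative placeholders, not the authors' API):

def generate_video(prompt, t2i_model, cascaded_latent_diffusion,
                   temporal_upsample, video_decoder):
    # 1. Off-the-shelf text-to-image model (e.g. Stable Diffusion) produces a
    #    high-content-quality reference frame from the prompt.
    reference_image = t2i_model(prompt)
    # 2. Cascaded latent diffusion, conditioned on both the reference image and
    #    the prompt, generates latent video representations.
    latents = cascaded_latent_diffusion(prompt, reference_image)
    # 3. Flow-based temporal upsampling improves temporal resolution in latent space.
    latents = temporal_upsample(latents)
    # 4. Enhanced video decoder maps the latents to high-definition frames.
    return video_decoder(latents)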
Abstract: This paper reviews the Challenge on Super-Resolution of Compressed Image and Video at AIM 2022. The challenge includes two tracks: Track 1 addresses the super-resolution of compressed images, and Track 2 targets the super-resolution of compressed videos. In Track 1, we use the popular DIV2K dataset for the training, validation, and test sets. In Track 2, we propose the LDV 3.0 dataset, which contains 365 videos, comprising the LDV 2.0 dataset (335 videos) and 30 additional videos. In this challenge, 12 teams and 2 teams submitted final results to Track 1 and Track 2, respectively. The proposed methods and solutions gauge the state of the art of super-resolution for compressed images and videos. The proposed LDV 3.0 dataset is available at https://github.com/RenYang-home/LDV_dataset. The homepage of this challenge is at https://github.com/RenYang-home/AIM22_CompressSR.
Abstract: The ability of an agent to change its objectives in response to unexpected events is desirable in dynamic environments. To provide this capability to hierarchical task network (HTN) planning, we propose an extension of the paradigm called task modifiers, which are functions that receive a task list and a state and produce a new task list. We focus on a particular type of problem in which planning and execution are interleaved and the ability to handle exogenous events is crucial. To determine the efficacy of this approach, we evaluate the performance of our task modifier implementation in two environments, one of which is a simulation that differs substantially from traditional HTN domains.
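As a rough illustration of the concept, a task modifier can be viewed as a plain function from a task list and a state to a new task list. The sketch below is a hypothetical example under that reading, not the authors' implementation; the task encoding and the fire-alarm scenario are invented for illustration:

from typing import Callable, Dict, List

Task = str                      # e.g. "deliver(package1, roomA)"
State = Dict[str, object]       # e.g. {"fire_alarm": True, "location": "roomB"}

# A task modifier maps the current task list and observed state to a new task list.
TaskModifier = Callable[[List[Task], State], List[Task]]

def fire_alarm_modifier(tasks: List[Task], state: State) -> List[Task]:
    # On an exogenous fire-alarm event, drop the pending tasks and evacuate.
    if state.get("fire_alarm"):
        return ["evacuate()"]
    return tasks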
Abstract: Hierarchical DQN (h-DQN) is a two-level architecture of feedforward neural networks in which the meta level selects goals and the lower level takes actions to achieve them. We show tasks that cannot be solved by h-DQN, exemplifying the limitations of this type of hierarchical framework (HF). We describe the recurrent hierarchical framework (RHF), which generalizes architectures that use a recurrent neural network at the meta level. We analyze the expressiveness of HF and RHF using context-sensitive grammars and show that RHF is more expressive than HF. We perform experiments comparing an implementation of RHF with two HF baselines; the results corroborate our theoretical findings.
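To make the distinction concrete, the following is a minimal sketch in PyTorch-style code (module names, shapes, and the choice of a GRU cell are assumptions for illustration, not the paper's architecture): an HF-style meta level maps only the current state to a goal, whereas an RHF-style meta level also carries a recurrent hidden state across goal selections.

import torch.nn as nn

class FeedforwardMeta(nn.Module):          # HF-style meta level
    def __init__(self, state_dim, n_goals):
        super().__init__()
        self.net = nn.Linear(state_dim, n_goals)

    def forward(self, state):
        return self.net(state)             # goal logits from the current state only

class RecurrentMeta(nn.Module):            # RHF-style meta level
    def __init__(self, state_dim, hidden_dim, n_goals):
        super().__init__()
        self.rnn = nn.GRUCell(state_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, n_goals)

    def forward(self, state, hidden):
        hidden = self.rnn(state, hidden)   # memory of previous states and goals
        return self.head(hidden), hidden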
Abstract: We propose a planning and perception mechanism for a robot (agent) that can only partially observe the underlying environment, in order to solve an image classification problem. We suggest a three-layer architecture consisting of a meta-layer that decides the intermediate goals, an action-layer that selects local actions as the agent navigates towards a goal, and a classification-layer that evaluates the reward and makes a prediction. We design and implement these layers using deep reinforcement learning, and a generalized policy gradient algorithm is used to learn their parameters so as to maximize the expected reward. Our proposed methodology is tested on the MNIST dataset of handwritten digits, which provides a level of explainability when interpreting the agent's intermediate goals and course of action.
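A hedged sketch of the three-layer decomposition follows; the class names, layer sizes, and the interpretation of goals as glimpse targets are assumptions made for illustration, not the paper's code:

import torch
import torch.nn as nn

class MetaLayer(nn.Module):
    """Selects an intermediate goal (e.g. a target glimpse region) from the
    agent's partial observation."""
    def __init__(self, obs_dim, n_goals):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_goals))

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))

class ActionLayer(nn.Module):
    """Chooses local movement actions to navigate towards the current goal."""
    def __init__(self, obs_dim, n_goals, n_actions):
        super().__init__()
        self.net = nn.Linear(obs_dim + n_goals, n_actions)

    def forward(self, obs, goal_onehot):
        logits = self.net(torch.cat([obs, goal_onehot], dim=-1))
        return torch.distributions.Categorical(logits=logits)

class ClassificationLayer(nn.Module):
    """Predicts the digit label from the accumulated observations; the reward
    used by the policy gradient is derived from whether this prediction is correct."""
    def __init__(self, obs_dim, n_classes=10):
        super().__init__()
        self.net = nn.Linear(obs_dim, n_classes)

    def forward(self, obs):
        return self.net(obs)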