Bingzhang Hu

Knowing the Past to Predict the Future: Reinforcement Virtual Learning

Nov 02, 2022

Peng Zhang, Yawen Huang, Bingzhang Hu, Shizheng Wang, Haoran Duan, Noura Al Moubayed, Yefeng Zheng, Yang Long

Abstract:Reinforcement Learning (RL)-based control systems have received considerable attention in recent decades. However, in many real-world problems, such as Batch Process Control, the environment is uncertain, and acquiring state and reward values requires expensive interaction. In this paper, we present a cost-efficient framework in which the RL model can evolve by itself in a Virtual Space, using predictive models built only from historical data. The proposed framework enables a step-by-step RL model to predict future states and select optimal actions for long-sighted decisions. The main focuses are: 1) how to balance long-sighted and short-sighted rewards with an optimal strategy; and 2) how to make the virtual model interact with the real environment so that it converges to a final learned policy. Under the experimental settings of a Fed-Batch Process, our method consistently outperforms existing state-of-the-art methods.
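The trade-off between short-sighted and long-sighted rewards can be pictured with a discounted rollout inside a learned model. The sketch below is an illustration under stated assumptions, not the paper's method: it assumes scalar states and actions, hypothetical `TransitionModel`/`RewardModel` callables standing in for predictors fitted offline to historical data, and a simple greedy planner with a discount factor `gamma` in place of the authors' RL training.

```python
from typing import Callable, Sequence

# Hypothetical learned models fitted offline to historical process data.
# The abstract does not specify their form; scalars keep the sketch small.
TransitionModel = Callable[[float, float], float]  # (state, action) -> predicted next state
RewardModel = Callable[[float, float], float]      # (state, action) -> predicted reward
Policy = Callable[[float], float]                  # state -> action


def virtual_return(model: TransitionModel, reward: RewardModel, state: float,
                   policy: Policy, horizon: int, gamma: float) -> float:
    """Roll `policy` forward for `horizon` steps inside the learned model
    (no real-environment interaction) and sum the discounted rewards."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        action = policy(state)
        total += discount * reward(state, action)
        state = model(state, action)  # predicted, not observed, next state
        discount *= gamma
    return total


def best_action(model: TransitionModel, reward: RewardModel, state: float,
                actions: Sequence[float], policy: Policy,
                horizon: int, gamma: float) -> float:
    """Score each candidate by its immediate reward plus the discounted virtual
    continuation: gamma = 0 is purely short-sighted, gamma near 1 weights
    long-sighted consequences more heavily."""
    def score(a: float) -> float:
        continuation = virtual_return(model, reward, model(state, a),
                                      policy, horizon - 1, gamma)
        return reward(state, a) + gamma * continuation
    return max(actions, key=score)


if __name__ == "__main__":
    model = lambda s, a: s + a             # toy dynamics: the action moves the state
    reward = lambda s, a: -abs(s - 5) - a  # distance to target 5 plus an action cost
    hold = lambda s: 0.0                   # after the first step, do nothing
    print(best_action(model, reward, 0.0, [0.0, 1.0, 2.0], hold, 5, 0.0))  # 0.0
    print(best_action(model, reward, 0.0, [0.0, 1.0, 2.0], hold, 5, 0.9))  # 2.0
```

In the toy run, the short-sighted planner avoids the immediate action cost, while the long-sighted one pays it to move closer to the target for later steps.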

Discriminative Latent Semantic Graph for Video Captioning

Aug 10, 2021

Yang Bai, Junyan Wang, Yang Long, Bingzhang Hu, Yang Song, Maurice Pagnucco, Yu Guan

Abstract:Video captioning aims to automatically generate natural language sentences that describe the visual content of a given video. Existing generative models such as encoder-decoder frameworks cannot explicitly explore object-level interactions and frame-level information in complex spatio-temporal data to generate semantically rich captions. Our main contribution is to identify three key problems in a joint framework for future video summarization tasks. 1) Enhanced Object Proposal: we propose a novel Conditional Graph that fuses spatio-temporal information into latent object proposals. 2) Visual Knowledge: Latent Proposal Aggregation is proposed to dynamically extract visual words with higher semantic levels. 3) Sentence Validation: a novel Discriminative Language Validator is proposed to verify generated captions so that key semantic concepts are effectively preserved. Our experiments on two public datasets (MSVD and MSR-VTT) show significant improvements over state-of-the-art approaches on all metrics, especially BLEU-4 and CIDEr. Our code is available at https://github.com/baiyang4/D-LSG-Video-Caption.

* Accepted at ACM MM 2021

Query Twice: Dual Mixture Attention Meta Learning for Video Summarization

Aug 19, 2020

Junyan Wang, Yang Bai, Yang Long, Bingzhang Hu, Zhenhua Chai, Yu Guan, Xiaolin Wei

Abstract:Video summarization aims to select representative frames that retain high-level information, and is usually solved by predicting segment-wise importance scores via a softmax function. However, the softmax function struggles to retain high-rank representations of complex visual or sequential information, which is known as the Softmax Bottleneck problem. In this paper, we propose a novel framework named Dual Mixture Attention (DMASum) model with Meta Learning for video summarization that tackles the softmax bottleneck problem. Its Mixture of Attention (MoA) layer effectively increases model capacity by applying self-query attention a second time, capturing second-order changes in addition to the initial query-key attention, and a novel Single Frame Meta Learning rule is introduced to generalize better to small datasets with limited training sources. Furthermore, DMASum exploits both visual and sequential attention, connecting local key-frame attention and global attention in an accumulative way. We adopt the new evaluation protocol on two public datasets, SumMe and TVSum. Both qualitative and quantitative experiments show significant improvements over state-of-the-art methods.
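A common way to ease the softmax bottleneck is to output a weighted mixture of several softmax distributions instead of a single one. The sketch below shows that generic mixture-of-softmaxes idea for frame-importance scores; the pairing of one component with query-key attention and another with a second self-query pass is a hypothetical reading of the abstract, and the exact MoA layer in DMASum may differ.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def mixture_of_softmaxes(component_scores: Sequence[Sequence[float]],
                         mixture_scores: Sequence[float]) -> List[float]:
    """Return sum_k pi_k * softmax(component_scores[k]).

    component_scores: K score vectors over the same N frames, e.g. k = 0 from
        query-key attention and k = 1 from a second self-query pass
        (hypothetical pairing, for illustration only).
    mixture_scores: K logits, softmax-normalized into the weights pi_k.
    The result is still a distribution over the N frames, but its log is no
    longer a single linear function of one score vector, which is the usual
    argument for why mixtures can represent higher-rank outputs.
    """
    weights = softmax(mixture_scores)
    n = len(component_scores[0])
    mixed = [0.0] * n
    for pi, scores in zip(weights, component_scores):
        for i, p in enumerate(softmax(scores)):
            mixed[i] += pi * p
    return mixed


if __name__ == "__main__":
    importance = mixture_of_softmaxes(
        [[2.0, 0.5, 0.1], [0.1, 0.3, 2.5]],  # two attention passes over 3 frames
        [0.0, 0.0],                           # equal mixture weights
    )
    print(importance, sum(importance))
```

Because each component is a proper distribution and the weights sum to one, the mixed scores can still be read as frame-importance probabilities.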

* This manuscript has been accepted at ACM MM 2020

Dual-reference Age Synthesis

Aug 07, 2019

Yuan Zhou, Bingzhang Hu, Ling Shao

Abstract:Age synthesis has received much attention in recent years. State-of-the-art methods typically take an input image and use a numeral to control the age of the generated image. In this paper, we revisit age synthesis and ask: is a numeral sufficient to describe human age? We propose a new framework, Dual-reference Age Synthesis (DRAS), that takes two images as inputs and generates an image that shares the identity of the first image and has an age similar to that of the second image. In the proposed framework, we employ a joint manifold feature consisting of disentangled age and identity information. The final images are generated by training a generative adversarial network that competes against an age agent and an identity agent. Experimental results, compared against state-of-the-art methods and the ground truth, demonstrate the appealing performance and flexibility of the proposed framework.

Order Matters: Shuffling Sequence Generation for Video Prediction

Jul 20, 2019

Junyan Wang, Bingzhang Hu, Yang Long, Yu Guan

Abstract:Predicting future frames in natural video sequences is a new challenge that is receiving increasing attention in the computer vision community. However, existing models suffer from severe loss of temporal information when the predicted sequence is long. Whereas previous methods focus on generating more realistic content, this paper extensively studies the importance of sequential order information for video generation. We propose a novel Shuffling sEquence gEneration network (SEE-Net) that learns to discriminate unnatural sequential orders by shuffling the video frames and comparing them with the real video sequence. Systematic experiments on three datasets with both synthetic and real-world videos demonstrate the effectiveness of shuffling sequence generation for video prediction and show state-of-the-art performance in both qualitative and quantitative evaluations. The source code is available at https://github.com/andrewjywang/SEENet.
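The data side of "shuffling the video frames and comparing them with the real video sequence" can be sketched as building labeled pairs for an order discriminator. This is a minimal illustration that assumes a full random permutation of the clip and hypothetical helper names (`shuffled_copy`, `order_discrimination_pairs`); the abstract does not say how SEE-Net chooses its shuffles, and the released code at the URL above is the authoritative reference.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def shuffled_copy(frames: Sequence[Frame],
                  rng: random.Random) -> Tuple[List[Frame], List[int]]:
    """Return `frames` in a random order that differs from the original index
    order, together with the permutation that was applied."""
    n = len(frames)
    if n < 2:
        raise ValueError("need at least two frames to change the order")
    identity = list(range(n))
    order = list(identity)
    while order == identity:  # reject the identity permutation
        rng.shuffle(order)
    return [frames[i] for i in order], order


def order_discrimination_pairs(frames: Sequence[Frame],
                               rng: random.Random) -> List[Tuple[List[Frame], int]]:
    """Build training pairs for a hypothetical order discriminator:
    label 1 = natural temporal order, label 0 = shuffled order."""
    shuffled, _ = shuffled_copy(frames, rng)
    return [(list(frames), 1), (shuffled, 0)]


if __name__ == "__main__":
    clip = ["f0", "f1", "f2", "f3", "f4"]  # placeholders for real frames
    for seq, label in order_discrimination_pairs(clip, random.Random(0)):
        print(label, seq)
```

Rejecting the identity permutation guarantees the negative example really is out of order; with duplicate frames the content could still look identical, so real clips with static segments may need a stricter check.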

* This manuscript has been accepted at BMVC 2019. See the project at https://github.com/andrewjywang/SEENet
