Abstract:Accurate and robust trajectory prediction of neighboring agents is critical for autonomous vehicles traversing in complex scenes. Most methods proposed in recent years are deep learning-based due to their strength in encoding complex interactions. However, unplausible predictions are often generated since they rely heavily on past observations and cannot effectively capture the transient and contingency interactions from sparse samples. In this paper, we propose a hierarchical hybrid framework of deep learning (DL) and reinforcement learning (RL) for multi-agent trajectory prediction, to cope with the challenge of predicting motions shaped by multi-scale interactions. In the DL stage, the traffic scene is divided into multiple intermediate-scale heterogenous graphs based on which Transformer-style GNNs are adopted to encode heterogenous interactions at intermediate and global levels. In the RL stage, we divide the traffic scene into local sub-scenes utilizing the key future points predicted in the DL stage. To emulate the motion planning procedure so as to produce trajectory predictions, a Transformer-based Proximal Policy Optimization (PPO) incorporated with a vehicle kinematics model is devised to plan motions under the dominant influence of microscopic interactions. A multi-objective reward is designed to balance between agent-centric accuracy and scene-wise compatibility. Experimental results show that our proposal matches the state-of-the-arts on the Argoverse forecasting benchmark. It's also revealed by the visualized results that the hierarchical learning framework captures the multi-scale interactions and improves the feasibility and compliance of the predicted trajectories.
Abstract:In recent years, multi-scale generative adversarial networks (GANs) have been proposed to build generalized image processing models based on single sample. Constraining on the sample size, multi-scale GANs have much difficulty converging to the global optimum, which ultimately leads to limitations in their capabilities. In this paper, we pioneered the introduction of PAC-Bayes generalized bound theory into the training analysis of specific models under different adversarial training methods, which can obtain a non-vacuous upper bound on the generalization error for the specified multi-scale GAN structure. Based on the drastic changes we found of the generalization error bound under different adversarial attacks and different training states, we proposed an adaptive training method which can greatly improve the image manipulation ability of multi-scale GANs. The final experimental results show that our adaptive training method in this paper has greatly contributed to the improvement of the quality of the images generated by multi-scale GANs on several image manipulation tasks. In particular, for the image super-resolution restoration task, the multi-scale GAN model trained by the proposed method achieves a 100% reduction in natural image quality evaluator (NIQE) and a 60% reduction in root mean squared error (RMSE), which is better than many models trained on large-scale datasets.
Abstract:Region Proposal Network (RPN) is the cornerstone of two-stage object detectors, it generates a sparse set of object proposals and alleviates the extrem foregroundbackground class imbalance problem during training. However, we find that the potential of the detector has not been fully exploited due to the IoU distribution imbalance and inadequate quantity of the training samples generated by RPN. With the increasing intersection over union (IoU), the exponentially smaller numbers of positive samples would lead to the distribution skewed towards lower IoUs, which hinders the optimization of detector at high IoU levels. In this paper, to break through the limitations of RPN, we propose IoU-Uniform R-CNN, a simple but effective method that directly generates training samples with uniform IoU distribution for the regression branch as well as the IoU prediction branch. Besides, we improve the performance of IoU prediction branch by eliminating the feature offsets of RoIs at inference, which helps the NMS procedure by preserving accurately localized bounding box. Extensive experiments on the PASCAL VOC and MS COCO dataset show the effectiveness of our method, as well as its compatibility and adaptivity to many object detection architectures. The code is made publicly available at https://github.com/zl1994/IoU-Uniform-R-CNN,