Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Weilong Yang

Three-dimensional attention Transformer for state evaluation in real-time strategy games

Jan 07, 2025

Yanqing Ye, Weilong Yang, Kai Qiu, Jie Zhang

Figure 1 for Three-dimensional attention Transformer for state evaluation in real-time strategy games

Figure 2 for Three-dimensional attention Transformer for state evaluation in real-time strategy games

Figure 3 for Three-dimensional attention Transformer for state evaluation in real-time strategy games

Figure 4 for Three-dimensional attention Transformer for state evaluation in real-time strategy games

Abstract:Situation assessment in Real-Time Strategy (RTS) games is crucial for understanding decision-making in complex adversarial environments. However, existing methods remain limited in processing multi-dimensional feature information and temporal dependencies. Here we propose a tri-dimensional Space-Time-Feature Transformer (TSTF Transformer) architecture, which efficiently models battlefield situations through three independent but cascaded modules: spatial attention, temporal attention, and feature attention. On a dataset comprising 3,150 adversarial experiments, the 8-layer TSTF Transformer demonstrates superior performance: achieving 58.7% accuracy in the early game (~4% progress), significantly outperforming the conventional Timesformer's 41.8%; reaching 97.6% accuracy in the mid-game (~40% progress) while maintaining low performance variation (standard deviation 0.114). Meanwhile, this architecture requires fewer parameters (4.75M) compared to the baseline model (5.54M). Our study not only provides new insights into situation assessment in RTS games but also presents an innovative paradigm for Transformer-based multi-dimensional temporal modeling.

* 9 pages, 5 figures

Via

Access Paper or Ask Questions

Online Reinforcement Learning-Based Dynamic Adaptive Evaluation Function for Real-Time Strategy Tasks

Jan 07, 2025

Weilong Yang, Jie Zhang, Xunyun Liu, Yanqing Ye

Abstract:Effective evaluation of real-time strategy tasks requires adaptive mechanisms to cope with dynamic and unpredictable environments. This study proposes a method to improve evaluation functions for real-time responsiveness to battle-field situation changes, utilizing an online reinforcement learning-based dynam-ic weight adjustment mechanism within the real-time strategy game. Building on traditional static evaluation functions, the method employs gradient descent in online reinforcement learning to update weights dynamically, incorporating weight decay techniques to ensure stability. Additionally, the AdamW optimizer is integrated to adjust the learning rate and decay rate of online reinforcement learning in real time, further reducing the dependency on manual parameter tun-ing. Round-robin competition experiments demonstrate that this method signifi-cantly enhances the application effectiveness of the Lanchester combat model evaluation function, Simple evaluation function, and Simple Sqrt evaluation function in planning algorithms including IDABCD, IDRTMinimax, and Port-folio AI. The method achieves a notable improvement in scores, with the en-hancement becoming more pronounced as the map size increases. Furthermore, the increase in evaluation function computation time induced by this method is kept below 6% for all evaluation functions and planning algorithms. The pro-posed dynamic adaptive evaluation function demonstrates a promising approach for real-time strategy task evaluation.

* 22 pages, 9 figures

Via

Access Paper or Ask Questions

STT: Stateful Tracking with Transformers for Autonomous Driving

Apr 30, 2024

Longlong Jing, Ruichi Yu, Xu Chen, Zhengli Zhao, Shiwei Sheng, Colin Graber, Qi Chen, Qinru Li, Shangxuan Wu, Han Deng(+12 more)

Abstract:Tracking objects in three-dimensional space is critical for autonomous driving. To ensure safety while driving, the tracker must be able to reliably track objects across frames and accurately estimate their states such as velocity and acceleration in the present. Existing works frequently focus on the association task while either neglecting the model performance on state estimation or deploying complex heuristics to predict the states. In this paper, we propose STT, a Stateful Tracking model built with Transformers, that can consistently track objects in the scenes while also predicting their states accurately. STT consumes rich appearance, geometry, and motion signals through long term history of detections and is jointly optimized for both data association and state estimation tasks. Since the standard tracking metrics like MOTA and MOTP do not capture the combined performance of the two tasks in the wider spectrum of object states, we extend them with new metrics called S-MOTA and MOTPS that address this limitation. STT achieves competitive real-time performance on the Waymo Open Dataset.

* ICRA 2024

Via

Access Paper or Ask Questions

Pedestrian Crossing Action Recognition and Trajectory Prediction with 3D Human Keypoints

Jun 01, 2023

Jiachen Li, Xinwei Shi, Feiyu Chen, Jonathan Stroud, Zhishuai Zhang, Tian Lan, Junhua Mao, Jeonhyung Kang, Khaled S. Refaat, Weilong Yang(+2 more)

Figure 1 for Pedestrian Crossing Action Recognition and Trajectory Prediction with 3D Human Keypoints

Figure 2 for Pedestrian Crossing Action Recognition and Trajectory Prediction with 3D Human Keypoints

Figure 3 for Pedestrian Crossing Action Recognition and Trajectory Prediction with 3D Human Keypoints

Figure 4 for Pedestrian Crossing Action Recognition and Trajectory Prediction with 3D Human Keypoints

Abstract:Accurate understanding and prediction of human behaviors are critical prerequisites for autonomous vehicles, especially in highly dynamic and interactive scenarios such as intersections in dense urban areas. In this work, we aim at identifying crossing pedestrians and predicting their future trajectories. To achieve these goals, we not only need the context information of road geometry and other traffic participants but also need fine-grained information of the human pose, motion and activity, which can be inferred from human keypoints. In this paper, we propose a novel multi-task learning framework for pedestrian crossing action recognition and trajectory prediction, which utilizes 3D human keypoints extracted from raw sensor data to capture rich information on human pose and activity. Moreover, we propose to apply two auxiliary tasks and contrastive learning to enable auxiliary supervisions to improve the learned keypoints representation, which further enhances the performance of major tasks. We validate our approach on a large-scale in-house dataset, as well as a public benchmark dataset, and show that our approach achieves state-of-the-art performance on a wide range of evaluation metrics. The effectiveness of each model component is validated in a detailed ablation study.

* ICRA 2023

Via

Access Paper or Ask Questions

Automatic Non-Linear Video Editing Transfer

May 14, 2021

Nathan Frey, Peggy Chi, Weilong Yang, Irfan Essa

Figure 1 for Automatic Non-Linear Video Editing Transfer

Figure 2 for Automatic Non-Linear Video Editing Transfer

Figure 3 for Automatic Non-Linear Video Editing Transfer

Figure 4 for Automatic Non-Linear Video Editing Transfer

Abstract:We propose an automatic approach that extracts editing styles in a source video and applies the edits to matched footage for video creation. Our Computer Vision based techniques considers framing, content type, playback speed, and lighting of each input video segment. By applying a combination of these features, we demonstrate an effective method that automatically transfers the visual and temporal styles from professionally edited videos to unseen raw footage. We evaluated our approach with real-world videos that contained a total of 3872 video shots of a variety of editing styles, including different subjects, camera motions, and lighting. We reported feedback from survey participants who reviewed a set of our results.

* AI for Content Creation Workshop at CVPR 2021
* Published to AI for Content Creation Workshop at CVPR 2021

Via

Access Paper or Ask Questions

Regularizing Generative Adversarial Networks under Limited Data

Apr 07, 2021

Hung-Yu Tseng, Lu Jiang, Ce Liu, Ming-Hsuan Yang, Weilong Yang

Figure 1 for Regularizing Generative Adversarial Networks under Limited Data

Figure 2 for Regularizing Generative Adversarial Networks under Limited Data

Figure 3 for Regularizing Generative Adversarial Networks under Limited Data

Figure 4 for Regularizing Generative Adversarial Networks under Limited Data

Abstract:Recent years have witnessed the rapid progress of generative adversarial networks (GANs). However, the success of the GAN models hinges on a large amount of training data. This work proposes a regularization approach for training robust GAN models on limited data. We theoretically show a connection between the regularized loss and an f-divergence called LeCam-divergence, which we find is more robust under limited training data. Extensive experiments on several benchmark datasets demonstrate that the proposed regularization scheme 1) improves the generalization performance and stabilizes the learning dynamics of GAN models under limited training data, and 2) complements the recent data augmentation methods. These properties facilitate training GAN models to achieve state-of-the-art performance when only limited training data of the ImageNet benchmark is available.

* CVPR 2021. Project Page: https://hytseng0509.github.io/lecam-gan Code: https://github.com/google/lecam-gan

Via

Access Paper or Ask Questions

Text as Neural Operator: Image Manipulation by Text Instruction

Aug 12, 2020

Tianhao Zhang, Hung-Yu Tseng, Lu Jiang, Honglak Lee, Irfan Essa, Weilong Yang

Figure 1 for Text as Neural Operator: Image Manipulation by Text Instruction

Figure 2 for Text as Neural Operator: Image Manipulation by Text Instruction

Figure 3 for Text as Neural Operator: Image Manipulation by Text Instruction

Figure 4 for Text as Neural Operator: Image Manipulation by Text Instruction

Abstract:In this paper, we study a new task that allows users to edit an input image using language instructions. In this image generation task, the inputs are a reference image and a text instruction that describes desired modifications to the input image. We propose a GAN-based method to tackle this problem. The key idea is to treat language as neural operators to locally modify the image feature. To this end, our model decomposes the generation process into finding where (spatial region) and how (text operators) to apply modifications. We show that the proposed model performs favorably against recent baselines on three datasets.

Via

Access Paper or Ask Questions

RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval

Jul 16, 2020

Hung-Yu Tseng, Hsin-Ying Lee, Lu Jiang, Ming-Hsuan Yang, Weilong Yang

Figure 1 for RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval

Figure 2 for RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval

Figure 3 for RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval

Figure 4 for RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval

Abstract:Image generation from scene description is a cornerstone technique for the controlled generation, which is beneficial to applications such as content creation and image editing. In this work, we aim to synthesize images from scene description with retrieved patches as reference. We propose a differentiable retrieval module. With the differentiable retrieval module, we can (1) make the entire pipeline end-to-end trainable, enabling the learning of better feature embedding for retrieval; (2) encourage the selection of mutually compatible patches with additional objective functions. We conduct extensive quantitative and qualitative experiments to demonstrate that the proposed method can generate realistic and diverse images, where the retrieved patches are reasonable and mutually compatible.

* ECCV 2020

Via

Access Paper or Ask Questions

Neural Design Network: Graphic Layout Generation with Constraints

Dec 19, 2019

Hsin-Ying Lee, Weilong Yang, Lu Jiang, Madison Le, Irfan Essa, Haifeng Gong, Ming-Hsuan Yang

Figure 1 for Neural Design Network: Graphic Layout Generation with Constraints

Figure 2 for Neural Design Network: Graphic Layout Generation with Constraints

Figure 3 for Neural Design Network: Graphic Layout Generation with Constraints

Figure 4 for Neural Design Network: Graphic Layout Generation with Constraints

Abstract:Graphic design is essential for visual communication with layouts being fundamental to composing attractive designs. Layout generation differs from pixel-level image synthesis and is unique in terms of the requirement of mutual relations among the desired components. We propose a method for design layout generation that can satisfy user-specified constraints. The proposed neural design network (NDN) consists of three modules. The first module predicts a graph with complete relations from a graph with user-specified relations. The second module generates a layout from the predicted graph. Finally, the third module fine-tunes the predicted layout. Quantitative and qualitative experiments demonstrate that the generated layouts are visually similar to real design layouts. We also construct real designs based on predicted layouts for a better understanding of the visual quality. Finally, we demonstrate a practical application on layout recommendation.

Via

Access Paper or Ask Questions

Synthetic vs Real: Deep Learning on Controlled Noise

Nov 21, 2019

Lu Jiang, Di Huang, Weilong Yang

Figure 1 for Synthetic vs Real: Deep Learning on Controlled Noise

Figure 2 for Synthetic vs Real: Deep Learning on Controlled Noise

Figure 3 for Synthetic vs Real: Deep Learning on Controlled Noise

Figure 4 for Synthetic vs Real: Deep Learning on Controlled Noise

Abstract:Performing controlled experiments on noisy data is essential in thoroughly understanding deep learning across a spectrum of noise levels. Due to the lack of suitable datasets, previous research have only examined deep learning on controlled synthetic noise, and real-world noise has never been systematically studied in a controlled setting. To this end, this paper establishes a benchmark of real-world noisy labels at 10 controlled noise levels. As real-world noise possesses unique properties, to understand the difference, we conduct a large-scale study across a variety of noise levels and types, architectures, methods, and training settings. Our study shows that: (1) Deep Neural Networks (DNNs) generalize much better on real-world noise. (2) DNNs may not learn patterns first on real-world noisy data. (3) When networks are fine-tuned, ImageNet architectures generalize well on noisy data. (4) Real-world noise appears to be less harmful, yet it is more difficult for robust DNN methods to improve. (5) Robust learning methods that work well on synthetic noise may not work as well on real-world noise, and vice versa. We hope our benchmark, as well as our findings, will facilitate deep learning research on noisy data.

Via

Access Paper or Ask Questions