Weihao Sun

AI Recommendation Systems for Lane-Changing Using Adherence-Aware Reinforcement Learning

Apr 28, 2025

Weihao Sun, Heeseung Bang, Andreas A. Malikopoulos

Abstract: In this paper, we present an adherence-aware reinforcement learning (RL) approach for finding optimal lane-changing recommendations in a semi-autonomous driving environment, with the goal of improving a single vehicle's travel efficiency. The problem is formulated as a Markov decision process and addressed with an adherence-aware deep Q-network, which accounts for human drivers' partial compliance with the recommended actions. The approach is evaluated in CARLA's driving environment under realistic scenarios.

* 6 pages, 5 figures, conference
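To illustrate how partial compliance can enter the value function, here is a minimal tabular sketch. It is not the paper's deep Q-network. It assumes the driver follows the recommended action with a constant probability `theta` and otherwise takes a fixed default action `default_action[s]`. Both names, and the small known MDP (`P`, `R`), are hypothetical. The backup mixes the return of the recommended action with the return of the driver's own action:

```python
import numpy as np


def adherence_aware_q_iteration(P, R, default_action, theta, gamma=0.9, iters=500):
    """Tabular Q iteration with partial adherence (illustrative assumption, not the paper's DQN).

    P: (S, A, S) transition probabilities; R: (S, A) expected rewards.
    default_action: (S,) action the driver takes when ignoring the recommendation (assumed fixed).
    theta: probability that the driver follows the recommendation (assumed constant).
    """
    S, _ = R.shape
    states = np.arange(S)
    Q = np.zeros_like(R, dtype=float)
    for _ in range(iters):
        V = Q.max(axis=1)                       # value of recommending the best action next step
        executed = R + gamma * (P @ V)          # (S, A): return if action a is actually executed
        own = executed[states, default_action]  # (S,): return if the driver ignores the advice
        # With prob. theta the recommendation is executed, otherwise the default action is.
        Q = theta * executed + (1.0 - theta) * own[:, None]
    return Q
```

In this sketch the recommendation in state `s` is `Q[s].argmax()`. With `theta = 1` the update reduces to standard Q iteration. With `theta = 0` every recommendation has the same value, because the driver never acts on it.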

FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs

Oct 22, 2024

Haoran Lin, Xianzhi Yu, Kang Zhao, Lu Hou, Zongyuan Zhan, Stanislav Kamenev, Han Bao, Ting Hu, Mingkai Wang, Qixin Chang(+10 more)

Abstract: The FlashAttention series has been widely applied in the inference of large language models (LLMs). However, it only supports high-end GPU architectures, e.g., Ampere and Hopper, and at present it is not easily transferable to NPUs and low-resource GPUs. Moreover, the FlashAttention series is inefficient in multi-NPU or multi-GPU inference scenarios. In this work, we propose FastAttention, which pioneers the adaptation of the FlashAttention series to NPUs and low-resource GPUs to boost LLM inference efficiency. Specifically, we take Ascend NPUs and Volta-based GPUs as representatives for designing FastAttention. We migrate the FlashAttention series to Ascend NPUs by proposing a novel two-level tiling strategy for runtime speedup, a tiling-mask strategy for memory saving, and a tiling-AllReduce strategy for reducing communication overhead. In addition, we adapt FlashAttention to Volta-based GPUs by redesigning the operand layout in shared memory and introducing a simple yet effective CPU-GPU cooperative strategy for efficient memory utilization. On Ascend NPUs, FastAttention achieves a 10.7× speedup compared to the standard attention implementation, and Llama-7B with FastAttention reaches up to 5.16× higher throughput than with standard attention. On Volta-architecture GPUs, FastAttention yields a 1.43× speedup compared to its equivalents in xformers, and Pangu-38B with FastAttention achieves a 1.46× end-to-end speedup using FasterTransformer. Combined with the proposed CPU-GPU cooperative strategy, FastAttention supports a maximum input length of 256K on 8 V100 GPUs. All the code will be made available soon.
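The core idea FastAttention builds on is FlashAttention-style tiling with an online softmax. The NumPy sketch below shows that generic technique. It is not FastAttention's two-level tiling or its Ascend/Volta kernels. Attention is computed one (query tile, key tile) pair at a time, and running max and denominator statistics let earlier partial results be rescaled, so the full score matrix is never stored. As a loose, assumed analogy to a tiling-mask idea, the causal mask is generated per tile, and key tiles that lie entirely in the future are skipped. The function names and the `block` parameter are hypothetical:

```python
import numpy as np


def standard_attention(Q, K, V, causal=False):
    """Reference implementation: materialises the full (n, n) score matrix."""
    s = Q @ K.T / np.sqrt(Q.shape[1])
    if causal:
        s = np.where(np.tril(np.ones(s.shape, dtype=bool)), s, -np.inf)
    p = np.exp(s - s.max(axis=1, keepdims=True))
    return (p / p.sum(axis=1, keepdims=True)) @ V


def tiled_attention(Q, K, V, block=4, causal=False):
    """Tiled attention with an online softmax; holds at most one (block, block) score tile."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((n, V.shape[1]))
    for i in range(0, n, block):
        rows = np.arange(i, min(i + block, n))
        q = Q[rows] * scale
        m = np.full(len(rows), -np.inf)          # running max score per query row
        l = np.zeros(len(rows))                  # running softmax denominator
        acc = np.zeros((len(rows), V.shape[1]))  # running unnormalised output
        for j in range(0, K.shape[0], block):
            if causal and j > rows[-1]:
                break  # this key tile and all later ones are entirely in the future
            cols = np.arange(j, min(j + block, K.shape[0]))
            s = q @ K[cols].T
            if causal:  # build the mask for this tile only, instead of storing an (n, n) mask
                s = np.where(cols[None, :] <= rows[:, None], s, -np.inf)
            m_new = np.maximum(m, s.max(axis=1))
            p = np.exp(s - m_new[:, None])
            rescale = np.exp(m - m_new)          # shrink earlier partial sums to the new max
            l = l * rescale + p.sum(axis=1)
            acc = acc * rescale[:, None] + p @ V[cols]
            m = m_new
        out[rows] = acc / l[:, None]
    return out
```

Both functions should agree up to floating-point error. The tiled version trades one large matrix for many small ones, which is what makes the approach attractive on devices with limited fast on-chip memory.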

Scalable Infomin Learning

Feb 21, 2023

Yanzhi Chen, Weihao Sun, Yingzhen Li, Adrian Weller

Abstract: The task of infomin learning aims to learn a representation with high utility while being uninformative about a specified target, with the latter achieved by minimising the mutual information between the representation and the target. It has broad applications, ranging from training fair prediction models against protected attributes to unsupervised learning with disentangled representations. Recent works on infomin learning mainly use adversarial training, which involves training a neural network to estimate mutual information or its proxy and is therefore slow and difficult to optimise. Drawing on recent advances in slicing techniques, we propose a new infomin learning approach that uses a novel proxy metric for mutual information. We further derive an accurate and analytically computable approximation to this proxy metric, thereby removing the need to construct neural network-based mutual information estimators. Experiments on algorithmic fairness, disentangled representation learning, and domain adaptation verify that our method can effectively remove unwanted information within a limited time budget.

* 10 pages, accepted to NeurIPS 2022, slightly improved version
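To make the slicing idea concrete, the sketch below uses a deliberately simplified stand-in, not the proxy metric or analytic approximation derived in the paper. Both the representation `z` and the target `t` are projected onto random unit directions ("slices"), and the absolute Pearson correlation of each projected pair is averaged. Because it needs no trained estimator, it is cheap to evaluate. It only captures linear dependence along the sampled slices, however. The name `sliced_dependence` and its parameters are hypothetical:

```python
import numpy as np


def _standardise(x):
    """Zero-mean, unit-variance columns."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def sliced_dependence(z, t, n_slices=200, seed=0):
    """Mean |Pearson correlation| between random 1-D projections of z (N, dz) and t (N, dt).

    Returns a value in [0, 1]. Values near 0 suggest no linear dependence along the
    sampled slices; this is a simplified illustration, not the paper's estimator.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((z.shape[1], n_slices))
    b = rng.standard_normal((t.shape[1], n_slices))
    a /= np.linalg.norm(a, axis=0)   # unit-norm slicing directions for z
    b /= np.linalg.norm(b, axis=0)   # unit-norm slicing directions for t
    zp = _standardise(z @ a)         # (N, n_slices) projected representation
    tp = _standardise(t @ b)         # (N, n_slices) projected target
    corr = (zp * tp).mean(axis=0)    # Pearson correlation for each (a_k, b_k) pair
    return float(np.abs(corr).mean())
```

In an infomin-style objective, a term like this would be added, with some weight, to the utility loss and minimised with respect to the encoder. Doing so requires an autodiff framework rather than NumPy, which is an assumption beyond this sketch.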

SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition

Feb 18, 2020

Zhixuan Lin, Yi-Fu Wu, Skand Vishwanath Peri, Weihao Sun, Gautam Singh, Fei Deng, Jindong Jiang, Sungjin Ahn

Abstract: The ability to decompose complex multi-object scenes into meaningful abstractions like objects is fundamental to achieving higher-level cognition. Previous approaches for unsupervised object-oriented scene representation learning are based on either spatial attention or scene mixture, and they are limited in scalability, which is a main obstacle to modeling real-world scenes. In this paper, we propose a generative latent variable model, called SPACE, that provides a unified probabilistic modeling framework combining the best of the spatial-attention and scene-mixture approaches. SPACE can explicitly provide factorized object representations for foreground objects while also decomposing background segments of complex morphology. Previous models are good at one of these, but not both. SPACE also resolves the scalability problems of previous methods by incorporating parallel spatial attention, and is thus applicable to scenes with a large number of objects without performance degradation. We show through experiments on Atari and 3D-Rooms that SPACE consistently achieves the above properties in comparison to SPAIR, IODINE, and GENESIS. Results of our experiments can be found on our project website: https://sites.google.com/view/space-project-page

* Accepted in ICLR 2020
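One way to see how a single model can combine a spatial-attention foreground with a mixture background is through the per-pixel likelihood. The sketch below follows our reading of SPACE: a foreground mask `alpha` decides, per pixel, whether the rendered foreground or a K-component background mixture explains the pixel. It assumes the object glimpses have already been rendered onto a single foreground canvas `fg_mean`. It also assumes greyscale images, Gaussian pixel likelihoods, and the standard deviations shown. All of these names and values are hypothetical, not the paper's code:

```python
import numpy as np


def gaussian_pdf(x, mean, sigma):
    """Elementwise Gaussian density."""
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def space_log_likelihood(x, fg_mean, alpha, bg_means, bg_logits, sigma_fg=0.15, sigma_bg=0.15):
    """Log-likelihood of an image under a foreground/background pixel mixture (illustrative).

    x, fg_mean, alpha: (H, W) image, rendered foreground, foreground mask in [0, 1].
    bg_means, bg_logits: (K, H, W) background component means and unnormalised mixing weights.
    """
    pi = np.exp(bg_logits - bg_logits.max(axis=0, keepdims=True))
    pi /= pi.sum(axis=0, keepdims=True)                  # softmax over the K components per pixel
    p_fg = gaussian_pdf(x, fg_mean, sigma_fg)
    p_bg = (pi * gaussian_pdf(x[None], bg_means, sigma_bg)).sum(axis=0)
    p = alpha * p_fg + (1.0 - alpha) * p_bg              # foreground explains the pixel w.p. alpha
    return float(np.log(p).sum())
```

Setting `alpha` to all ones reduces this to a pure foreground likelihood. Setting it to all zeros reduces it to a pure background mixture, which is the sense in which the model unifies the two families of approaches.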
