Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Nan Song

RealEngine: Simulating Autonomous Driving in Realistic Context

May 22, 2025

Junzhe Jiang, Nan Song, Jingyu Li, Xiatian Zhu, Li Zhang

Abstract:Driving simulation plays a crucial role in developing reliable driving agents by providing controlled, evaluative environments. To enable meaningful assessments, a high-quality driving simulator must satisfy several key requirements: multi-modal sensing capabilities (e.g., camera and LiDAR) with realistic scene rendering to minimize observational discrepancies; closed-loop evaluation to support free-form trajectory behaviors; highly diverse traffic scenarios for thorough evaluation; multi-agent cooperation to capture interaction dynamics; and high computational efficiency to ensure affordability and scalability. However, existing simulators and benchmarks fail to comprehensively meet these fundamental criteria. To bridge this gap, this paper introduces RealEngine, a novel driving simulation framework that holistically integrates 3D scene reconstruction and novel view synthesis techniques to achieve realistic and flexible closed-loop simulation in the driving context. By leveraging real-world multi-modal sensor data, RealEngine reconstructs background scenes and foreground traffic participants separately, allowing for highly diverse and realistic traffic scenarios through flexible scene composition. This synergistic fusion of scene reconstruction and view synthesis enables photorealistic rendering across multiple sensor modalities, ensuring both perceptual fidelity and geometric accuracy. Building upon this environment, RealEngine supports three essential driving simulation categories: non-reactive simulation, safety testing, and multi-agent interaction, collectively forming a reliable and comprehensive benchmark for evaluating the real-world performance of driving agents.

Via

Access Paper or Ask Questions

Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning

Mar 18, 2025

Bozhou Zhang, Nan Song, Xin Jin, Li Zhang

Abstract:End-to-end autonomous driving unifies tasks in a differentiable framework, enabling planning-oriented optimization and attracting growing attention. Current methods aggregate historical information either through dense historical bird's-eye-view (BEV) features or by querying a sparse memory bank, following paradigms inherited from detection. However, we argue that these paradigms either omit historical information in motion planning or fail to align with its multi-step nature, which requires predicting or planning multiple future time steps. In line with the philosophy of future is a continuation of past, we propose BridgeAD, which reformulates motion and planning queries as multi-step queries to differentiate the queries for each future time step. This design enables the effective use of historical prediction and planning by applying them to the appropriate parts of the end-to-end system based on the time steps, which improves both perception and motion planning. Specifically, historical queries for the current frame are combined with perception, while queries for future frames are integrated with motion planning. In this way, we bridge the gap between past and future by aggregating historical insights at every time step, enhancing the overall coherence and accuracy of the end-to-end autonomous driving pipeline. Extensive experiments on the nuScenes dataset in both open-loop and closed-loop settings demonstrate that BridgeAD achieves state-of-the-art performance.

* CVPR 2025

Via

Access Paper or Ask Questions

Towards Lifelong Few-Shot Customization of Text-to-Image Diffusion

Nov 08, 2024

Nan Song, Xiaofeng Yang, Ze Yang, Guosheng Lin

Figure 1 for Towards Lifelong Few-Shot Customization of Text-to-Image Diffusion

Figure 2 for Towards Lifelong Few-Shot Customization of Text-to-Image Diffusion

Figure 3 for Towards Lifelong Few-Shot Customization of Text-to-Image Diffusion

Figure 4 for Towards Lifelong Few-Shot Customization of Text-to-Image Diffusion

Abstract:Lifelong few-shot customization for text-to-image diffusion aims to continually generalize existing models for new tasks with minimal data while preserving old knowledge. Current customization diffusion models excel in few-shot tasks but struggle with catastrophic forgetting problems in lifelong generations. In this study, we identify and categorize the catastrophic forgetting problems into two folds: relevant concepts forgetting and previous concepts forgetting. To address these challenges, we first devise a data-free knowledge distillation strategy to tackle relevant concepts forgetting. Unlike existing methods that rely on additional real data or offline replay of original concept data, our approach enables on-the-fly knowledge distillation to retain the previous concepts while learning new ones, without accessing any previous data. Second, we develop an In-Context Generation (ICGen) paradigm that allows the diffusion model to be conditioned upon the input vision context, which facilitates the few-shot generation and mitigates the issue of previous concepts forgetting. Extensive experiments show that the proposed Lifelong Few-Shot Diffusion (LFS-Diffusion) method can produce high-quality and accurate images while maintaining previously learned knowledge.

Via

Access Paper or Ask Questions

Motion Forecasting in Continuous Driving

Oct 08, 2024

Nan Song, Bozhou Zhang, Xiatian Zhu, Li Zhang

Figure 1 for Motion Forecasting in Continuous Driving

Figure 2 for Motion Forecasting in Continuous Driving

Figure 3 for Motion Forecasting in Continuous Driving

Figure 4 for Motion Forecasting in Continuous Driving

Abstract:Motion forecasting for agents in autonomous driving is highly challenging due to the numerous possibilities for each agent's next action and their complex interactions in space and time. In real applications, motion forecasting takes place repeatedly and continuously as the self-driving car moves. However, existing forecasting methods typically process each driving scene within a certain range independently, totally ignoring the situational and contextual relationships between successive driving scenes. This significantly simplifies the forecasting task, making the solutions suboptimal and inefficient to use in practice. To address this fundamental limitation, we propose a novel motion forecasting framework for continuous driving, named RealMotion. It comprises two integral streams both at the scene level: (1) The scene context stream progressively accumulates historical scene information until the present moment, capturing temporal interactive relationships among scene elements. (2) The agent trajectory stream optimizes current forecasting by sequentially relaying past predictions. Besides, a data reorganization strategy is introduced to narrow the gap between existing benchmarks and real-world applications, consistent with our network. These approaches enable exploiting more broadly the situational and progressive insights of dynamic motion across space and time. Extensive experiments on Argoverse series with different settings demonstrate that our RealMotion achieves state-of-the-art performance, along with the advantage of efficient real-world inference. The source code will be available at https://github.com/fudan-zvg/RealMotion.

* Accepted at NeurIPS 2024 Spotlight

Via

Access Paper or Ask Questions

DeMo: Decoupling Motion Forecasting into Directional Intentions and Dynamic States

Oct 08, 2024

Bozhou Zhang, Nan Song, Li Zhang

Abstract:Accurate motion forecasting for traffic agents is crucial for ensuring the safety and efficiency of autonomous driving systems in dynamically changing environments. Mainstream methods adopt a one-query-one-trajectory paradigm, where each query corresponds to a unique trajectory for predicting multi-modal trajectories. While straightforward and effective, the absence of detailed representation of future trajectories may yield suboptimal outcomes, given that the agent states dynamically evolve over time. To address this problem, we introduce DeMo, a framework that decouples multi-modal trajectory queries into two types: mode queries capturing distinct directional intentions and state queries tracking the agent's dynamic states over time. By leveraging this format, we separately optimize the multi-modality and dynamic evolutionary properties of trajectories. Subsequently, the mode and state queries are integrated to obtain a comprehensive and detailed representation of the trajectories. To achieve these operations, we additionally introduce combined Attention and Mamba techniques for global information aggregation and state sequence modeling, leveraging their respective strengths. Extensive experiments on both the Argoverse 2 and nuScenes benchmarks demonstrate that our DeMo achieves state-of-the-art performance in motion forecasting.

* NeurIPS 2024

Via

Access Paper or Ask Questions

DeepInteraction++: Multi-Modality Interaction for Autonomous Driving

Aug 09, 2024

Zeyu Yang, Nan Song, Wei Li, Xiatian Zhu, Li Zhang, Philip H. S. Torr

Figure 1 for DeepInteraction++: Multi-Modality Interaction for Autonomous Driving

Figure 2 for DeepInteraction++: Multi-Modality Interaction for Autonomous Driving

Figure 3 for DeepInteraction++: Multi-Modality Interaction for Autonomous Driving

Figure 4 for DeepInteraction++: Multi-Modality Interaction for Autonomous Driving

Abstract:Existing top-performance autonomous driving systems typically rely on the multi-modal fusion strategy for reliable scene understanding. This design is however fundamentally restricted due to overlooking the modality-specific strengths and finally hampering the model performance. To address this limitation, in this work, we introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout, enabling their unique characteristics to be exploited during the whole perception pipeline. To demonstrate the effectiveness of the proposed strategy, we design DeepInteraction++, a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder. Specifically, the encoder is implemented as a dual-stream Transformer with specialized attention operation for information exchange and integration between separate modality-specific representations. Our multi-modal representational learning incorporates both object-centric, precise sampling-based feature alignment and global dense information spreading, essential for the more challenging planning task. The decoder is designed to iteratively refine the predictions by alternately aggregating information from separate representations in a unified modality-agnostic manner, realizing multi-modal predictive interaction. Extensive experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks. Our code is available at https://github.com/fudan-zvg/DeepInteraction.

* Journal extension of NeurIPS 2022. arXiv admin note: text overlap with arXiv:2208.11112

Via

Access Paper or Ask Questions

Correlation-Distance Graph Learning for Treatment Response Prediction from rs-fMRI

Nov 17, 2023

Xiatian Zhang, Sisi Zheng, Hubert P. H. Shum, Haozheng Zhang, Nan Song, Mingkang Song, Hongxiao Jia

Figure 1 for Correlation-Distance Graph Learning for Treatment Response Prediction from rs-fMRI

Figure 2 for Correlation-Distance Graph Learning for Treatment Response Prediction from rs-fMRI

Figure 3 for Correlation-Distance Graph Learning for Treatment Response Prediction from rs-fMRI

Figure 4 for Correlation-Distance Graph Learning for Treatment Response Prediction from rs-fMRI

Abstract:Resting-state fMRI (rs-fMRI) functional connectivity (FC) analysis provides valuable insights into the relationships between different brain regions and their potential implications for neurological or psychiatric disorders. However, specific design efforts to predict treatment response from rs-fMRI remain limited due to difficulties in understanding the current brain state and the underlying mechanisms driving the observed patterns, which limited the clinical application of rs-fMRI. To overcome that, we propose a graph learning framework that captures comprehensive features by integrating both correlation and distance-based similarity measures under a contrastive loss. This approach results in a more expressive framework that captures brain dynamic features at different scales and enables more accurate prediction of treatment response. Our experiments on the chronic pain and depersonalization disorder datasets demonstrate that our proposed method outperforms current methods in different scenarios. To the best of our knowledge, we are the first to explore the integration of distance-based and correlation-based neural similarity into graph learning for treatment response prediction.

* Proceedings of the 2023 International Conference on Neural Information Processing (ICONIP)

Via

Access Paper or Ask Questions

MEMD-ABSA: A Multi-Element Multi-Domain Dataset for Aspect-Based Sentiment Analysis

Jun 29, 2023

Hongjie Cai, Nan Song, Zengzhi Wang, Qiming Xie, Qiankun Zhao, Ke Li, Siwei Wu, Shijie Liu, Jianfei Yu, Rui Xia

Figure 1 for MEMD-ABSA: A Multi-Element Multi-Domain Dataset for Aspect-Based Sentiment Analysis

Figure 2 for MEMD-ABSA: A Multi-Element Multi-Domain Dataset for Aspect-Based Sentiment Analysis

Figure 3 for MEMD-ABSA: A Multi-Element Multi-Domain Dataset for Aspect-Based Sentiment Analysis

Figure 4 for MEMD-ABSA: A Multi-Element Multi-Domain Dataset for Aspect-Based Sentiment Analysis

Abstract:Aspect-based sentiment analysis is a long-standing research interest in the field of opinion mining, and in recent years, researchers have gradually shifted their focus from simple ABSA subtasks to end-to-end multi-element ABSA tasks. However, the datasets currently used in the research are limited to individual elements of specific tasks, usually focusing on in-domain settings, ignoring implicit aspects and opinions, and with a small data scale. To address these issues, we propose a large-scale Multi-Element Multi-Domain dataset (MEMD) that covers the four elements across five domains, including nearly 20,000 review sentences and 30,000 quadruples annotated with explicit and implicit aspects and opinions for ABSA research. Meanwhile, we evaluate generative and non-generative baselines on multiple ABSA subtasks under the open domain setting, and the results show that open domain ABSA as well as mining implicit aspects and opinions remain ongoing challenges to be addressed. The datasets are publicly released at \url{https://github.com/NUSTM/MEMD-ABSA}.

Via

Access Paper or Ask Questions

Few-shot Open-set Recognition Using Background as Unknowns

Jul 19, 2022

Nan Song, Chi Zhang, Guosheng Lin

Figure 1 for Few-shot Open-set Recognition Using Background as Unknowns

Figure 2 for Few-shot Open-set Recognition Using Background as Unknowns

Figure 3 for Few-shot Open-set Recognition Using Background as Unknowns

Figure 4 for Few-shot Open-set Recognition Using Background as Unknowns

Abstract:Few-shot open-set recognition aims to classify both seen and novel images given only limited training data of seen classes. The challenge of this task is that the model is required not only to learn a discriminative classifier to classify the pre-defined classes with few training data but also to reject inputs from unseen classes that never appear at training time. In this paper, we propose to solve the problem from two novel aspects. First, instead of learning the decision boundaries between seen classes, as is done in standard close-set classification, we reserve space for unseen classes, such that images located in these areas are recognized as the unseen classes. Second, to effectively learn such decision boundaries, we propose to utilize the background features from seen classes. As these background regions do not significantly contribute to the decision of close-set classification, it is natural to use them as the pseudo unseen classes for classifier learning. Our extensive experiments show that our proposed method not only outperforms multiple baselines but also sets new state-of-the-art results on three popular benchmarks, namely tieredImageNet, miniImageNet, and Caltech-USCD Birds-200-2011 (CUB).

* Accpeted to ACM MM 2022

Via

Access Paper or Ask Questions

Few-Shot Incremental Learning with Continually Evolved Classifiers

Apr 07, 2021

Chi Zhang, Nan Song, Guosheng Lin, Yun Zheng, Pan Pan, Yinghui Xu

Figure 1 for Few-Shot Incremental Learning with Continually Evolved Classifiers

Figure 2 for Few-Shot Incremental Learning with Continually Evolved Classifiers

Figure 3 for Few-Shot Incremental Learning with Continually Evolved Classifiers

Figure 4 for Few-Shot Incremental Learning with Continually Evolved Classifiers

Abstract:Few-shot class-incremental learning (FSCIL) aims to design machine learning algorithms that can continually learn new concepts from a few data points, without forgetting knowledge of old classes. The difficulty lies in that limited data from new classes not only lead to significant overfitting issues but also exacerbate the notorious catastrophic forgetting problems. Moreover, as training data come in sequence in FSCIL, the learned classifier can only provide discriminative information in individual sessions, while FSCIL requires all classes to be involved for evaluation. In this paper, we address the FSCIL problem from two aspects. First, we adopt a simple but effective decoupled learning strategy of representations and classifiers that only the classifiers are updated in each incremental session, which avoids knowledge forgetting in the representations. By doing so, we demonstrate that a pre-trained backbone plus a non-parametric class mean classifier can beat state-of-the-art methods. Second, to make the classifiers learned on individual sessions applicable to all classes, we propose a Continually Evolved Classifier (CEC) that employs a graph model to propagate context information between classifiers for adaptation. To enable the learning of CEC, we design a pseudo incremental learning paradigm that episodically constructs a pseudo incremental learning task to optimize the graph parameters by sampling data from the base dataset. Experiments on three popular benchmark datasets, including CIFAR100, miniImageNet, and Caltech-USCD Birds-200-2011 (CUB200), show that our method significantly outperforms the baselines and sets new state-of-the-art results with remarkable advantages.

* Accpeted to CVPR2021

Via

Access Paper or Ask Questions