Abstract: In the past few years, channel-wise and spatial-wise attention blocks have been widely adopted as supplementary modules in deep neural networks, enhancing representational ability while adding little complexity. Most attention modules follow a squeeze-and-excitation paradigm. However, designing such attention modules requires substantial experimentation and computational resources. Neural Architecture Search (NAS), meanwhile, automates the design of neural networks and spares the numerous experiments required to find an optimal architecture. This motivates us to design a search architecture that can automatically find near-optimal attention modules through NAS. We propose SASE, a Searching Architecture for Squeeze and Excitation operations, which forms a plug-and-play attention block by searching within a given search space. The search space is separated into four sets, each corresponding to the squeeze or excitation operation along the channel or spatial dimension. In addition, the search sets include not only existing attention blocks but also operations that have not previously been utilized in attention mechanisms. To the best of our knowledge, SASE is the first attempt to subdivide the attention search space and to search for architectures beyond currently known attention modules. The searched attention module is evaluated through extensive experiments across a range of visual tasks. Experimental results indicate that visual backbone networks (ResNet-50/101) using the SASE attention module achieve the best performance compared to those using current state-of-the-art attention modules. Code is included in the supplementary material and will be made public later.
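For readers unfamiliar with the paradigm, below is a minimal PyTorch sketch of a squeeze-and-excitation channel attention block, together with illustrative candidate sets for the four search dimensions the abstract describes. This is not the authors' code; the operation names in the candidate sets are hypothetical placeholders.

# Minimal sketch (not the paper's implementation): a squeeze-and-excitation
# block plus illustrative candidate sets for the four search dimensions.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Classic channel attention: squeeze (global pooling) + excitation (MLP)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                         # x: (N, C, H, W)
        s = x.mean(dim=(2, 3))                    # channel squeeze
        w = self.fc(s).view(x.size(0), -1, 1, 1)  # channel excitation
        return x * w                              # reweight the feature map

# Hypothetical candidate sets, one per (squeeze/excitation) x (channel/spatial)
CHANNEL_SQUEEZE = ["avg_pool", "max_pool", "std_pool"]
CHANNEL_EXCITE  = ["mlp", "conv1d"]
SPATIAL_SQUEEZE = ["channel_avg", "channel_max"]
SPATIAL_EXCITE  = ["conv7x7", "dilated_conv"]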
Abstract: Training autonomous driving systems requires a diverse range of datasets encompassing various traffic conditions, weather scenarios, and road types. Traditional data augmentation methods often struggle to generate datasets that represent rare occurrences. To address this challenge, we propose GenDDS, a novel approach for generating driving scenarios by leveraging the capabilities of Stable Diffusion XL (SDXL), an advanced latent diffusion model. Our methodology uses descriptive prompts to guide the synthesis process, aiming to produce realistic and diverse driving scenarios. Leveraging recent computer vision techniques such as ControlNet and Hotshot-XL, we build a complete video-generation pipeline around SDXL. We train the model on the KITTI dataset, which includes real-world driving videos. Through a series of experiments, we demonstrate that our model can generate high-quality driving videos that closely replicate the complexity and variability of real-world driving scenarios. This research contributes to the development of sophisticated training data for autonomous driving systems and opens new avenues for creating virtual environments for simulation and validation purposes.
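As an illustration of the prompt-guided synthesis step, here is a minimal sketch using the Hugging Face diffusers library. It generates a single frame with SDXL; the ControlNet and Hotshot-XL conditioning and the video assembly described in the abstract are omitted, and the prompt is an invented example.

# Minimal sketch of prompt-guided SDXL synthesis with diffusers
# (a single frame only, not the full GenDDS video pipeline).
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Descriptive prompt targeting a rare traffic condition
prompt = ("dashcam view, heavy rain at dusk on a rural two-lane road, "
          "oncoming truck with headlights, wet asphalt reflections")
image = pipe(prompt=prompt, num_inference_steps=30).images[0]
image.save("scenario_frame.png")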
Abstract: Existing cross-document event coreference resolution models either compute mention similarity directly or enhance mention representations by extracting event arguments (such as location, time, agent, and patient), but they lack the ability to utilize document-level information. As a result, they struggle to capture long-distance dependencies and perform poorly in determining coreference for events whose argument information relies on such dependencies. In light of these limitations, we propose constructing document-level Rhetorical Structure Theory (RST) trees and cross-document lexical chains to model the structural and semantic information of documents. Cross-document heterogeneous graphs are then constructed, and a graph attention network (GAT) is used to learn event representations. Finally, a pair scorer calculates the similarity between each pair of events, and coreferent events are recognized using a standard clustering algorithm. Additionally, as existing cross-document event coreference datasets are limited to English, we have developed a large-scale Chinese cross-document event coreference dataset to fill this gap, comprising 53,066 event mentions and 4,476 clusters. Our model outperforms all baselines by large margins on both the English and Chinese datasets.
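The following sketch illustrates the final two stages named in the abstract, a pair scorer over event embeddings followed by clustering. It is hypothetical: the scorer architecture, embedding source, and clustering threshold are placeholder choices, not the paper's configuration.

# Minimal sketch: MLP pair scorer over event embeddings, then agglomerative
# clustering on the resulting distance matrix to form coreference clusters.
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import AgglomerativeClustering

class PairScorer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # score a pair from [e_i; e_j; e_i * e_j], a common pairwise encoding
        self.mlp = nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU(),
                                 nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, ei, ej):
        return self.mlp(torch.cat([ei, ej, ei * ej], dim=-1)).squeeze(-1)

n, dim = 8, 128
events = torch.randn(n, dim)     # stand-in for GAT-derived event embeddings
scorer = PairScorer(dim)
with torch.no_grad():
    sim = scorer(events.unsqueeze(1).expand(-1, n, -1),
                 events.unsqueeze(0).expand(n, -1, -1))  # (n, n) similarities
dist = (1.0 - sim).numpy()
dist = 0.5 * (dist + dist.T)     # symmetrize for clustering
np.fill_diagonal(dist, 0.0)
labels = AgglomerativeClustering(n_clusters=None, metric="precomputed",
                                 linkage="average",
                                 distance_threshold=0.5).fit_predict(dist)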
Abstract: Sound event detection (SED) is an interesting but challenging task due to the scarcity of data and the diversity of sound events in real life. This paper presents a multi-grained attention network (MGA-Net) for semi-supervised sound event detection. To obtain feature representations related to sound events, a residual hybrid convolution (RH-Conv) block is designed to strengthen vanilla convolution's ability to extract time-frequency features. Moreover, a multi-grained attention (MGA) module is designed to learn temporal features from coarse to fine resolution. With the MGA module, the network can capture the characteristics of target events of short or long duration, and thus determine the onset and offset of sound events more accurately. Furthermore, to effectively boost the performance of the Mean Teacher (MT) method, a spatial shift (SS) module is introduced as a data perturbation mechanism to increase the diversity of the data. Experimental results show that MGA-Net outperforms published state-of-the-art competitors, achieving 53.27% and 56.96% event-based macro F1 (EB-F1) scores and 0.709 and 0.739 polyphonic sound detection scores (PSDS) on the validation and public sets, respectively.
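To make the coarse-to-fine idea concrete, here is a minimal sketch of multi-grained temporal attention: features are pooled to coarser temporal resolutions, gated, upsampled, and fused, so both short- and long-duration events can be captured. This is an illustrative construction under assumed details, not the paper's MGA module.

# Minimal sketch (hypothetical) of multi-grained temporal attention.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiGrainedAttention(nn.Module):
    def __init__(self, channels, grains=(1, 2, 4)):
        super().__init__()
        self.grains = grains
        self.gates = nn.ModuleList(
            nn.Sequential(nn.Conv1d(channels, channels, 1), nn.Sigmoid())
            for _ in grains)

    def forward(self, x):                  # x: (N, C, T)
        t = x.size(-1)
        out = 0
        for g, gate in zip(self.grains, self.gates):
            coarse = F.avg_pool1d(x, kernel_size=g, stride=g) if g > 1 else x
            attn = F.interpolate(gate(coarse), size=t, mode="nearest")
            out = out + x * attn           # apply gating at this granularity
        return out / len(self.grains)

x = torch.randn(2, 64, 156)                # (batch, channels, frames)
y = MultiGrainedAttention(64)(x)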
Abstract: The HVAC system accounts for a significant portion of a public building's energy consumption, and an efficient energy consumption prediction model can support effective energy-saving retrofits. Unlike traditional energy consumption prediction models, our approach extracts features from large datasets using XGBoost and trains them separately to obtain multiple models, then fuses these models with LightGBM's independent predictions using MAE to infer energy-consumption-related variables. We have successfully applied this model to our self-developed Internet of Things platform.
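One plausible reading of the MAE-based fusion is inverse-MAE weighting of the two learners, sketched below. The data, hyperparameters, and weighting scheme are assumptions for illustration, not the paper's exact method.

# Minimal sketch: train XGBoost and LightGBM regressors and fuse their
# predictions with weights inversely proportional to validation MAE.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
import xgboost as xgb
import lightgbm as lgb

X, y = np.random.rand(1000, 12), np.random.rand(1000)  # placeholder HVAC data
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2,
                                            random_state=0)

xgb_model = xgb.XGBRegressor(n_estimators=300).fit(X_tr, y_tr)
lgb_model = lgb.LGBMRegressor(n_estimators=300).fit(X_tr, y_tr)

# Weight each model by the inverse of its validation MAE
maes = [mean_absolute_error(y_val, m.predict(X_val))
        for m in (xgb_model, lgb_model)]
w = np.array([1.0 / m for m in maes])
w /= w.sum()
fused = w[0] * xgb_model.predict(X_val) + w[1] * lgb_model.predict(X_val)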
Abstract: Binocular stereo vision is an important branch of machine vision; imitating the human eye, it matches the left and right images captured by a camera pair under epipolar constraints. From the matches, a disparity map can be computed according to the camera imaging model to obtain a depth map, and the depth map can then be converted into a point cloud to obtain spatial point coordinates, thereby enabling distance measurement. However, because of the refraction of light underwater, the captured images no longer satisfy the epipolar constraints, and the change in the imaging model makes traditional calibration methods inapplicable. Therefore, this paper proposes a new underwater real-time calibration method and a matching method based on the best search domain to improve the accuracy of underwater binocular distance measurement.
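For context, this is the standard in-air disparity-to-depth pipeline sketched with OpenCV; the paper's underwater calibration and best-search-domain matching are not reproduced here, and the calibration values and file names are placeholders.

# Minimal sketch: disparity map -> depth map -> point cloud with OpenCV,
# assuming an in-air pinhole model and placeholder calibration.
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)     # placeholder inputs
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128,
                                blockSize=5)
disp = matcher.compute(left, right).astype(np.float32) / 16.0  # SGBM scale

f, baseline, cx, cy = 700.0, 0.12, 320.0, 240.0  # placeholder calibration
Q = np.float32([[1, 0, 0, -cx],
                [0, 1, 0, -cy],
                [0, 0, 0,  f],
                [0, 0, 1 / baseline, 0]])
points = cv2.reprojectImageTo3D(disp, Q)         # (H, W, 3) spatial coords
depth = f * baseline / np.maximum(disp, 1e-6)    # per-pixel range in meters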