Wenjing Li

WeatherGS: 3D Scene Reconstruction in Adverse Weather Conditions via Gaussian Splatting

Dec 25, 2024

Chenghao Qian, Yuhu Guo, Wenjing Li, Gustav Markkula

Abstract: 3D Gaussian Splatting (3DGS) has gained significant attention for 3D scene reconstruction, but still struggles in complex outdoor environments, especially under adverse weather. This is because 3DGS treats the artifacts caused by adverse weather as part of the scene and reconstructs them directly, largely reducing the clarity of the reconstructed scene. To address this challenge, we propose WeatherGS, a 3DGS-based framework for reconstructing clear scenes from multi-view images under different weather conditions. Specifically, we explicitly categorize the multi-weather artifacts into dense particles and lens occlusions, which have very different characteristics: the former are caused by snowflakes and raindrops in the air, and the latter are caused by precipitation on the camera lens. In light of this, we propose a dense-to-sparse preprocessing strategy, which first removes the dense particles with an Atmospheric Effect Filter (AEF) and then extracts the relatively sparse occlusion masks with a Lens Effect Detector (LED). Finally, we train a set of 3D Gaussians with the processed images and the generated masks, which exclude occluded areas, and accurately recover the underlying clear scene by Gaussian splatting. We construct a diverse and challenging benchmark to facilitate the evaluation of 3D reconstruction under complex weather scenarios. Extensive experiments on this benchmark demonstrate that our WeatherGS consistently produces high-quality, clean scenes across various weather scenarios, outperforming existing state-of-the-art methods. See project page: https://jumponthemoon.github.io/weather-gs.
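
As a rough illustration of the final step, where occluded areas are excluded during training, a masked photometric loss could look like the sketch below. The function name `masked_l1_loss`, the nested-list image format, and the choice of an L1 loss are assumptions made for illustration, not details taken from the paper.

```python
def masked_l1_loss(rendered, target, occlusion_mask):
    """Mean absolute error over pixels that are NOT flagged as occluded.

    All three arguments are 2D lists of equal shape (hypothetical format).
    occlusion_mask holds 1 where the lens is occluded and 0 elsewhere.
    """
    total = 0.0
    count = 0
    for r_row, t_row, m_row in zip(rendered, target, occlusion_mask):
        for r, t, m in zip(r_row, t_row, m_row):
            if not m:  # only visible pixels contribute to the loss
                total += abs(r - t)
                count += 1
    return total / count if count else 0.0


rendered = [[0.5, 0.9], [0.2, 0.4]]
target = [[0.5, 0.1], [0.0, 0.4]]
mask = [[0, 1], [0, 0]]  # the top-right pixel is occluded and ignored
loss = masked_l1_loss(rendered, target, mask)
```

Because the occluded pixel is skipped, its large error (0.8) does not push the optimization toward reproducing the lens artifact.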

Diffusion Models Meet Network Management: Improving Traffic Matrix Analysis with Diffusion-based Approach

Nov 29, 2024

Xinyu Yuan, Yan Qiao, Zhenchun Wei, Zeyu Zhang, Minyue Li, Pei Zhao, Rongyao Hu, Wenjing Li

Abstract: Because network operation and maintenance rely heavily on network traffic monitoring, traffic matrix analysis has been one of the most crucial issues for network management related tasks. However, it is challenging to reliably obtain precise measurements in computer networks because of the high measurement cost and the unavoidable transmission loss. Although some methods proposed in recent years allow estimating network traffic from partial flow-level or link-level measurements, they often perform poorly for traffic matrix estimation today. Despite strong assumptions such as low-rank structure and a prior distribution, existing techniques are usually task-specific and tend to perform significantly worse because modern network communication is extremely complicated and dynamic. To address this dilemma, this paper proposes a diffusion-based traffic matrix analysis framework named Diffusion-TM, which leverages problem-agnostic diffusion to notably improve the estimation performance in both traffic distribution and accuracy. The novel framework not only takes advantage of the powerful generative ability of diffusion models to produce realistic network traffic, but also leverages the denoising process to unbiasedly estimate all end-to-end traffic in a plug-and-play manner under theoretical guarantee. Moreover, because compiling an intact traffic dataset is usually infeasible, we also propose a two-stage training scheme to make our framework insensitive to missing values in the dataset. Through extensive experiments on real-world datasets, we illustrate the effectiveness of Diffusion-TM on several tasks. Moreover, the results also demonstrate that our method can obtain promising results even when only 5% of the values in the datasets are known.
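
To see why estimating a traffic matrix from link-level measurements is hard, the sketch below uses the standard linear model in which link loads equal a routing matrix applied to the end-to-end flows. The abstract does not spell this model out, so the model, the toy topology, and the name `link_loads` are assumptions for illustration only.

```python
def link_loads(routing, flows):
    """Return the load on each link given per-flow traffic volumes (y = A x)."""
    return [sum(a * x for a, x in zip(row, flows)) for row in routing]


# Hypothetical line topology A - B - C with two links (A-B, B-C)
# and three origin-destination flows (A->B, B->C, A->C).
ROUTING = [
    [1, 0, 1],  # link A-B carries flows A->B and A->C
    [0, 1, 1],  # link B-C carries flows B->C and A->C
]

loads_1 = link_loads(ROUTING, [3, 4, 5])
loads_2 = link_loads(ROUTING, [8, 9, 0])
print(loads_1, loads_2)  # [8, 9] [8, 9]
```

Two different flow vectors produce identical link loads, so link measurements alone do not determine the traffic matrix; an estimator needs extra prior structure, which is the role the abstract assigns to the learned diffusion model.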

Generative AI Enabled Matching for 6G Multiple Access

Oct 29, 2024

Xudong Wang, Hongyang Du, Dusit Niyato, Lijie Zhou, Lei Feng, Zhixiang Yang, Fanqin Zhou, Wenjing Li

Abstract: In wireless networks, applying deep learning models to solve matching problems between different entities has become a mainstream and effective approach. However, the complex network topology in 6G multiple access presents significant challenges for the real-time performance and stability of matching generation. Generative artificial intelligence (GenAI) has demonstrated strong capabilities in graph feature extraction, exploration, and generation, offering potential for graph-structured matching generation. In this paper, we propose a GenAI-enabled matching generation framework to support 6G multiple access. Specifically, we first summarize classical matching theory and discuss common GenAI models and applications from the perspective of matching generation. Then, we propose a framework based on generative diffusion models (GDMs) that iteratively denoises toward reward maximization to generate a matching strategy that meets specific requirements. Experimental results show that, compared to decision-based AI approaches, our framework can generate more effective matching strategies based on given conditions and predefined rewards, helping to solve complex problems in 6G multiple access, such as task allocation.

* 8 pages, 5 figures

Prototypical Hash Encoding for On-the-Fly Fine-Grained Category Discovery

Oct 24, 2024

Haiyang Zheng, Nan Pu, Wenjing Li, Nicu Sebe, Zhun Zhong

Abstract: In this paper, we study a practical yet challenging task, On-the-fly Category Discovery (OCD), which aims to discover online the newly arriving stream data that belong to both known and unknown classes, by leveraging only the known-category knowledge contained in labeled data. Previous OCD methods employ a hash-based technique to represent old/new categories by hash codes for instance-wise inference. However, directly mapping features into a low-dimensional hash space not only inevitably damages the ability to distinguish classes but also causes a "high sensitivity" issue, especially for fine-grained classes, leading to inferior performance. To address these issues, we propose a novel Prototypical Hash Encoding (PHE) framework consisting of Category-aware Prototype Generation (CPG) and Discriminative Category Encoding (DCE) to mitigate the sensitivity of the hash code while preserving the rich discriminative information contained in the high-dimensional feature space, in a two-stage projection fashion. CPG enables the model to fully capture the intra-category diversity by representing each category with multiple prototypes. DCE boosts the discrimination ability of the hash code with the guidance of the generated category prototypes and the constraint of a minimum separation distance. By jointly optimizing CPG and DCE, we demonstrate that these two components are mutually beneficial towards effective OCD. Extensive experiments show the significant superiority of our PHE over previous methods, e.g., obtaining an improvement of +5.3% in ALL ACC averaged over all datasets. Moreover, due to the nature of the interpretable prototypes, we visually analyze the underlying mechanism of how PHE helps group certain samples into either known or unknown categories. Code is available at https://github.com/HaiyangZheng/PHE.

* Accepted by NeurIPS 2024
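
The "high sensitivity" issue mentioned in the abstract can be illustrated with a toy sign-based hash, assuming that every distinct hash code is treated as its own category. The binarization rule and the helper names `sign_hash` and `hamming` are illustrative assumptions, not the PHE implementation.

```python
def sign_hash(features):
    """Binarize a feature vector: 1 where the value is positive, else 0."""
    return tuple(1 if v > 0 else 0 for v in features)


def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


sample_a = [0.80, -0.50, 0.02]
sample_b = [0.79, -0.52, -0.01]  # nearly identical features

code_a = sign_hash(sample_a)  # (1, 0, 1)
code_b = sign_hash(sample_b)  # (1, 0, 0)
same_category = code_a == code_b  # False: one flipped bit splits the samples
```

A feature value that sits close to the binarization threshold flips a bit under a tiny perturbation, so two near-duplicate samples end up in different categories. Fine-grained classes, whose features lie close together, are the most exposed to this effect.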

WeatherDG: LLM-assisted Procedural Weather Generation for Domain-Generalized Semantic Segmentation

Oct 15, 2024

Chenghao Qian, Yuhu Guo, Yuhong Mo, Wenjing Li

Abstract: In this work, we propose a novel approach, namely WeatherDG, that can generate realistic, weather-diverse, and driving-screen images based on the cooperation of two foundation models, i.e., Stable Diffusion (SD) and a Large Language Model (LLM). Specifically, we first fine-tune the SD with source data, aligning the content and layout of generated samples with real-world driving scenarios. Then, we propose a procedural prompt generation method based on the LLM, which can enrich scenario descriptions and help SD automatically generate more diverse, detailed images. In addition, we introduce a balanced generation strategy, which encourages the SD to generate high-quality objects of tailed classes, such as riders and motorcycles, under various weather conditions. This segmentation-model-agnostic method can improve the generalization ability of existing models by additionally adapting them with the generated synthetic data. Experiments on three challenging datasets show that our method can significantly improve the segmentation performance of different state-of-the-art models on target domains. Notably, in the "Cityscapes to ACDC" setting, our method improves the baseline HRDA by 13.9% in mIoU.
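
For readers unfamiliar with the mIoU metric reported above, here is a minimal sketch of how mean intersection-over-union is computed from a confusion matrix. The function name `mean_iou` and the toy numbers are illustrative and are not drawn from the paper's evaluation code.

```python
def mean_iou(confusion):
    """Mean intersection-over-union across classes.

    confusion[i][j] is the number of pixels whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # missed pixels of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp  # pixels wrongly labeled c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy two-class example: IoU is 3/5 for class 0 and 5/7 for class 1.
score = mean_iou([[3, 1], [1, 5]])
```

Because every class contributes equally to the average, rare classes such as riders and motorcycles weigh as much as common ones, which is why improving them can move mIoU noticeably.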

AllWeatherNet: Unified Image enhancement for autonomous driving under adverse weather and lowlight-conditions

Sep 03, 2024

Chenghao Qian, Mahdi Rezaei, Saeed Anwar, Wenjing Li, Tanveer Hussain, Mohsen Azarmi, Wei Wang

Abstract: Adverse conditions like snow, rain, nighttime, and fog pose challenges for autonomous driving perception systems. Existing methods have limited effectiveness in improving essential computer vision tasks, such as semantic segmentation, and often focus on only one specific condition, such as removing rain or translating nighttime images into daytime ones. To address these limitations, we propose a method to improve the visual quality and clarity degraded by such adverse conditions. Our method, AllWeather-Net, utilizes a novel hierarchical architecture to enhance images across all adverse conditions. This architecture incorporates information at three semantic levels (scene, object, and texture) by discriminating patches at each level. Furthermore, we introduce a Scaled Illumination-aware Attention Mechanism (SIAM) that guides the learning towards road elements critical for autonomous driving perception. SIAM exhibits robustness, remaining unaffected by changes in weather conditions or environmental scenes. AllWeather-Net effectively transforms images into normal weather and daytime scenes, demonstrating superior image enhancement results and subsequently enhancing the performance of semantic segmentation, with up to a 5.3% improvement in mIoU in the trained domain. We also show our model's generalization ability by applying it to unseen domains without re-training, achieving up to a 3.9% mIoU improvement. Code can be accessed at: https://github.com/Jumponthemoon/AllWeatherNet.

Revolutionizing Wireless Networks with Self-Supervised Learning: A Pathway to Intelligent Communications

Jun 11, 2024

Zhixiang Yang, Hongyang Du, Dusit Niyato, Xudong Wang, Yu Zhou, Lei Feng, Fanqin Zhou, Wenjing Li, Xuesong Qiu

Abstract: With the rapid proliferation of mobile devices and data, next-generation wireless communication systems face stringent requirements for ultra-low latency, ultra-high reliability, and massive connectivity. Traditional AI-driven wireless network designs, while promising, often suffer from limitations such as dependency on labeled data and poor generalization. To address these challenges, we present an integration of self-supervised learning (SSL) into wireless networks. SSL leverages large volumes of unlabeled data to train models, enhancing scalability, adaptability, and generalization. This paper offers a comprehensive overview of SSL, categorizing its application scenarios in wireless network optimization and presenting a case study on its impact on semantic communication. Our findings highlight the potential of SSL to significantly improve wireless network performance without extensive labeled data, paving the way for more intelligent and efficient communication systems.

Textual Knowledge Matters: Cross-Modality Co-Teaching for Generalized Visual Class Discovery

Mar 12, 2024

Haiyang Zheng, Nan Pu, Wenjing Li, Nicu Sebe, Zhun Zhong

Abstract: In this paper, we study the problem of Generalized Category Discovery (GCD), which aims to cluster unlabeled data from both known and unknown categories using the knowledge of labeled data from known categories. Current GCD methods rely only on visual cues, neglecting the multi-modal perceptive nature of human cognitive processes in discovering novel visual categories. To address this, we propose a two-phase TextGCD framework to accomplish multi-modality GCD by exploiting powerful Visual-Language Models. TextGCD mainly includes a retrieval-based text generation (RTG) phase and a cross-modality co-teaching (CCT) phase. First, RTG constructs a visual lexicon using category tags from diverse datasets and attributes from Large Language Models, generating descriptive texts for images in a retrieval manner. Second, CCT leverages disparities between the textual and visual modalities to foster mutual learning, thereby enhancing visual GCD. In addition, we design an adaptive class aligning strategy to ensure the alignment of category perceptions between modalities, as well as a soft-voting mechanism to integrate multi-modality cues. Experiments on eight datasets show the marked superiority of our approach over state-of-the-art methods. Notably, our approach outperforms the best competitor by 7.7% and 10.8% in All accuracy on ImageNet-1k and CUB, respectively.
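
A soft-voting step of the kind mentioned above can be sketched as a weighted average of per-class probabilities from the two modalities. The function name `soft_vote`, the `visual_weight` parameter, and the equal default weighting are assumptions for illustration; the abstract does not specify how TextGCD weights the modalities.

```python
def soft_vote(p_visual, p_text, visual_weight=0.5):
    """Fuse two per-class probability lists and return (label, fused_probs)."""
    fused = [
        visual_weight * v + (1.0 - visual_weight) * t
        for v, t in zip(p_visual, p_text)
    ]
    # The predicted label is the index of the largest fused probability.
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused


# The visual branch slightly prefers class 0; the text branch strongly prefers class 1.
label, fused = soft_vote([0.6, 0.4], [0.2, 0.8])
```

Unlike a hard majority vote, averaging probabilities lets a confident modality override an uncertain one, as in this example where the fused prediction is class 1.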

MIFI: MultI-camera Feature Integration for Robust 3D Distracted Driver Activity Recognition

Jan 25, 2024

Jian Kuang, Wenjing Li, Fang Li, Jun Zhang, Zhongcheng Wu

Abstract: Distracted driver activity recognition plays a critical role in risk aversion, which is particularly beneficial in intelligent transportation systems. However, most existing methods use only the video from a single view, and the difficulty-inconsistent issue is neglected. In contrast, in this work we propose a novel MultI-camera Feature Integration (MIFI) approach for 3D distracted driver activity recognition that jointly models the data from different camera views and explicitly re-weights examples based on their degree of difficulty. Our contributions are two-fold: (1) We propose a simple but effective multi-camera feature integration framework and provide three types of feature fusion techniques. (2) To address the difficulty-inconsistent problem in distracted driver activity recognition, we present a periodic learning method, named example re-weighting, that can jointly learn from the easy and hard samples. The experimental results on the 3MDAD dataset demonstrate that the proposed MIFI can consistently boost performance compared to single-view models.

* Accepted by IEEE Transactions on Intelligent Transportation Systems. Minor typos have been fixed in Table IV
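
To make the idea of multi-camera feature integration concrete, the sketch below fuses per-view feature vectors in three common ways. The abstract does not name the three fusion techniques used in MIFI, so the `sum`, `max`, and `concat` modes, the function name `fuse_views`, and the view names are hypothetical stand-ins.

```python
def fuse_views(view_features, mode="sum"):
    """Combine per-view feature vectors of equal length into one vector."""
    if mode == "sum":
        return [sum(vals) for vals in zip(*view_features)]
    if mode == "max":
        return [max(vals) for vals in zip(*view_features)]
    if mode == "concat":
        return [v for feats in view_features for v in feats]
    raise ValueError(f"unknown fusion mode: {mode}")


side_view = [1.0, 4.0]   # hypothetical features from one camera
front_view = [3.0, 2.0]  # hypothetical features from another camera

fused_sum = fuse_views([side_view, front_view], "sum")
fused_max = fuse_views([side_view, front_view], "max")
fused_cat = fuse_views([side_view, front_view], "concat")
```

Element-wise modes keep the feature dimension fixed regardless of the number of cameras, while concatenation preserves which view each value came from at the cost of a larger vector.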

Democratizing Fine-grained Visual Recognition with Large Language Models

Jan 24, 2024

Mingxuan Liu, Subhankar Roy, Wenjing Li, Zhun Zhong, Nicu Sebe, Elisa Ricci

Abstract: Identifying subordinate-level categories from images is a longstanding task in computer vision and is referred to as fine-grained visual recognition (FGVR). It has tremendous significance in real-world applications, since an average layperson does not excel at differentiating species of birds or mushrooms due to subtle differences among the species. A major bottleneck in developing FGVR systems is the need for high-quality paired expert annotations. To circumvent the need for expert knowledge, we propose Fine-grained Semantic Category Reasoning (FineR), which internally leverages the world knowledge of large language models (LLMs) as a proxy in order to reason about fine-grained category names. In detail, to bridge the modality gap between images and LLMs, we extract part-level visual attributes from images as text and feed that information to an LLM. Based on the visual attributes and its internal world knowledge, the LLM reasons about the subordinate-level category names. Our training-free FineR outperforms several state-of-the-art FGVR and language and vision assistant models and shows promise in working in the wild and in new domains where gathering expert annotation is arduous.

* Accepted as a conference paper at ICLR 2024
