Abstract:Integrated Sensing and Communication (ISAC), a fundamental technology of 6G, empowers Vehicle-to-Everything (V2X) systems with enhanced sensing capabilities. One promising application is vehicle positioning based on constructed maps. Traditional positioning methods rely primarily on Line-of-Sight (LOS) paths, but in urban vehicular scenarios, obstructions often make Non-Line-of-Sight (NLOS) conditions predominant. Existing research indicates that NLOS paths formed by one-bounce reflections off building walls, which have deterministic delays and angles, can support sensing and positioning. However, experimental validation remains insufficient. To address this gap, channel measurements are conducted in an urban street to explore whether strong reflected paths exist in the presence of a vehicle target. The results show a significant power contribution from NLOS paths, with large Environmental Objects (EOs) playing a key role in shaping NLOS propagation. A novel model for EO reflection is then proposed to extend the Geometry-Based Stochastic Model (GBSM) for ISAC channel standardization. Simulation results validate the model's ability to capture the power and position characteristics of EOs, showing that higher EO-reflected power and a shorter distance between the EO and the Rx reduce the Delay Spread (DS), which is more favorable for positioning. This model provides theoretical guidance and empirical support for ISAC positioning algorithms and system design in vehicular scenarios.
Abstract:Comprehensive and consistent dynamic scene understanding from camera input is essential for advanced autonomous systems. Traditional camera-based perception tasks such as 3D object tracking and semantic occupancy prediction lack either spatial comprehensiveness or temporal consistency. In this work, we introduce a new task, Camera-based 4D Panoptic Occupancy Tracking, which simultaneously addresses panoptic occupancy segmentation and object tracking from camera-only input. Furthermore, we propose TrackOcc, an approach that processes image inputs in a streaming, end-to-end manner with 4D panoptic queries to address the proposed task. By leveraging a localization-aware loss, TrackOcc improves the accuracy of 4D panoptic occupancy tracking without bells and whistles. Experimental results demonstrate that our method achieves state-of-the-art performance on the Waymo dataset. The source code will be released at https://github.com/Tsinghua-MARS-Lab/TrackOcc.
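To make the task definition concrete: the output of 4D panoptic occupancy tracking can be pictured as, per frame, a voxel grid of semantic labels plus a voxel grid of instance IDs that must stay consistent over time. TrackOcc itself propagates 4D panoptic queries end to end; the greedy voxel-IoU matcher below is a swapped-in, tracking-by-detection baseline with hypothetical function names and threshold, shown only to illustrate what "temporally consistent instance IDs" means.

```python
import numpy as np

def voxel_iou(a, b):
    """IoU of two boolean voxel masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def propagate_track_ids(prev_ids, curr_ids, next_track_id, iou_thresh=0.3):
    """Relabel per-frame instance IDs so that matched objects keep their track ID.

    prev_ids, curr_ids: integer voxel grids, 0 = no 'thing' instance
    (free space or 'stuff' classes, whose semantics live in a separate grid).
    Matching is greedy in label order, not globally optimal.
    """
    out = np.zeros_like(curr_ids)
    prev_labels = [p for p in np.unique(prev_ids) if p != 0]
    used = set()
    for c in np.unique(curr_ids):
        if c == 0:
            continue
        mask = curr_ids == c
        best, best_iou = None, iou_thresh
        for p in prev_labels:
            if p in used:
                continue
            iou = voxel_iou(mask, prev_ids == p)
            if iou >= best_iou:
                best, best_iou = p, iou
        if best is None:  # unmatched instance: start a new track
            best, next_track_id = next_track_id, next_track_id + 1
        else:
            used.add(best)
        out[mask] = best
    return out, next_track_id

# Usage: an object (track 7) moves one voxel; a new object appears.
prev = np.zeros((4, 4, 4), dtype=int)
prev[0:2, 0:2, 0:2] = 7
curr = np.zeros_like(prev)
curr[1:3, 0:2, 0:2] = 1  # same object, per-frame detector ID 1
curr[3, 3, 3] = 2        # new object
tracked, next_id = propagate_track_ids(prev, curr, next_track_id=8)
```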
Abstract:3D point cloud mapping plays an essential role in localization and autonomous navigation. However, dynamic objects often leave residual traces during map construction, which degrade the performance of subsequent tasks. Therefore, dynamic object removal has become a critical challenge in point-cloud-based map construction within dynamic scenarios. Existing approaches, however, often incur significant computational overhead, making it difficult to meet real-time processing requirements. To address this issue, we introduce the Height Interval Filtering (HIF) method. This approach constructs pillar-based height interval representations to probabilistically model the vertical dimension, with interval probabilities updated through Bayesian inference. It ensures real-time performance while achieving high accuracy and improving robustness in complex environments. Additionally, we propose a low-height preservation strategy that enhances the detection of unknown spaces, reducing misclassification in occluded regions. Experiments on public datasets demonstrate that HIF delivers a 7.7-fold improvement in time efficiency with accuracy comparable to existing state-of-the-art (SOTA) methods. The code will be publicly available.
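The pillar-based height interval idea can be sketched with a standard log-odds (binary Bayes filter) update per interval. This is a minimal sketch under stated assumptions, not HIF's implementation: the pillar size, interval height, sensor-model probabilities, and class name `HeightIntervalMap` are all hypothetical, "free" evidence is assumed to come from ray casting that is not shown, and the low-height preservation strategy is not modeled.

```python
import math

# Hypothetical inverse sensor model (log-odds increments); not HIF's parameters.
L_OCC = math.log(0.7 / 0.3)   # evidence that an interval is occupied
L_FREE = math.log(0.4 / 0.6)  # evidence that a ray passed through it
L_MIN, L_MAX = -5.0, 5.0      # clamping keeps the map responsive to change


class HeightIntervalMap:
    """Pillars on an x-y grid, each split into fixed height intervals."""

    def __init__(self, pillar_size=0.5, z_min=-2.0, z_max=4.0, dz=0.25):
        self.pillar_size = pillar_size
        self.z_min = z_min
        self.dz = dz
        self.n_bins = int(math.ceil((z_max - z_min) / dz))
        self.pillars = {}  # (ix, iy) -> list of per-interval log-odds

    def _column(self, x, y):
        key = (math.floor(x / self.pillar_size), math.floor(y / self.pillar_size))
        return self.pillars.setdefault(key, [0.0] * self.n_bins)

    def _bin(self, z):
        i = math.floor((z - self.z_min) / self.dz)
        return min(max(i, 0), self.n_bins - 1)

    def observe(self, x, y, z, occupied):
        """Bayesian log-odds update of the interval containing (x, y, z)."""
        col = self._column(x, y)
        i = self._bin(z)
        delta = L_OCC if occupied else L_FREE
        col[i] = min(max(col[i] + delta, L_MIN), L_MAX)

    def probability(self, x, y, z):
        return 1.0 / (1.0 + math.exp(-self._column(x, y)[self._bin(z)]))

    def is_static(self, x, y, z, threshold=0.5):
        return self.probability(x, y, z) > threshold


# Usage: a passing car is hit three times, then later scans see through it.
m = HeightIntervalMap()
for _ in range(3):
    m.observe(1.0, 1.0, 0.5, occupied=True)
for _ in range(10):
    m.observe(1.0, 1.0, 0.5, occupied=False)
print(m.is_static(1.0, 1.0, 0.5))  # False: the interval is treated as dynamic
```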
Abstract:Scaling the effective context length is essential for advancing large language models (LLMs) toward artificial general intelligence (AGI). However, the quadratic increase in computational complexity inherent in traditional attention mechanisms presents a prohibitive overhead. Existing approaches either impose strongly biased structures, such as sink or window attention, which are task-specific, or radically modify the attention mechanism into linear approximations, whose performance in complex reasoning tasks remains inadequately explored. In this work, we propose a solution that adheres to the "less structure" principle, allowing the model to autonomously determine where to attend rather than introducing predefined biases. We introduce Mixture of Block Attention (MoBA), an approach that applies the principles of Mixture of Experts (MoE) to the attention mechanism. This architecture demonstrates superior performance on long-context tasks while offering a key advantage: the ability to transition seamlessly between full and sparse attention, improving efficiency without the risk of compromising performance. MoBA has already been deployed to support Kimi's long-context requests and demonstrates significant advancements in efficient attention computation for LLMs. Our code is available at https://github.com/MoonshotAI/MoBA.
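The MoE-style routing described here can be sketched for a single query: split keys and values into blocks, score each block with a gate, and attend only over the top-k blocks. The gate used below (dot product between the query and each block's mean-pooled keys) is an assumption for illustration, causal masking is omitted, and the function names are hypothetical; see the MoBA repository for the actual implementation. Selecting all blocks reproduces full attention, which mirrors the full/sparse transition claimed in the abstract.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def full_attention(q, K, V):
    return softmax(K @ q / np.sqrt(K.shape[1])) @ V

def block_sparse_attention(q, K, V, block_size, top_k):
    """Attend only to the top_k key/value blocks chosen by a gate (single query)."""
    assert top_k >= 1
    n, d = K.shape
    n_blocks = -(-n // block_size)  # ceiling division
    # Assumed gate: affinity between q and the mean-pooled keys of each block.
    block_keys = np.stack([K[s:s + block_size].mean(axis=0) for s in range(0, n, block_size)])
    gate = block_keys @ q
    chosen = np.sort(np.argsort(gate)[-min(top_k, n_blocks):])
    idx = np.concatenate(
        [np.arange(b * block_size, min((b + 1) * block_size, n)) for b in chosen]
    )
    return softmax(K[idx] @ q / np.sqrt(d)) @ V[idx]

# Usage: 64 keys in blocks of 8.
rng = np.random.default_rng(0)
K, V = rng.normal(size=(64, 16)), rng.normal(size=(64, 16))
q = rng.normal(size=16)
sparse = block_sparse_attention(q, K, V, block_size=8, top_k=2)  # attends to 16 of 64 keys
dense = block_sparse_attention(q, K, V, block_size=8, top_k=8)   # all blocks selected
print(np.allclose(dense, full_attention(q, K, V)))  # True
```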
Abstract:Many studies have focused on training supervised image denoising models on paired datasets, which are expensive and time-consuming to collect. Current self-supervised and unsupervised approaches typically rely on blind-spot networks or sub-image pair sampling, resulting in pixel information loss and the destruction of detailed structural information, which significantly limits the efficacy of such methods. In this paper, we introduce Prompt-SID, a prompt-learning-based single image denoising framework that emphasizes the preservation of structural details. This approach is trained in a self-supervised manner using downsampled image pairs. It captures original-scale image information through structural encoding and integrates this prompt into the denoiser. To achieve this, we propose a structural representation generation model based on the latent diffusion process and design a structural attention module within the transformer-based denoiser architecture to decode the prompt. Additionally, we introduce a scale replay training mechanism, which effectively mitigates the scale gap between images of different resolutions. We conduct comprehensive experiments on synthetic, real-world, and fluorescence imaging datasets, demonstrating the effectiveness of Prompt-SID.
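The abstract does not specify how Prompt-SID's downsampled training pairs are formed. As an assumption, the sketch below uses a generic random neighbor sub-sampler (in the style of Neighbor2Neighbor): each 2x2 cell contributes two different pixels, yielding two half-resolution noisy views whose noise is independent when the noise is pixel-wise independent. The half resolution of these views is also what creates the scale gap the abstract mentions. The function name is hypothetical and only grayscale images are handled.

```python
import numpy as np

def sample_subimage_pair(img, rng):
    """Split a grayscale img (H, W) into 2x2 cells and pick two different pixels per cell."""
    h2, w2 = img.shape[0] // 2, img.shape[1] // 2
    # cells[i, j, k] holds the 4 pixels of cell (i, j), with k = 2 * row_offset + col_offset.
    cells = (
        img[: 2 * h2, : 2 * w2]
        .reshape(h2, 2, w2, 2)
        .transpose(0, 2, 1, 3)
        .reshape(h2, w2, 4)
    )
    first = rng.integers(0, 4, size=(h2, w2))
    second = (first + rng.integers(1, 4, size=(h2, w2))) % 4  # always differs from first
    g1 = np.take_along_axis(cells, first[..., None], axis=2)[..., 0]
    g2 = np.take_along_axis(cells, second[..., None], axis=2)[..., 0]
    return g1, g2

# Usage: one half-resolution view can serve as input and the other as target.
rng = np.random.default_rng(0)
img = np.arange(64, dtype=float).reshape(8, 8)  # unique values make sampling traceable
g1, g2 = sample_subimage_pair(img, rng)
```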
Abstract:While large language models (LLMs) present significant potential for supporting numerous real-world applications and delivering positive social impacts, they still face significant challenges, including the inherent risk of privacy leakage, hallucinated outputs, and value misalignment, and they can be maliciously used to generate toxic content or for other unethical purposes after being jailbroken. Therefore, in this survey, we present a comprehensive review of recent advancements aimed at mitigating these issues, organized across the four phases of LLM development and usage: data collection and pre-training, fine-tuning and alignment, prompting and reasoning, and post-processing and auditing. We elaborate on recent advances in enhancing the performance of LLMs in terms of privacy protection, hallucination reduction, value alignment, toxicity elimination, and jailbreak defenses. In contrast to previous surveys that focus on a single dimension of responsible LLMs, this survey presents a unified framework that encompasses these diverse dimensions, providing a comprehensive view of enhancing LLMs to better serve real-world applications.
Abstract:Self-supervised video denoising aims to remove noise from videos without relying on ground truth data, leveraging the video itself to recover clean frames. Existing methods often rely on simplistic feature stacking or apply optical flow without thorough analysis. This results in suboptimal use of both inter-frame and intra-frame information and neglects the potential of optical flow alignment under self-supervised conditions, leading to biased and insufficient denoising. To this end, we first explore the practicality of optical flow in the self-supervised setting and introduce a SpatioTemporal Blind-spot Network (STBN) for global frame feature utilization. In the temporal domain, we utilize bidirectional blind-spot feature propagation through the proposed blind-spot alignment block to ensure accurate temporal alignment and effectively capture long-range dependencies. In the spatial domain, we introduce a spatial receptive field expansion module, which enlarges the receptive field and improves global perception. Additionally, to reduce the sensitivity of optical flow estimation to noise, we propose an unsupervised optical flow distillation mechanism that refines fine-grained inter-frame interactions during optical flow alignment. Our method demonstrates superior performance on both synthetic and real-world video denoising datasets. The source code is publicly available at https://github.com/ZKCCZ/STBN.
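The blind-spot principle behind STBN can be illustrated with a single-layer filter whose center tap is zeroed: each output pixel is predicted only from its neighbors, so with pixel-wise independent noise a self-supervised loss against the noisy frame cannot be minimized by copying the input. This is a generic sketch with a hypothetical mean-filter kernel, not STBN's blind-spot alignment block or its feature propagation.

```python
import numpy as np

def blind_spot_conv2d(img, kernel):
    """Cross-correlate img with a square, odd-sized kernel after zeroing its centre tap."""
    k = np.array(kernel, dtype=float)
    r = k.shape[0] // 2
    k[r, r] = 0.0  # blind spot: an output pixel never sees its own input pixel
    # Reflect padding mirrors interior pixels, so border pixels stay blind as well.
    padded = np.pad(np.asarray(img, dtype=float), r, mode="reflect")
    h, w = img.shape
    out = np.zeros((h, w))
    for dy in range(k.shape[0]):
        for dx in range(k.shape[1]):
            out += k[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

# Usage: perturbing one pixel leaves its own output unchanged but affects its neighbours.
rng = np.random.default_rng(1)
frame = rng.random((6, 6))
kernel = np.ones((3, 3)) / 8.0  # mean of the 8 neighbours
out_before = blind_spot_conv2d(frame, kernel)
perturbed = frame.copy()
perturbed[2, 3] += 10.0
out_after = blind_spot_conv2d(perturbed, kernel)
```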
Abstract:This study analyzes and models the penetration loss of typical building materials in the FR1 (450 MHz-6 GHz) and FR3 (7-24 GHz) bands based on experimental measurements. First, we measure the penetration loss of four typical building materials (wood, glass, foam, and concrete) from 4 to 16 GHz using a penetration loss measurement platform based on a vector network analyzer (VNA). Next, we analyze the frequency and thickness dependence of penetration loss. Finally, a linear model is fitted to the measured penetration loss, and new model parameters for different building materials are given and compared with those in the Third Generation Partnership Project (3GPP) technical report (TR) 38.901. The analysis results and new model parameters may provide insight into propagation characteristics in the FR1 and FR3 bands and inform 3GPP channel model standardisation.
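The linear fitting step can be sketched as an ordinary least-squares fit of L(f) = a + b·f, the form used for material penetration loss in 3GPP TR 38.901 (f in GHz, L in dB). The data below are synthetic, noise-free samples of the TR 38.901 wood model (4.85 + 0.12f) used only as a sanity check; they are not the measured values from this study, which would replace `loss_db` in practice. The function name is hypothetical.

```python
import numpy as np

def fit_linear_penetration_loss(freq_ghz, loss_db):
    """Least-squares fit of L(f) = a + b * f, with f in GHz and L in dB."""
    # np.polyfit returns coefficients from highest degree down: [b, a].
    b, a = np.polyfit(np.asarray(freq_ghz, float), np.asarray(loss_db, float), 1)
    return a, b

# Synthetic check: samples from the TR 38.901 wood model over the 4-16 GHz range.
f = np.linspace(4, 16, 13)
a, b = fit_linear_penetration_loss(f, 4.85 + 0.12 * f)
print(round(a, 2), round(b, 2))  # 4.85 0.12
```

With real measurements, the residuals of this fit indicate how well a single linear trend describes a material across FR1 and FR3.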
Abstract:Reconfigurable Intelligent Surface (RIS) is considered a promising technology for 6G due to its ability to actively modify the electromagnetic propagation environment. Accurate channel modeling is essential for the design and evaluation of RIS-assisted communication systems. Most current research models the RIS channel as a cascade of Tx-RIS and RIS-Rx sub-channels. However, most efforts to validate this assumption focus on large-scale path loss. To explore it further, in this paper we derive and extend a convolution expression of the RIS cascaded channel model based on the previously proposed Geometry-based Stochastic Model (GBSM)-based RIS cascaded channels. This model follows the 3GPP standard framework and leverages parameters such as angles, delays, and path powers defined in the GBSM model to more accurately reflect the small-scale characteristics of RIS multipath cascades. To verify the accuracy of this model, we conduct measurements of the Tx-RIS-Rx channel and the Tx-RIS and RIS-Rx sub-channels in a factory environment at 6.9 GHz, using the measured data to demonstrate the model's validity and applicability in real-world scenarios. Validation with measured data shows that the proposed model accurately describes the delay, angle, and power characteristics of the RIS cascaded channel in complex multipath environments, providing important references for the design and deployment of RIS systems.
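The convolution view of the cascade can be sketched on discrete channel impulse responses. If the Tx-RIS sub-channel is a sum of paths a_m δ(τ - τ_m) and the RIS-Rx sub-channel is a sum of paths b_n δ(τ - τ_n), their convolution is the sum over all pairs of a_m Γ b_n δ(τ - τ_m - τ_n): delays add and complex gains multiply. Modeling the RIS as a single complex coefficient Γ and omitting angles and array responses are simplifying assumptions here, not the paper's GBSM-based expression; the path values and function names are hypothetical.

```python
from itertools import product

def cascade_paths(tx_ris, ris_rx, ris_coeff=1.0):
    """Convolve two discrete CIRs given as lists of (delay_s, complex_gain) paths.

    Every Tx-RIS path combines with every RIS-Rx path, so M x N cascaded paths result.
    """
    return [
        (t1 + t2, a1 * ris_coeff * a2)
        for (t1, a1), (t2, a2) in product(tx_ris, ris_rx)
    ]

def total_power(paths):
    return sum(abs(g) ** 2 for _, g in paths)

# Usage with hypothetical sub-channel paths.
tx_ris = [(10e-9, 0.5), (20e-9, 0.25j)]
ris_rx = [(5e-9, 0.8), (12e-9, 0.1)]
cascade = cascade_paths(tx_ris, ris_rx)
```

With |Γ| = 1, the total cascaded power equals the product of the two sub-channel powers, which is one simple consistency check when comparing against measured Tx-RIS-Rx data.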
Abstract:Data quantity and quality are both critical for information extraction and analysis in remote sensing. However, current remote sensing datasets often fail to meet these two requirements, with cloud cover being a primary factor degrading both data quantity and quality. This limitation affects the precision of results in remote sensing applications, particularly those derived from data-driven techniques. In this paper, a physical law embedded generative cloud synthesis method (PGCS) is proposed to generate diverse, realistic cloud images that enhance real data and promote the development of algorithms for subsequent tasks, such as cloud correction, cloud detection, and data augmentation for classification, recognition, and segmentation. The PGCS method involves two key phases: spatial synthesis and spectral synthesis. In the spatial synthesis phase, a style-based generative adversarial network is utilized to simulate the spatial characteristics, generating an unlimited number of single-channel clouds. In the spectral synthesis phase, the atmospheric scattering law is embedded through a local statistics and global fitting method, converting the single-channel clouds into multi-spectral clouds. The experimental results demonstrate that PGCS achieves high accuracy in both phases and outperforms three existing cloud synthesis methods. Two cloud correction methods developed from PGCS exhibit superior performance compared to state-of-the-art methods in the cloud correction task. Furthermore, the application of PGCS to data from various sensors was investigated and successfully demonstrated. Code will be provided at https://github.com/Liying-Xu/PGCS.
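The spectral step turns a single-channel cloud into per-band clouds and blends them with a clear scene. The abstract does not detail PGCS's local statistics and global fitting method, so the sketch below substitutes a common additive cloud model, I_b = J_b(1 - α_b) + A_b α_b (the atmospheric scattering form with transmission 1 - α_b), where α_b is the single-channel cloud opacity scaled by a per-band gain. The gains, cloud radiances, and function name are hypothetical placeholders, not PGCS parameters.

```python
import numpy as np

def composite_cloud(clear, cloud_opacity, band_gain, cloud_radiance):
    """Blend a single-channel cloud into a multispectral scene.

    clear:          (B, H, W) cloud-free reflectance
    cloud_opacity:  (H, W) single-channel cloud map in [0, 1]
    band_gain:      (B,) hypothetical per-band scaling of the opacity
    cloud_radiance: (B,) hypothetical per-band cloud brightness
    """
    gain = np.asarray(band_gain, dtype=float)[:, None, None]
    radiance = np.asarray(cloud_radiance, dtype=float)[:, None, None]
    alpha = np.clip(cloud_opacity[None] * gain, 0.0, 1.0)  # per-band opacity
    return clear * (1.0 - alpha) + radiance * alpha

# Usage: a 3-band scene with a uniform half-opacity cloud.
rng = np.random.default_rng(0)
clear = rng.random((3, 4, 5))
cloudy = composite_cloud(clear, np.full((4, 5), 0.5), [1.0, 0.5, 2.0], [0.9, 0.8, 0.7])
```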