Abstract:Self-supervised video denoising aims to remove noise from videos without relying on ground truth data, leveraging the video itself to recover clean frames. Existing methods often rely on simplistic feature stacking or apply optical flow without thorough analysis. This results in suboptimal utilization of both inter-frame and intra-frame information, and it also neglects the potential of optical flow alignment under self-supervised conditions, leading to biased and insufficient denoising outcomes. To this end, we first explore the practicality of optical flow in the self-supervised setting and introduce a SpatioTemporal Blind-spot Network (STBN) for global frame feature utilization. In the temporal domain, we utilize bidirectional blind-spot feature propagation through the proposed blind-spot alignment block to ensure accurate temporal alignment and effectively capture long-range dependencies. In the spatial domain, we introduce the spatial receptive field expansion module, which enhances the receptive field and improves global perception capabilities. Additionally, to reduce the sensitivity of optical flow estimation to noise, we propose an unsupervised optical flow distillation mechanism that refines fine-grained inter-frame interactions during optical flow alignment. Our method demonstrates superior performance across both synthetic and real-world video denoising datasets. The source code is publicly available at https://github.com/ZKCCZ/STBN.
Abstract:This study focuses on analysis and modeling of the penetration loss of typical building materials in the FR1 (450 MHz-6 GHz) and FR3 (7-24 GHz) bands based on experimental measurements. Firstly, we measure the penetration loss characteristics of four different typical building materials from 4 to 16 GHz, including wood, glass, foam and concrete, by using a penetration loss measurement platform based on the vector network analyzer (VNA). Next, we analyze the frequency dependence and thickness dependence of penetration loss. Finally, the linear model is applied to fit the curve of the measured penetration loss, and new model parameters for the penetration loss of different building materials are given, which are compared with that in the third generation partnership project (3GPP) technical report (TR) 38.901. The analysis results and new model parameters may provides insight into understanding propagation characteristics in FR1 and FR3 bands and 3GPP channel model standardisation.
Abstract:Reconfigurable Intelligent Surface (RIS) is considered as a promising technology for 6G due to its ability to actively modify the electromagnetic propagation environment. Accurate channel modeling is essential for the design and evaluation of RIS assisted communication systems. Most current research models the RIS channel as a cascade of Tx-RIS and RIS-Rx sub-channels. However, most validation efforts regarding this assumption focus on large-scale path loss. To further explore this, in this paper, we derive and extend a convolution expression of RIS cascaded channel model based on the previously proposed Geometry-based Stochastic Model (GBSM)-based RIS cascaded channels. This model follows the 3GPP standard framework and leverages parameters such as angles, delays, and path powers defined in the GBSM model to more accurately reflect the smallscale characteristics of RIS multipath cascades. To verify the accuracy of this model, we conduct measurements of the TxRIS-Rx channel, Tx-RIS, and RIS-Rx sub-channels in a factory environment at 6.9 GHz, using the measured data to demonstrate the models validity and applicability in real-world scenarios. Validation with measured data shows that the proposed model accurately describes the characteristics of the RIS cascaded channel in terms of delay, angle, and power in complex multipath environments, providing important references for the design and deployment of RIS systems.
Abstract:Data quantity and quality are both critical for information extraction and analyzation in remote sensing. However, the current remote sensing datasets often fail to meet these two requirements, for which cloud is a primary factor degrading the data quantity and quality. This limitation affects the precision of results in remote sensing application, particularly those derived from data-driven techniques. In this paper, a physical law embedded generative cloud synthesis method (PGCS) is proposed to generate diverse realistic cloud images to enhance real data and promote the development of algorithms for subsequent tasks, such as cloud correction, cloud detection, and data augmentation for classification, recognition, and segmentation. The PGCS method involves two key phases: spatial synthesis and spectral synthesis. In the spatial synthesis phase, a style-based generative adversarial network is utilized to simulate the spatial characteristics, generating an infinite number of single-channel clouds. In the spectral synthesis phase, the atmospheric scattering law is embedded through a local statistics and global fitting method, converting the single-channel clouds into multi-spectral clouds. The experimental results demonstrate that PGCS achieves a high accuracy in both phases and performs better than three other existing cloud synthesis methods. Two cloud correction methods are developed from PGCS and exhibits a superior performance compared to state-of-the-art methods in the cloud correction task. Furthermore, the application of PGCS with data from various sensors was investigated and successfully extended. Code will be provided at https://github.com/Liying-Xu/PGCS.
Abstract:The air interface technology plays a crucial role in optimizing the communication quality for users. To address the challenges brought by the radio channel variations to air interface design, this article proposes a framework of wireless environment information-aided 6G AI-enabled air interface (WEI-6G AI$^{2}$), which actively acquires real-time environment details to facilitate channel fading prediction and communication technology optimization. Specifically, we first outline the role of WEI in supporting the 6G AI$^{2}$ in scenario adaptability, real-time inference, and proactive action. Then, WEI is delineated into four progressive steps: raw sensing data, features obtained by data dimensionality reduction, semantics tailored to tasks, and knowledge that quantifies the environmental impact on the channel. To validate the availability and compare the effect of different types of WEI, a path loss prediction use case is designed. The results demonstrate that leveraging environment knowledge requires only 2.2 ms of model inference time, which can effectively support real-time design for future 6G AI$^{2}$. Additionally, WEI can reduce the pilot overhead by 25\%. Finally, several open issues are pointed out, including multi-modal sensing data synchronization and information extraction method construction.
Abstract:Extracting typical operational scenarios is essential for making flexible decisions in the dispatch of a new power system. This study proposed a novel deep time series aggregation scheme (DTSAs) to generate typical operational scenarios, considering the large amount of historical operational snapshot data. Specifically, DTSAs analyze the intrinsic mechanisms of different scheduling operational scenario switching to mathematically represent typical operational scenarios. A gramian angular summation field (GASF) based operational scenario image encoder was designed to convert operational scenario sequences into high-dimensional spaces. This enables DTSAs to fully capture the spatiotemporal characteristics of new power systems using deep feature iterative aggregation models. The encoder also facilitates the generation of typical operational scenarios that conform to historical data distributions while ensuring the integrity of grid operational snapshots. Case studies demonstrate that the proposed method extracted new fine-grained power system dispatch schemes and outperformed the latest high-dimensional featurescreening methods. In addition, experiments with different new energy access ratios were conducted to verify the robustness of the proposed method. DTSAs enables dispatchers to master the operation experience of the power system in advance, and actively respond to the dynamic changes of the operation scenarios under the high access rate of new energy.
Abstract:Integrating device-to-device (D2D) communication into cellular networks can significantly reduce the transmission burden on base stations (BSs). Besides, integrated sensing and communication (ISAC) is envisioned as a key feature in future wireless networks. In this work, we consider a full-duplex ISAC- based D2D underlaid system, and propose a joint beamforming and power allocation scheme to improve the performance of the coexisting ISAC and D2D networks. To enhance spectral efficiency, a sum rate maximization problem is formulated for the full-duplex ISAC-based D2D underlaid system, which is non-convex. To solve the non-convex optimization problem, we propose a successive convex approximation (SCA)-based iterative algorithm and prove its convergence. Numerical results are provided to validate the effectiveness of the proposed scheme with the iterative algorithm, demonstrating that the proposed scheme outperforms state-of-the-art ones in both communication and sensing performance.
Abstract:The increasing parameters and expansive dataset of large language models (LLMs) highlight the urgent demand for a technical solution to audit the underlying privacy risks and copyright issues associated with LLMs. Existing studies have partially addressed this need through an exploration of the pre-training data detection problem, which is an instance of a membership inference attack (MIA). This problem involves determining whether a given piece of text has been used during the pre-training phase of the target LLM. Although existing methods have designed various sophisticated MIA score functions to achieve considerable detection performance in pre-trained LLMs, how to achieve high-confidence detection and how to perform MIA on aligned LLMs remain challenging. In this paper, we propose MIA-Tuner, a novel instruction-based MIA method, which instructs LLMs themselves to serve as a more precise pre-training data detector internally, rather than design an external MIA score function. Furthermore, we design two instruction-based safeguards to respectively mitigate the privacy risks brought by the existing methods and MIA-Tuner. To comprehensively evaluate the most recent state-of-the-art LLMs, we collect a more up-to-date MIA benchmark dataset, named WIKIMIA-24, to replace the widely adopted benchmark WIKIMIA. We conduct extensive experiments across various aligned and unaligned LLMs over the two benchmark datasets. The results demonstrate that MIA-Tuner increases the AUC of MIAs from 0.7 to a significantly high level of 0.9.
Abstract:Whole-body pose estimation is a challenging task that requires simultaneous prediction of keypoints for the body, hands, face, and feet. Whole-body pose estimation aims to predict fine-grained pose information for the human body, including the face, torso, hands, and feet, which plays an important role in the study of human-centric perception and generation and in various applications. In this work, we present RTMW (Real-Time Multi-person Whole-body pose estimation models), a series of high-performance models for 2D/3D whole-body pose estimation. We incorporate RTMPose model architecture with FPN and HEM (Hierarchical Encoding Module) to better capture pose information from different body parts with various scales. The model is trained with a rich collection of open-source human keypoint datasets with manually aligned annotations and further enhanced via a two-stage distillation strategy. RTMW demonstrates strong performance on multiple whole-body pose estimation benchmarks while maintaining high inference efficiency and deployment friendliness. We release three sizes: m/l/x, with RTMW-l achieving a 70.2 mAP on the COCO-Wholebody benchmark, making it the first open-source model to exceed 70 mAP on this benchmark. Meanwhile, we explored the performance of RTMW in the task of 3D whole-body pose estimation, conducting image-based monocular 3D whole-body pose estimation in a coordinate classification manner. We hope this work can benefit both academic research and industrial applications. The code and models have been made publicly available at: https://github.com/open-mmlab/mmpose/tree/main/projects/rtmpose
Abstract:As the underlying foundation of a digital twin network (DTN), a digital twin channel (DTC) can accurately depict the process of radio propagation in the air interface to support the DTN-based 6G wireless network. Since radio propagation is affected by the environment, constructing the relationship between the environment and radio wave propagation is the key to improving the accuracy of DTC, and the construction method based on artificial intelligence (AI) is the most concentrated. However, in the existing methods, the environment information input into the neural network (NN) has many dimensions, and the correlation between the environment and the channel relationship is unclear, resulting in a highly complex relationship construction process. To solve this issue, in this paper, we propose a construction method of radio environment knowledge (REK) inspired by the electromagnetic wave property to quantify the contribution of radio propagation. Specifically, a range selection scheme for effective environment information based on random geometry is proposed to reduce the redundancy of environment information. We quantify the contribution of radio propagation reflection, diffraction and scatterer blockage using environment information and propose a flow chart of REK construction to replace the feature extraction process partially based on NN. To validate REK's effectiveness, we conduct a path loss prediction task based on a lightweight convolutional neural network (CNN) employing a simple two-layer convolutional structure. The results show that the accuracy of the range selection method reaches 90\%; the constructed REK maintains the prediction error of 0.3 and only needs 0.04 seconds of testing time, effectively reducing the network complexity.