Abstract:Amid the burgeoning development of generative models like diffusion models, the task of differentiating synthesized audio from its natural counterpart grows more daunting. Deepfake detection offers a viable solution to combat this challenge. Yet, this defensive measure unintentionally fuels the continued refinement of generative models. Watermarking emerges as a proactive and sustainable tactic, preemptively regulating the creation and dissemination of synthesized content. Thus, this paper, as a pioneer, proposes the generative robust audio watermarking method (Groot), presenting a paradigm for proactively supervising the synthesized audio and its source diffusion models. In this paradigm, the processes of watermark generation and audio synthesis occur simultaneously, facilitated by parameter-fixed diffusion models equipped with a dedicated encoder. The watermark embedded within the audio can subsequently be retrieved by a lightweight decoder. The experimental results highlight Groot's outstanding performance, particularly in terms of robustness, surpassing that of the leading state-of-the-art methods. Beyond its impressive resilience against individual post-processing attacks, Groot exhibits exceptional robustness when facing compound attacks, maintaining an average watermark extraction accuracy of around 95%.
Abstract:The fusion of hyperspectral and LiDAR data has been an active research topic. Existing fusion methods have ignored the high-dimensionality and redundancy challenges in hyperspectral images, despite that band selection methods have been intensively studied for hyperspectral image (HSI) processing. This paper addresses this significant gap by introducing a cross-attention mechanism from the transformer architecture for the selection of HSI bands guided by LiDAR data. LiDAR provides high-resolution vertical structural information, which can be useful in distinguishing different types of land cover that may have similar spectral signatures but different structural profiles. In our approach, the LiDAR data are used as the "query" to search and identify the "key" from the HSI to choose the most pertinent bands for LiDAR. This method ensures that the selected HSI bands drastically reduce redundancy and computational requirements while working optimally with the LiDAR data. Extensive experiments have been undertaken on three paired HSI and LiDAR data sets: Houston 2013, Trento and MUUFL. The results highlight the superiority of the cross-attention mechanism, underlining the enhanced classification accuracy of the identified HSI bands when fused with the LiDAR features. The results also show that the use of fewer bands combined with LiDAR surpasses the performance of state-of-the-art fusion models.
Abstract:Band selection in hyperspectral imaging (HSI) is critical for optimising data processing and enhancing analytical accuracy. Traditional approaches have predominantly concentrated on analysing spectral and pixel characteristics within individual bands independently. These approaches overlook the potential benefits of integrating multiple data sources, such as Light Detection and Ranging (LiDAR), and is further challenged by the limited availability of labeled data in HSI processing, which represents a significant obstacle. To address these challenges, this paper introduces a novel unsupervised band selection framework that incorporates attention mechanisms and an Autoencoder for reconstruction-based band selection. Our methodology distinctively integrates HSI with LiDAR data through an attention score, using a convolutional Autoencoder to process the combined feature mask. This fusion effectively captures essential spatial and spectral features and reduces redundancy in hyperspectral datasets. A comprehensive comparative analysis of our innovative fused band selection approach is performed against existing unsupervised band selection and fusion models. We used data sets such as Houston 2013, Trento, and MUUFLE for our experiments. The results demonstrate that our method achieves superior classification accuracy and significantly outperforms existing models. This enhancement in HSI band selection, facilitated by the incorporation of LiDAR features, underscores the considerable advantages of integrating features from different sources.
Abstract:Classifying hyperspectral images is a difficult task in remote sensing, due to their complex high-dimensional data. To address this challenge, we propose HSIMamba, a novel framework that uses bidirectional reversed convolutional neural network pathways to extract spectral features more efficiently. Additionally, it incorporates a specialized block for spatial analysis. Our approach combines the operational efficiency of CNNs with the dynamic feature extraction capability of attention mechanisms found in Transformers. However, it avoids the associated high computational demands. HSIMamba is designed to process data bidirectionally, significantly enhancing the extraction of spectral features and integrating them with spatial information for comprehensive analysis. This approach improves classification accuracy beyond current benchmarks and addresses computational inefficiencies encountered with advanced models like Transformers. HSIMamba were tested against three widely recognized datasets Houston 2013, Indian Pines, and Pavia University and demonstrated exceptional performance, surpassing existing state-of-the-art models in HSI classification. This method highlights the methodological innovation of HSIMamba and its practical implications, which are particularly valuable in contexts where computational resources are limited. HSIMamba redefines the standards of efficiency and accuracy in HSI classification, thereby enhancing the capabilities of remote sensing applications. Hyperspectral imaging has become a crucial tool for environmental surveillance, agriculture, and other critical areas that require detailed analysis of the Earth surface. Please see our code in HSIMamba for more details.
Abstract:Surface reconstruction from point clouds is a crucial task in the fields of computer vision and computer graphics. SDF-based methods excel at reconstructing smooth meshes with minimal error and artifacts but struggle with representing open surfaces. On the other hand, UDF-based methods can effectively represent open surfaces but often introduce noise near the surface, leading to artifacts in the mesh. In this work, we propose a novel approach that directly predicts the intersection points between sampled line segments of point pairs and implicit surfaces. This method not only preserves the ability to represent open surfaces but also eliminates artifacts in the mesh. Our approach demonstrates state-of-the-art performance on three datasets: ShapeNet, MGN, and ScanNet. The code will be made available upon acceptance.
Abstract:Point cloud reconstruction from raw point cloud has been an important topic in computer graphics for decades, especially due to its high demand in modeling and rendering applications. An important way to solve this problem is establishing a local geometry to fit the local curve. However, previous methods build either a local plane or polynomial curve. Local plane brings the loss of sharp feature and the boundary artefacts on open surface. Polynomial curve is hard to combine with neural network due to the local coordinate consistent problem. To address this, we propose a novel polyhedral surface to represent local surface. This method provides more flexible to represent sharp feature and surface boundary on open surface. It does not require any local coordinate system, which is important when introducing neural networks. Specifically, we use normals to construct the polyhedral surface, including both dihedral and trihedral surfaces using 2 and 3 normals, respectively. Our method achieves state-of-the-art results on three commonly used datasets (ShapeNetCore, ABC, and ScanNet). Code will be released upon acceptance.
Abstract:Although reconfigurable intelligent surfaces (RISs) can improve the performance of wireless networks by smartly reconfiguring the radio environment, existing passive RISs face two key challenges, i.e., double-fading attenuation and dependence on grid/battery. To address these challenges, this paper proposes a new RIS architecture, called multi-functional RIS (MF-RIS). Different from conventional reflecting-only RIS, the proposed MF-RIS is capable of supporting multiple functions with one surface, including signal reflection, amplification, and energy harvesting. As such, our MF-RIS is able to overcome the double-fading attenuation by harvesting energy from incident signals. Through theoretical analysis, we derive the achievable capacity of an MF-RIS-aided communication network. Compared to the capacity achieved by the existing self-sustainable RIS, we derive the number of reflective elements required for MF-RIS to outperform self-sustainable RIS. To realize a self-sustainable communication system, we investigate the use of MF-RIS in improving the sum-rate of multi-user wireless networks. Specifically, we solve a non-convex optimization problem by jointly designing the transmit beamforming and MF-RIS coefficients. As an extension, we investigate a resource allocation problem in a practical scenario with imperfect channel state information. By approximating the semi-infinite constraints with the S-procedure and the general sign-definiteness, we propose a robust beamforming scheme to combat the inevitable channel estimation errors. Finally, numerical results show that: 1) compared to the self-sustainable RIS, MF-RIS can strike a better balance between energy self-sustainability and throughput improvement; and 2) unlike reflecting-only RIS which can be deployed near the transmitter or receiver, MF-RIS should be deployed closer to the transmitter for higher spectrum efficiency.
Abstract:As the number of sensors becomes massive in Internet of Things (IoT) networks, the amount of data is humongous. To process data in real-time while protecting user privacy, federated learning (FL) has been regarded as an enabling technique to push edge intelligence into IoT networks with massive devices. However, FL latency increases dramatically due to the increase of the number of parameters in deep neural network and the limited computation and communication capabilities of IoT devices. To address this issue, we propose a semi-federated learning (SemiFL) paradigm in which network pruning and over-the-air computation are efficiently applied. To be specific, each small base station collects the raw data from its served sensors and trains its local pruned model. After that, the global aggregation of local gradients is achieved through over-the-air computation. We first analyze the performance of the proposed SemiFL by deriving its convergence upper bound. To reduce latency, a convergence-constrained SemiFL latency minimization problem is formulated. By decoupling the original problem into several sub-problems, iterative algorithms are designed to solve them efficiently. Finally, numerical simulations are conducted to verify the effectiveness of our proposed scheme in reducing latency and guaranteeing the identification accuracy.
Abstract:Under the organization of the base station (BS), wireless federated learning (FL) enables collaborative model training among multiple devices. However, the BS is merely responsible for aggregating local updates during the training process, which incurs a waste of the computational resource at the BS. To tackle this issue, we propose a semi-federated learning (SemiFL) paradigm to leverage the computing capabilities of both the BS and devices for a hybrid implementation of centralized learning (CL) and FL. Specifically, each device sends both local gradients and data samples to the BS for training a shared global model. To improve communication efficiency over the same time-frequency resources, we integrate over-the-air computation for aggregation and non-orthogonal multiple access for transmission by designing a novel transceiver structure. To gain deep insights, we conduct convergence analysis by deriving a closed-form optimality gap for SemiFL and extend the result to two extra cases. In the first case, the BS uses all accumulated data samples to calculate the CL gradient, while a decreasing learning rate is adopted in the second case. Our analytical results capture the destructive effect of wireless communication and show that both FL and CL are special cases of SemiFL. Then, we formulate a non-convex problem to reduce the optimality gap by jointly optimizing the transmit power and receive beamformers. Accordingly, we propose a two-stage algorithm to solve this intractable problem, in which we provide the closed-form solutions to the beamformers. Extensive simulation results on two real-world datasets corroborate our theoretical analysis, and show that the proposed SemiFL outperforms conventional FL and achieves 3.2% accuracy gain on the MNIST dataset compared to state-of-the-art benchmarks.
Abstract:In this paper, we propose and study a multi-functional reconfigurable intelligent surface (MF-RIS) architecture. In contrast to conventional single-functional RIS (SF-RIS) that only reflects signals, the proposed MF-RIS simultaneously supports multiple functions with one surface, including reflection, refraction, amplification, and energy harvesting of wireless signals. As such, the proposed MF-RIS is capable of significantly enhancing RIS signal coverage by amplifying the signal reflected/refracted by the RIS with the energy harvested. We present the signal model of the proposed MF-RIS, and formulate an optimization problem to maximize the sum-rate of multiple users in an MF-RIS-aided non-orthogonal multiple access network. We jointly optimize the transmit beamforming, power allocations as well as the operating modes and parameters for different elements of the MF-RIS and its deployment location, via an efficient iterative algorithm. Simulation results are provided which show significant performance gains of the MF-RIS over SF-RISs with only some of its functions available. Moreover, we demonstrate that there exists a fundamental trade-off between sum-rate maximization and harvested energy maximization. In contrast to SF-RISs which can be deployed near either the transmitter or receiver, the proposed MF-RIS should be deployed closer to the transmitter for maximizing its communication throughput with more energy harvested.