School of Mathematical Science, Inner Mongolia University, Hohhot, China
Abstract:Orthogonal time frequency space (OTFS) modulation has been viewed as a promising technique for integrated sensing and communication (ISAC) systems and aerial-terrestrial networks, due to its delay-Doppler domain transmission property and strong Doppler-resistance capability. However, it also suffers from high processing complexity at the receiver. In this work, we propose a novel pre-equalization based ISAC-OTFS transmission framework, where the terrestrial base station (BS) executes pre-equalization based on its estimated channel state information (CSI). In particular, the mean square error of OTFS symbol demodulation and Cramer-Rao lower bound of sensing parameter estimation are derived, and their weighted sum is utilized as the metric for optimizing the pre-equalization matrix. To address the formulated problem while taking the time-varying CSI into consideration, a deep learning enabled channel prediction-based pre-equalization framework is proposed, where a parameter-level channel prediction module is utilized to decouple OTFS channel parameters, and a low-dimensional prediction network is leveraged to correct outdated CSI. A CSI processing module is then used to initialize the input of the pre-equalization module. Finally, a residual-structured deep neural network is cascaded to execute pre-equalization. Simulation results show that under the proposed framework, the demodulation complexity at the receiver as well as the pilot overhead for channel estimation, are significantly reduced, while the symbol detection performance approaches those of conventional minimum mean square error equalization and perfect CSI.
Abstract:Among the key enabling 6G techniques, multiple-input multiple-output (MIMO) and non-orthogonal multiple-access (NOMA) play an important role in enhancing the spectral efficiency of the wireless communication systems. To further extend the coverage and the capacity, the simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) has recently emerged out as a cost-effective technology. To exploit the benefit of STAR-RIS in the MIMO-NOMA systems, in this paper, we investigate the analysis and optimization of the downlink dual-user MIMO-NOMA systems assisted by multiple STAR-RISs under the generalized singular value decomposition (GSVD) precoding scheme, in which the channel is assumed to be Rician faded with the Weichselberger's correlation structure. To analyze the asymptotic information rate of the users, we apply the operator-valued free probability theory to obtain the Cauchy transform of the generalized singular values (GSVs) of the MIMO-NOMA channel matrices, which can be used to obtain the information rate by Riemann integral. Then, considering the special case when the channels between the BS and the STAR-RISs are deterministic, we obtain the closed-form expression for the asymptotic information rates of the users. Furthermore, a projected gradient ascent method (PGAM) is proposed with the derived closed-form expression to design the STAR-RISs thereby maximizing the sum rate based on the statistical channel state information. The numerical results show the accuracy of the asymptotic expression compared to the Monte Carlo simulations and the superiority of the proposed PGAM algorithm.
Abstract:Recent singing voice synthesis and conversion advancements necessitate robust singing voice deepfake detection (SVDD) models. Current SVDD datasets face challenges due to limited controllability, diversity in deepfake methods, and licensing restrictions. Addressing these gaps, we introduce CtrSVDD, a large-scale, diverse collection of bonafide and deepfake singing vocals. These vocals are synthesized using state-of-the-art methods from publicly accessible singing voice datasets. CtrSVDD includes 47.64 hours of bonafide and 260.34 hours of deepfake singing vocals, spanning 14 deepfake methods and involving 164 singer identities. We also present a baseline system with flexible front-end features, evaluated against a structured train/dev/eval split. The experiments show the importance of feature selection and highlight a need for generalization towards deepfake methods that deviate further from training distribution. The CtrSVDD dataset and baselines are publicly accessible.
Abstract:Wrinkle defects were found widely exist in the field of industrial products, i.e. wind turbine blades and filament-wound composite pressure vessels. The magnitude of wrinkle wavelength varies from several millimeters to over one hundred millimeters. Locating the wrinkle defects and measuring their responses are very important to the assessment of the structures that containing wrinkle defects. A meso-mechanical modeling is presented based on the homogenization method to obtain the effective stiffness of a graded wrinkle. The finite element simulation predicts the trans-scale response of out-of-plane displacement of wrinkled laminates, where the maximum displacement ranges from nanoscale to millimeter scale. Such trans-scale effect requires different measurement approaches to observe the displacement responses. Here we employed Shearography (Speckle Pattern Shearing Interferometry) and fringe projection profilometry (FPP) method respectively according to the different magnitude of displacement. In FPP method, a displacement extraction algorithm was presented to obtain the out-of-plane displacement. The measurement sensitivity and accuracy of Shearography and FPP are compared, which provides a quantitative reference for industrial non-destructive test.
Abstract:In the evolving field of corporate sustainability, analyzing unstructured Environmental, Social, and Governance (ESG) reports is a complex challenge due to their varied formats and intricate content. This study introduces an innovative methodology utilizing the "Unstructured Core Library", specifically tailored to address these challenges by transforming ESG reports into structured, analyzable formats. Our approach significantly advances the existing research by offering high-precision text cleaning, adept identification and extraction of text from images, and standardization of tables within these reports. Emphasizing its capability to handle diverse data types, including text, images, and tables, the method adeptly manages the nuances of differing page layouts and report styles across industries. This research marks a substantial contribution to the fields of industrial ecology and corporate sustainability assessment, paving the way for the application of advanced NLP technologies and large language models in the analysis of corporate governance and sustainability. Our code is available at https://github.com/linancn/TianGong-AI-Unstructure.git.
Abstract:Large language models (LLMs) have achieved impressive linguistic capabilities. However, a key limitation persists in their lack of human-like memory faculties. LLMs exhibit constrained memory retention across sequential interactions, hindering complex reasoning. This paper explores the potential of applying cognitive psychology's working memory frameworks, to enhance LLM architecture. The limitations of traditional LLM memory designs are analyzed, including their isolation of distinct dialog episodes and lack of persistent memory links. To address this, an innovative model is proposed incorporating a centralized Working Memory Hub and Episodic Buffer access to retain memories across episodes. This architecture aims to provide greater continuity for nuanced contextual reasoning during intricate tasks and collaborative scenarios. While promising, further research is required into optimizing episodic memory encoding, storage, prioritization, retrieval, and security. Overall, this paper provides a strategic blueprint for developing LLM agents with more sophisticated, human-like memory capabilities, highlighting memory mechanisms as a vital frontier in artificial general intelligence.
Abstract:In the realm of expressive Text-to-Speech (TTS), explicit prosodic boundaries significantly advance the naturalness and controllability of synthesized speech. While human prosody annotation contributes a lot to the performance, it is a labor-intensive and time-consuming process, often resulting in inconsistent outcomes. Despite the availability of extensive supervised data, the current benchmark model still faces performance setbacks. To address this issue, a two-stage automatic annotation pipeline is novelly proposed in this paper. Specifically, in the first stage, we propose contrastive text-speech pretraining of Speech-Silence and Word-Punctuation (SSWP) pairs. The pretraining procedure hammers at enhancing the prosodic space extracted from joint text-speech space. In the second stage, we build a multi-modal prosody annotator, which consists of pretrained encoders, a straightforward yet effective text-speech feature fusion scheme, and a sequence classifier. Extensive experiments conclusively demonstrate that our proposed method excels at automatically generating prosody annotation and achieves state-of-the-art (SOTA) performance. Furthermore, our novel model has exhibited remarkable resilience when tested with varying amounts of data.
Abstract:By exploiting the degree of freedom on the altitude, unmanned aerial vehicle (UAV) communication can provide ubiquitous communication for future wireless networks. In the case of concurrent transmission of multiple UAVs, the directional beamforming formed by multiple antennas is an effective way to reduce co-channel interference. However, factors such as airflow disturbance or estimation error for UAV communications can cause the occurrence of beam misalignment. In this paper, we investigate the system performance of a multi-tier UAV communication network with the consideration of unstable beam alignment. In particular, we propose a tractable random model to capture the impacts of beam misalignment in the 3D space. Based on this, by utilizing stochastic geometry, an analytical framework for obtaining the outage probability in the downlink of a multi-tier UAV communication network for the closest distance association scheme and the maximum average power association scheme is established. The accuracy of the analysis is verified by Monte-Carlo simulations. The results indicate that in the presence of random beam misalignment, the optimal number of UAV antennas needs to be adjusted to be relatively larger when the density of UAVs increases or the altitude of UAVs becomes higher.
Abstract:Over-the-air computation (AirComp), as a data aggregation method that can improve network efficiency by exploiting the superposition characteristics of wireless channels, has received much attention recently. Meanwhile, the orthogonal time frequency space (OTFS) modulation can provide a strong Doppler resilience and facilitates reliable transmission for high-mobility communications. Hence, in this work, we investigate an OTFS-based AirComp system in the presence of time-frequency dual-selective channels. In particular, we commence from the development of a novel transmission framework for the considered system, where the pilot signal is sent together with data and the channel estimation is implemented according to the echo from the access point to the sensor, thereby reducing the overhead of channel state information (CSI) feedback. Hereafter, based on the CSI estimated from the previous frame, a robust precoding matrix aiming at minimizing mean square error in the current frame is designed, which takes into account the estimation error from the receiver noise and the outdated CSI. The simulation results demonstrate the effectiveness of the proposed robust precoding scheme by comparing it with the non-robust precoding. The performance gain is more obvious in high signal-to-noise ratio in case of large channel estimation errors.
Abstract:Signal classification problems arise in a wide variety of applications, and their demand is only expected to grow. In this paper, we focus on the wireless sensor network signal classification setting, where each sensor forwards quantized signals to a fusion center to be classified. Our primary goal is to train a decision function and quantizers across the sensors to maximize the classification performance in an online manner. Moreover, we are interested in sparse sensor selection using a marginalized weighted kernel approach to improve network resource efficiency by disabling less reliable sensors with minimal effect on classification performance.To achieve our goals, we develop a multi-sensor online kernel scalar quantization (MSOKSQ) learning strategy that operates on the sensor outputs at the fusion center. Our theoretical analysis reveals how the proposed algorithm affects the quantizers across the sensors. Additionally, we provide a convergence analysis of our online learning approach by studying its relationship to batch learning. We conduct numerical studies under different classification and sensor network settings which demonstrate the accuracy gains from optimizing different components of MSOKSQ and robustness to reduction in the number of sensors selected.