Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Changhong Wang

LTCI

Modular XL-Array-Enabled 3-D Localization based on Hybrid Spherical-Planar Wave Model in Terahertz Systems

Apr 18, 2025

Yang Zhang, Ruidong Li, Cunhua Pan, Hong Ren, Tuo Wu, Changhong Wang

Abstract:This work considers the three-dimensional (3-D) positioning problem in a Terahertz (THz) system enabled by a modular extra-large (XL) array with sub-connected architecture. Our purpose is to estimate the Cartesian Coordinates of multiple user equipments (UEs) with the received signal of the RF chains while considering the spatial non-stationarity (SNS). We apply the hybrid spherical-planar wave model (HSPWM) as the channel model owing to the structual feature of the modular array, and propose a 3-D localization algorithm with relatively high accuracy and low complexity. Specifically, we first distinguish the visible sub-arrays (SAs) located in the VR and estimate the angles-of-arrival (AoAs) from each UE to typical visible SAs with the largest receive power via compressed sensing (CS) method. In addition, we apply the weighted least square (WLS) method to obtain a coarse 3-D position estimation of each UE according to the AoA estimations. Then, we estimate the AoAs of the other SAs with a reduced dictionary (RD)-CS-based method for lower computational complexity, and utilize all the efficient AoA estimations to derive a fine position estimation. Simulation results indicate that the proposed positioning framework based on modular XL-array can achieve satisfactory accuracy with evident reduction in complexity. Furthermore, the deployment of SAs and the allocation of antenna elements need to be specially designed for better positioning performance.

* 13 pages, 11 figures

Via

Access Paper or Ask Questions

Of All StrIPEs: Investigating Structure-informed Positional Encoding for Efficient Music Generation

Apr 07, 2025

Manvi Agarwal, Changhong Wang, Gael Richard

Abstract:While music remains a challenging domain for generative models like Transformers, a two-pronged approach has recently proved successful: inserting musically-relevant structural information into the positional encoding (PE) module and using kernel approximation techniques based on Random Fourier Features (RFF) to lower the computational cost from quadratic to linear. Yet, it is not clear how such RFF-based efficient PEs compare with those based on rotation matrices, such as Rotary Positional Encoding (RoPE). In this paper, we present a unified framework based on kernel methods to analyze both families of efficient PEs. We use this framework to develop a novel PE method called RoPEPool, capable of extracting causal relationships from temporal sequences. Using RFF-based PEs and rotation-based PEs, we demonstrate how seemingly disparate PEs can be jointly studied by considering the content-context interactions they induce. For empirical validation, we use a symbolic music generation task, namely, melody harmonization. We show that RoPEPool, combined with highly-informative structural priors, outperforms all methods.

Via

Access Paper or Ask Questions

Cross-Lingual Consistency: A Novel Inference Framework for Advancing Reasoning in Large Language Models

Apr 02, 2025

Zhiwei Yu, Tuo Li, Changhong Wang, Hui Chen, Lang Zhou

Abstract:Chain-of-thought (CoT) has emerged as a critical mechanism for enhancing reasoning capabilities in large language models (LLMs), with self-consistency demonstrating notable promise in boosting performance. However, inherent linguistic biases in multilingual training corpora frequently cause semantic drift and logical inconsistencies, especially in sub-10B parameter LLMs handling complex inference tasks. To overcome these constraints, we propose the Cross-Lingual Consistency (CLC) framework, an innovative inference paradigm that integrates multilingual reasoning paths through majority voting to elevate LLMs' reasoning capabilities. Empirical evaluations on the CMATH dataset reveal CLC's superiority over the conventional self-consistency method, delivering 9.5%, 6.5%, and 6.0% absolute accuracy gains for DeepSeek-Math-7B-Instruct, Qwen2.5-Math-7B-Instruct, and Gemma2-9B-Instruct respectively. Expanding CLC's linguistic scope to 11 diverse languages implies two synergistic benefits: 1) neutralizing linguistic biases in multilingual training corpora through multilingual ensemble voting, 2) escaping monolingual reasoning traps by exploring the broader multilingual solution space. This dual benefits empirically enables more globally optimal reasoning paths compared to monolingual self-consistency baselines, as evidenced by the 4.1%-18.5% accuracy gains using Gemma2-9B-Instruct on the MGSM dataset.

Via

Access Paper or Ask Questions

Investigating the Sensitivity of Pre-trained Audio Embeddings to Common Effects

Jan 27, 2025

Victor Deng, Changhong Wang, Gael Richard, Brian McFee

Abstract:In recent years, foundation models have significantly advanced data-driven systems across various domains. Yet, their underlying properties, especially when functioning as feature extractors, remain under-explored. In this paper, we investigate the sensitivity to audio effects of audio embeddings extracted from widely-used foundation models, including OpenL3, PANNs, and CLAP. We focus on audio effects as the source of sensitivity due to their prevalent presence in large audio datasets. By applying parameterized audio effects (gain, low-pass filtering, reverberation, and bitcrushing), we analyze the correlation between the deformation trajectories and the effect strength in the embedding space. We propose to quantify the dimensionality and linearizability of the deformation trajectories induced by audio effects using canonical correlation analysis. We find that there exists a direction along which the embeddings move monotonically as the audio effect strength increases, but that the subspace containing the displacements is generally high-dimensional. This shows that pre-trained audio embeddings do not globally linearize the effects. Our empirical results on instrument classification downstream tasks confirm that projecting out the estimated deformation directions cannot generally improve the robustness of pre-trained embeddings to audio effects.

* IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr 2025, Hyderabad, India

Via

Access Paper or Ask Questions

Ensemble Successor Representations for Task Generalization in Offline-to-Online Reinforcement Learning

May 12, 2024

Changhong Wang, Xudong Yu, Chenjia Bai, Qiaosheng Zhang, Zhen Wang

Abstract:In Reinforcement Learning (RL), training a policy from scratch with online experiences can be inefficient because of the difficulties in exploration. Recently, offline RL provides a promising solution by giving an initialized offline policy, which can be refined through online interactions. However, existing approaches primarily perform offline and online learning in the same task, without considering the task generalization problem in offline-to-online adaptation. In real-world applications, it is common that we only have an offline dataset from a specific task while aiming for fast online-adaptation for several tasks. To address this problem, our work builds upon the investigation of successor representations for task generalization in online RL and extends the framework to incorporate offline-to-online learning. We demonstrate that the conventional paradigm using successor features cannot effectively utilize offline data and improve the performance for the new task by online fine-tuning. To mitigate this, we introduce a novel methodology that leverages offline data to acquire an ensemble of successor representations and subsequently constructs ensemble Q functions. This approach enables robust representation learning from datasets with different coverage and facilitates fast adaption of Q functions towards new tasks during the online fine-tuning phase. Extensive empirical evaluations provide compelling evidence showcasing the superior performance of our method in generalizing to diverse or even unseen tasks.

* Accepted by Science China Information Sciences

Via

Access Paper or Ask Questions

Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning

Apr 09, 2024

Xudong Yu, Chenjia Bai, Hongyi Guo, Changhong Wang, Zhen Wang

Abstract:Offline Reinforcement Learning (RL) faces distributional shift and unreliable value estimation, especially for out-of-distribution (OOD) actions. To address this, existing uncertainty-based methods penalize the value function with uncertainty quantification and demand numerous ensemble networks, posing computational challenges and suboptimal outcomes. In this paper, we introduce a novel strategy employing diverse randomized value functions to estimate the posterior distribution of $Q$-values. It provides robust uncertainty quantification and estimates lower confidence bounds (LCB) of $Q$-values. By applying moderate value penalties for OOD actions, our method fosters a provably pessimistic approach. We also emphasize on diversity within randomized value functions and enhance efficiency by introducing a diversity regularization method, reducing the requisite number of networks. These modules lead to reliable value estimation and efficient policy learning from offline data. Theoretical analysis shows that our method recovers the provably efficient LCB-penalty under linear MDP assumptions. Extensive empirical results also demonstrate that our proposed method significantly outperforms baseline methods in terms of performance and parametric efficiency.

Via

Access Paper or Ask Questions

Regularized Conditional Diffusion Model for Multi-Task Preference Alignment

Apr 07, 2024

Xudong Yu, Chenjia Bai, Haoran He, Changhong Wang, Xuelong Li

Abstract:Sequential decision-making is desired to align with human intents and exhibit versatility across various tasks. Previous methods formulate it as a conditional generation process, utilizing return-conditioned diffusion models to directly model trajectory distributions. Nevertheless, the return-conditioned paradigm relies on pre-defined reward functions, facing challenges when applied in multi-task settings characterized by varying reward functions (versatility) and showing limited controllability concerning human preferences (alignment). In this work, we adopt multi-task preferences as a unified condition for both single- and multi-task decision-making, and propose preference representations aligned with preference labels. The learned representations are used to guide the conditional generation process of diffusion models, and we introduce an auxiliary objective to maximize the mutual information between representations and corresponding generated trajectories, improving alignment between trajectories and preferences. Extensive experiments in D4RL and Meta-World demonstrate that our method presents favorable performance in single- and multi-task scenarios, and exhibits superior alignment with preferences.

Via

Access Paper or Ask Questions

Structure-informed Positional Encoding for Music Generation

Feb 28, 2024

Manvi Agarwal, Changhong Wang, Gaël Richard

Figure 1 for Structure-informed Positional Encoding for Music Generation

Figure 2 for Structure-informed Positional Encoding for Music Generation

Figure 3 for Structure-informed Positional Encoding for Music Generation

Abstract:Music generated by deep learning methods often suffers from a lack of coherence and long-term organization. Yet, multi-scale hierarchical structure is a distinctive feature of music signals. To leverage this information, we propose a structure-informed positional encoding framework for music generation with Transformers. We design three variants in terms of absolute, relative and non-stationary positional information. We comprehensively test them on two symbolic music generation tasks: next-timestep prediction and accompaniment generation. As a comparison, we choose multiple baselines from the literature and demonstrate the merits of our methods using several musically-motivated evaluation metrics. In particular, our methods improve the melodic and structural consistency of the generated pieces.

* IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2024, Seoul, South Korea

Via

Access Paper or Ask Questions

Transfer Learning and Bias Correction with Pre-trained Audio Embeddings

Jul 20, 2023

Changhong Wang, Gaël Richard, Brian McFee

Abstract:Deep neural network models have become the dominant approach to a large variety of tasks within music information retrieval (MIR). These models generally require large amounts of (annotated) training data to achieve high accuracy. Because not all applications in MIR have sufficient quantities of training data, it is becoming increasingly common to transfer models across domains. This approach allows representations derived for one task to be applied to another, and can result in high accuracy with less stringent training data requirements for the downstream task. However, the properties of pre-trained audio embeddings are not fully understood. Specifically, and unlike traditionally engineered features, the representations extracted from pre-trained deep networks may embed and propagate biases from the model's training regime. This work investigates the phenomenon of bias propagation in the context of pre-trained audio representations for the task of instrument recognition. We first demonstrate that three different pre-trained representations (VGGish, OpenL3, and YAMNet) exhibit comparable performance when constrained to a single dataset, but differ in their ability to generalize across datasets (OpenMIC and IRMAS). We then investigate dataset identity and genre distribution as potential sources of bias. Finally, we propose and evaluate post-processing countermeasures to mitigate the effects of bias, and improve generalization across datasets.

* 7 pages, 3 figures, accepted to the conference of the International Society for Music Information Retrieval (ISMIR 2023)

Via

Access Paper or Ask Questions

Mesostructures: Beyond Spectrogram Loss in Differentiable Time-Frequency Analysis

Jan 24, 2023

Cyrus Vahidi, Han Han, Changhong Wang, Mathieu Lagrange, György Fazekas, Vincent Lostanlen

Figure 1 for Mesostructures: Beyond Spectrogram Loss in Differentiable Time-Frequency Analysis

Figure 2 for Mesostructures: Beyond Spectrogram Loss in Differentiable Time-Frequency Analysis

Figure 3 for Mesostructures: Beyond Spectrogram Loss in Differentiable Time-Frequency Analysis

Figure 4 for Mesostructures: Beyond Spectrogram Loss in Differentiable Time-Frequency Analysis

Abstract:Computer musicians refer to mesostructures as the intermediate levels of articulation between the microstructure of waveshapes and the macrostructure of musical forms. Examples of mesostructures include melody, arpeggios, syncopation, polyphonic grouping, and textural contrast. Despite their central role in musical expression, they have received limited attention in deep learning. Currently, autoencoders and neural audio synthesizers are only trained and evaluated at the scale of microstructure: i.e., local amplitude variations up to 100 milliseconds or so. In this paper, we formulate and address the problem of mesostructural audio modeling via a composition of a differentiable arpeggiator and time-frequency scattering. We empirically demonstrate that time--frequency scattering serves as a differentiable model of similarity between synthesis parameters that govern mesostructure. By exposing the sensitivity of short-time spectral distances to time alignment, we motivate the need for a time-invariant and multiscale differentiable time--frequency model of similarity at the level of both local spectra and spectrotemporal modulations.

Via

Access Paper or Ask Questions