Yueguo Chen

MAJL: A Model-Agnostic Joint Learning Framework for Music Source Separation and Pitch Estimation

Jan 07, 2025

Haojie Wei, Jun Yuan, Rui Zhang, Quanyu Dai, Yueguo Chen

Abstract: Music source separation and pitch estimation are two vital tasks in music information retrieval. Typically, the input of pitch estimation is obtained from the output of music source separation. Therefore, existing methods have tried to perform these two tasks simultaneously, so as to leverage the mutually beneficial relationship between both tasks. However, these methods still face two critical challenges that limit the improvement of both tasks: the lack of labeled data and joint learning optimization. To address these challenges, we propose a Model-Agnostic Joint Learning (MAJL) framework for both tasks. MAJL is a generic framework and can use different models for each task. It includes a two-stage training method and a dynamic weighting method named Dynamic Weights on Hard Samples (DWHS), which address the lack of labeled data and joint learning optimization, respectively. Experimental results on public music datasets show that MAJL outperforms state-of-the-art methods on both tasks, with significant improvements of 0.92 in Signal-to-Distortion Ratio (SDR) for music source separation and 2.71% in Raw Pitch Accuracy (RPA) for pitch estimation. Furthermore, comprehensive studies not only validate the effectiveness of each component of MAJL, but also indicate the great generality of MAJL in adapting to different model architectures.
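
For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how SDR and RPA are commonly defined. It is not taken from the MAJL paper: the function names, the plain energy-ratio form of SDR, and the 50-cent tolerance for RPA are assumptions based on common practice, and the authors' evaluation toolkit may differ in detail.

```python
import numpy as np

def sdr(reference, estimate, eps=1e-10):
    """Signal-to-Distortion Ratio in dB (plain energy-ratio form, an assumption)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    signal_power = np.sum(reference ** 2)
    noise_power = np.sum((reference - estimate) ** 2)
    # eps avoids division by zero and log of zero for silent or perfect inputs
    return 10.0 * np.log10((signal_power + eps) / (noise_power + eps))

def raw_pitch_accuracy(ref_hz, est_hz, tolerance_cents=50.0):
    """Fraction of voiced reference frames whose estimate is within the tolerance.

    Frames with a frequency of 0 are treated as unvoiced (an assumed convention).
    """
    ref_hz = np.asarray(ref_hz, dtype=float)
    est_hz = np.asarray(est_hz, dtype=float)
    voiced = ref_hz > 0
    if not np.any(voiced):
        return 0.0
    correct = np.zeros_like(voiced)
    # Only frames voiced in both reference and estimate can be correct
    both = voiced & (est_hz > 0)
    cents = 1200.0 * np.abs(np.log2(est_hz[both] / ref_hz[both]))
    correct[both] = cents <= tolerance_cents
    return float(np.sum(correct) / np.sum(voiced))

# Hypothetical toy inputs, for illustration only
clean = np.array([1.0, 0.0, -1.0, 0.0])
separated = 0.9 * clean
ref_pitch = np.array([220.0, 440.0, 0.0, 330.0])
est_pitch = np.array([220.0, 466.16, 0.0, 0.0])
sdr_db = sdr(clean, separated)
rpa = raw_pitch_accuracy(ref_pitch, est_pitch)
```

Under these definitions, a higher SDR means less residual distortion in the separated source, and RPA counts only frames where the reference is voiced.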

YuLan: An Open-source Large Language Model

Jun 28, 2024

Yutao Zhu, Kun Zhou, Kelong Mao, Wentong Chen, Yiding Sun, Zhipeng Chen, Qian Cao, Yihan Wu, Yushuo Chen, Feng Wang(+28 more)

Abstract: Large language models (LLMs) have become the foundation of many applications, leveraging their extensive capabilities in processing and understanding natural language. While many open-source LLMs have been released with technical reports, the lack of training details hinders further research and development. This paper presents the development of YuLan, a series of open-source LLMs with 12 billion parameters. The base model of YuLan is pre-trained on approximately 1.7T tokens derived from a diverse corpus, including massive English, Chinese, and multilingual texts. We design a three-stage pre-training method to enhance YuLan's overall capabilities. Subsequent phases of training incorporate instruction-tuning and human alignment, employing a substantial volume of high-quality synthesized data. To facilitate the learning of complex and long-tail knowledge, we devise a curriculum-learning framework across these stages, which helps LLMs learn knowledge in an easy-to-hard manner. YuLan's training was finished in January 2024, and the model has achieved performance on par with state-of-the-art LLMs across various English and Chinese benchmarks. This paper outlines a comprehensive technical roadmap for developing LLMs from scratch. Our model and code are available at https://github.com/RUC-GSAI/YuLan-Chat.

Large Language Model for Table Processing: A Survey

Feb 04, 2024

Weizheng Lu, Jiaming Zhang, Jing Zhang, Yueguo Chen

Abstract: Tables, typically two-dimensional and structured to store large amounts of data, are essential in daily activities like database queries, spreadsheet calculations, and generating reports from web tables. Automating these table-centric tasks with Large Language Models (LLMs) offers significant public benefits, garnering interest from academia and industry. This survey provides an extensive overview of table tasks, encompassing not only the traditional areas like table question answering (Table QA) and fact verification, but also newly emphasized aspects such as table manipulation and advanced table data analysis. Additionally, it goes beyond the early strategies of pre-training and fine-tuning small language models, to include recent paradigms in LLM usage. The focus here is particularly on instruction-tuning, prompting, and agent-based approaches within the realm of LLMs. Finally, we highlight several challenges, ranging from private deployment and efficient inference to the development of extensive benchmarks for table manipulation and advanced data analysis.

DJCM: A Deep Joint Cascade Model for Singing Voice Separation and Vocal Pitch Estimation

Jan 08, 2024

Haojie Wei, Xueke Cao, Wenbo Xu, Tangpeng Dan, Yueguo Chen

Abstract: Singing voice separation and vocal pitch estimation are pivotal tasks in music information retrieval. Existing methods for simultaneous extraction of clean vocals and vocal pitches can be classified into two categories: pipeline methods and naive joint learning methods. However, the efficacy of these methods is limited by the following problems. On the one hand, pipeline methods train models for each task independently, resulting in a mismatch between the data distributions at training and testing time. On the other hand, naive joint learning methods simply add the losses of both tasks, possibly leading to a misalignment between the distinct objectives of each task. To solve these problems, we propose a Deep Joint Cascade Model (DJCM) for singing voice separation and vocal pitch estimation. DJCM employs a novel joint cascade model structure to concurrently train both tasks. Moreover, task-specific weights are used to align the different objectives of both tasks. Experimental results show that DJCM achieves state-of-the-art performance on both tasks, with great improvements of 0.45 in terms of Signal-to-Distortion Ratio (SDR) for singing voice separation and 2.86% in terms of Overall Accuracy (OA) for vocal pitch estimation. Furthermore, extensive ablation studies validate the effectiveness of each design of our proposed model. The code of DJCM is available at https://github.com/Dream-High/DJCM.

* This paper has been accepted by ICASSP 2024

REAL: A Representative Error-Driven Approach for Active Learning

Jul 06, 2023

Cheng Chen, Yong Wang, Lizi Liao, Yueguo Chen, Xiaoyong Du

Abstract: Given a limited labeling budget, active learning (AL) aims to sample the most informative instances from an unlabeled pool to acquire labels for subsequent model training. To achieve this, AL typically measures the informativeness of unlabeled instances based on uncertainty and diversity. However, it does not consider erroneous instances with their neighborhood error density, which have great potential to improve the model performance. To address this limitation, we propose REAL, a novel approach to select data instances with Representative Errors for Active Learning. It identifies minority predictions as pseudo errors within a cluster and allocates an adaptive sampling budget for the cluster based on estimated error density. Extensive experiments on five text classification datasets demonstrate that REAL consistently outperforms all best-performing baselines regarding accuracy and F1-macro scores across a wide range of hyperparameter settings. Our analysis also shows that REAL selects the most representative pseudo errors that match the distribution of ground-truth errors along the decision boundary. Our code is publicly available at https://github.com/withchencheng/ECML_PKDD_23_Real.
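
The two ideas named in the abstract, minority predictions as pseudo errors and a per-cluster budget driven by error density, can be illustrated with a small sketch. This is not the released REAL code: the function names are hypothetical, cluster assignments are assumed to be given, and the proportional, rounded-down budget rule is an assumption chosen for simplicity.

```python
import numpy as np

def pseudo_errors(cluster_ids, predictions):
    """Flag instances whose predicted label differs from their cluster's majority label."""
    cluster_ids = np.asarray(cluster_ids)
    predictions = np.asarray(predictions)
    is_error = np.zeros(len(predictions), dtype=bool)
    density = {}
    for c in np.unique(cluster_ids):
        members = np.flatnonzero(cluster_ids == c)
        labels, counts = np.unique(predictions[members], return_counts=True)
        majority = labels[np.argmax(counts)]  # ties resolve to the smallest label
        is_error[members] = predictions[members] != majority
        # Estimated error density: share of minority predictions in the cluster
        density[int(c)] = float(is_error[members].mean())
    return is_error, density

def allocate_budget(density, budget):
    """Split the labeling budget across clusters in proportion to error density.

    Proportional allocation with rounding down is an assumption of this sketch.
    """
    total = sum(density.values())
    if total == 0:
        return {c: 0 for c in density}
    return {c: int(budget * d / total) for c, d in density.items()}

# Hypothetical toy data: two clusters of four instances each
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
preds = [1, 1, 1, 0, 2, 2, 0, 1]
flags, dens = pseudo_errors(clusters, preds)
shares = allocate_budget(dens, budget=6)
```

In this toy case the second cluster has twice the pseudo-error density of the first, so it receives twice the share of the labeling budget.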

* Accepted by ECML/PKDD 2023

RMVPE: A Robust Model for Vocal Pitch Estimation in Polyphonic Music

Jun 28, 2023

Haojie Wei, Xueke Cao, Tangpeng Dan, Yueguo Chen

Abstract: Vocal pitch is an important high-level feature in music audio processing. However, extracting vocal pitch in polyphonic music is more challenging due to the presence of accompaniment. To eliminate the influence of the accompaniment, most previous methods adopt music source separation models to obtain clean vocals from polyphonic music before predicting vocal pitches. As a result, the performance of vocal pitch estimation is affected by the music source separation models. To address this issue and directly extract vocal pitches from polyphonic music, we propose a robust model named RMVPE. This model can extract effective hidden features and accurately predict vocal pitches from polyphonic music. The experimental results demonstrate the superiority of RMVPE in terms of raw pitch accuracy (RPA) and raw chroma accuracy (RCA). Additionally, experiments conducted with different types of noise show that RMVPE is robust across all signal-to-noise ratio (SNR) levels. The code of RMVPE is available at https://github.com/Dream-High/RMVPE.

* This paper has been accepted by INTERSPEECH 2023

JEPOO: Highly Accurate Joint Estimation of Pitch, Onset and Offset for Music Information Retrieval

Jun 02, 2023

Haojie Wei, Jun Yuan, Rui Zhang, Yueguo Chen, Gang Wang

Abstract: Melody extraction is a core task in music information retrieval, and the estimation of pitch, onset and offset are key sub-tasks in melody extraction. Existing methods have limited accuracy, and work for only one type of data, either single-pitch or multi-pitch. In this paper, we propose a highly accurate method for joint estimation of pitch, onset and offset, named JEPOO. We address the challenges of joint learning optimization and handling both single-pitch and multi-pitch data through novel model design and a new optimization technique named Pareto modulated loss with loss weight regularization. This is the first method that can accurately handle both single-pitch and multi-pitch music data, and even a mix of them. A comprehensive experimental study on a wide range of real datasets shows that JEPOO outperforms state-of-the-art methods by up to 10.6%, 8.3% and 10.3% for the prediction of pitch, onset and offset, respectively, and that JEPOO is robust for various types of data and instruments. The ablation study shows the effectiveness of each component of JEPOO.

* This paper has been accepted by IJCAI 2023; 11 pages, 6 figures

CCSL: A Causal Structure Learning Method from Multiple Unknown Environments

Nov 18, 2021

Wei Chen, Yunjin Wu, Ruichu Cai, Yueguo Chen, Zhifeng Hao

Abstract: Most existing causal structure learning methods require data to be independent and identically distributed (i.i.d.), which often cannot be guaranteed when the data come from different environments. Some previous efforts try to tackle this problem in two independent stages, i.e., first discovering i.i.d. clusters from non-i.i.d. samples, then learning the causal structures from different groups. This straightforward solution ignores the intrinsic connection between the two stages, that is, both the clustering stage and the learning stage should be guided by the same causal mechanism. To this end, we propose a unified Causal Cluster Structures Learning (named CCSL) method for causal discovery from non-i.i.d. data. This method simultaneously integrates the following two tasks: 1) clustering subjects with the same causal mechanism; 2) learning causal structures from the samples of subjects. Specifically, for the former, we provide a Causality-related Chinese Restaurant Process to cluster samples based on the similarity of the causal structure; for the latter, we introduce a variational-inference-based approach to learn the causal structures. Theoretical results provide identification of the causal model and the clustering model under the linear non-Gaussian assumption. Experimental results on both simulated and real-world data further validate the correctness and effectiveness of the proposed method.
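
As background for the clustering component, here is a minimal sketch of the standard Chinese Restaurant Process prior. It is not the Causality-related variant proposed in the paper, which additionally uses causal-structure similarity. In the standard process, item i joins an existing cluster k with probability n_k / (i + alpha) and opens a new cluster with probability alpha / (i + alpha). The function name and parameters are illustrative.

```python
import numpy as np

def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster assignments from a standard Chinese Restaurant Process."""
    rng = np.random.default_rng(seed)
    counts = []        # counts[k] = number of items currently in cluster k
    assignments = []
    for i in range(n_items):
        # Existing cluster k: counts[k] / (i + alpha); new cluster: alpha / (i + alpha)
        probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
        k = int(rng.choice(len(probs), p=probs))
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        assignments.append(k)
    return assignments, counts

# Hypothetical usage: 20 items with concentration parameter alpha = 1.0
labels, sizes = crp_assignments(20, alpha=1.0)
```

Larger values of alpha make new clusters more likely, so the number of clusters is not fixed in advance, which is the property that makes this family of priors attractive when the number of environments is unknown.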
