Yuxuan He

TTMBA: Towards Text To Multiple Sources Binaural Audio Generation

Jul 22, 2025

Yuxuan He, Xiaoran Yang, Ningning Pan, Gongping Huang

Abstract:Most existing text-to-audio (TTA) generation methods produce mono outputs, neglecting essential spatial information for immersive auditory experiences. To address this issue, we propose a cascaded method for text-to-multisource binaural audio generation (TTMBA) with both temporal and spatial control. First, a pretrained large language model (LLM) segments the text into a structured format with time and spatial details for each sound event. Next, a pretrained mono audio generation network creates multiple mono audios with varying durations for each event. These mono audios are transformed into binaural audios using a binaural rendering neural network based on spatial data from the LLM. Finally, the binaural audios are arranged by their start times, resulting in multisource binaural audio. Experimental results demonstrate the superiority of the proposed method in terms of both audio generation quality and spatial perceptual accuracy.

* 5 pages, 3 figures, 2 tables
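
The final stage described in the abstract, arranging binaural clips by their start times, can be illustrated with a minimal sketch. This is not the authors' implementation: the `BinauralEvent` class, its field names, and the choice to sum overlapping samples are assumptions made for illustration, and the clips stand in for the output of the binaural rendering network.

```python
from dataclasses import dataclass
from typing import List, Tuple

# One (left, right) amplitude pair per sample.
Stereo = List[Tuple[float, float]]


@dataclass
class BinauralEvent:
    """A rendered binaural clip plus its start time (hypothetical structure)."""
    start_s: float   # start time in seconds, e.g. as provided by the LLM stage
    samples: Stereo  # binaural samples for this sound event


def arrange_events(events: List[BinauralEvent], sample_rate: int) -> Stereo:
    """Overlay each clip onto a shared timeline at its start offset.

    Overlapping clips are summed sample by sample (an assumed mixing rule).
    """
    offsets = [round(e.start_s * sample_rate) for e in events]
    total = max((off + len(e.samples) for off, e in zip(offsets, events)), default=0)
    left = [0.0] * total
    right = [0.0] * total
    for off, event in zip(offsets, events):
        for i, (l, r) in enumerate(event.samples):
            left[off + i] += l
            right[off + i] += r
    return list(zip(left, right))


if __name__ == "__main__":
    clips = [
        BinauralEvent(start_s=0.0, samples=[(1.0, 0.5), (1.0, 0.5)]),
        BinauralEvent(start_s=0.5, samples=[(0.25, 0.25), (0.25, 0.25)]),
    ]
    # A sample rate of 2 Hz keeps the toy example readable.
    print(arrange_events(clips, sample_rate=2))
```

A real system would likely also normalize the mix to avoid clipping; that step is omitted here.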

Towards the Three-Phase Dynamics of Generalization Power of a DNN

May 11, 2025

Yuxuan He, Junpeng Zhang, Hongyuan Zhang, Quanshi Zhang

Abstract:This paper proposes a new perspective for analyzing the generalization power of deep neural networks (DNNs), i.e., directly disentangling and analyzing the dynamics of generalizable and non-generalizable interactions encoded by a DNN through the training process. Specifically, this work builds upon a recent theoretical achievement in explainable AI, which proves that the detailed inference logic of DNNs can be strictly rewritten as a small number of AND-OR interaction patterns. Based on this, we propose an efficient method to quantify the generalization power of each interaction, and we discover distinct three-phase dynamics in the generalization power of interactions during training. In particular, the early phase of training typically removes noisy and non-generalizable interactions and learns simple and generalizable ones. The second and third phases tend to capture increasingly complex interactions that are harder to generalize. Experimental results verify that the learning of non-generalizable interactions is the direct cause of the gap between the training and testing losses.

An Empirical Study of Benchmarking Chinese Aspect Sentiment Quad Prediction

Nov 03, 2023

Junxian Zhou, Haiqin Yang, Ye Junpeng, Yuxuan He, Hao Mou

Abstract:Aspect sentiment quad prediction (ASQP) is a critical subtask of aspect-level sentiment analysis. Current ASQP datasets are characterized by their small size and low quadruple density, which hinders technical development. To expand capacity, we construct two large Chinese ASQP datasets crawled from multiple online platforms. The datasets hold several significant characteristics: larger size (each with 10,000+ samples) and rich aspect categories, more words per sentence, and higher density than existing ASQP datasets. Moreover, we are the first to evaluate the performance of Generative Pre-trained Transformer (GPT) series models on ASQP and exhibit potential issues. The experiments with state-of-the-art ASQP baselines underscore the need to explore additional techniques to address ASQP, as well as the importance of further investigation into methods to improve the performance of GPTs.

* 12 pages, 4 tables, 4 figures
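
To make the notion of quadruple density concrete, here is a minimal sketch. It assumes density means the average number of sentiment quadruples per sentence, which the abstract does not define; the `Quad` class and its field names are illustrative and are not taken from the released datasets.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Quad:
    """One sentiment quadruple (illustrative field names)."""
    category: str   # aspect category, e.g. "food#quality"
    aspect: str     # aspect term
    opinion: str    # opinion term
    sentiment: str  # polarity label


def quad_density(dataset: List[List[Quad]]) -> float:
    """Average number of quadruples per sentence (assumed definition)."""
    if not dataset:
        return 0.0
    return sum(len(quads) for quads in dataset) / len(dataset)


if __name__ == "__main__":
    sentence_a = [
        Quad("food#quality", "noodles", "tasty", "positive"),
        Quad("service#general", "waiter", "slow", "negative"),
        Quad("price#general", "bill", "fair", "positive"),
    ]
    sentence_b = [Quad("ambience#general", "room", "noisy", "negative")]
    print(quad_density([sentence_a, sentence_b]))
```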

A Unified One-Step Solution for Aspect Sentiment Quad Prediction

Jun 07, 2023

Junxian Zhou, Haiqin Yang, Yuxuan He, Hao Mou, Junbo Yang

Abstract:Aspect sentiment quad prediction (ASQP) is a challenging yet significant subtask in aspect-based sentiment analysis as it provides a complete aspect-level sentiment structure. However, existing ASQP datasets are usually small and low-density, hindering technical advancement. To expand the capacity, in this paper, we release two new datasets for ASQP, which have the following characteristics: larger size, more words per sample, and higher density. With these datasets, we unveil the shortcomings of existing strong ASQP baselines and therefore propose a unified one-step solution for ASQP, namely One-ASQP, to detect the aspect categories and to identify the aspect-opinion-sentiment (AOS) triplets simultaneously. Our One-ASQP holds several unique advantages: (1) by separating ASQP into two subtasks and solving them independently and simultaneously, we can avoid error propagation in pipeline-based methods and overcome slow training and inference in generation-based methods; (2) by introducing a sentiment-specific horns tagging schema in a token-pair-based two-dimensional matrix, we can exploit deeper interactions between sentiment elements and efficiently decode the AOS triplets; (3) we design a "[NULL]" token that helps us effectively identify implicit aspects or opinions. Experiments on two benchmark datasets and our two released datasets demonstrate the advantages of our One-ASQP. The two new datasets are publicly released at https://www.github.com/Datastory-CN/ASQP-Datasets.

* 15 pages, 12 tables, 3 figures, ACL Findings

Model-Agnostic Meta-Learning for Natural Language Understanding Tasks in Finance

Mar 06, 2023

Bixing Yan, Shaoling Chen, Yuxuan He, Zhihan Li

Abstract:Natural language understanding (NLU) is challenging for finance due to the lack of annotated data and the specialized language in that domain. As a result, researchers have proposed using pre-trained language models and multi-task learning to learn robust representations. However, aggressive fine-tuning often causes over-fitting, and multi-task learning may favor tasks with significantly larger amounts of data. To address these problems, in this paper, we investigate the model-agnostic meta-learning (MAML) algorithm in low-resource financial NLU tasks. Our contributions include: 1. we explore the performance of the MAML method with multiple types of tasks: GLUE datasets, SNLI, Sci-Tail and Financial PhraseBank; 2. we study the performance of the MAML method with multiple single-type tasks: a real-scenario stock price prediction problem with Twitter text data. Our models achieve state-of-the-art performance according to the experimental results, which demonstrate that our method can adapt fast and well to low-resource situations.

* 13 pages, 6 figures, 8 tables
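
For readers unfamiliar with MAML, the inner-loop and outer-loop structure can be shown on a toy problem. The sketch below is not the paper's setup: it assumes one-parameter tasks with loss (w - target)^2, uses the same data for the inner and outer loss, and computes gradients by hand. The function name and hyperparameter values are illustrative.

```python
from typing import List


def maml_train(targets: List[float], inner_lr: float, meta_lr: float,
               steps: int, w0: float = 0.0) -> float:
    """Exact (second-order) MAML on toy tasks with loss L_t(w) = (w - t)^2.

    Inner loop: one gradient step per task gives the adapted weight w'.
    Outer loop: update w using the gradient of L_t(w') with respect to w.
    """
    w = w0
    for _ in range(steps):
        meta_grad = 0.0
        for t in targets:
            # Inner adaptation step: w' = w - inner_lr * dL_t/dw.
            w_adapted = w - inner_lr * 2.0 * (w - t)
            # Chain rule: dL_t(w')/dw = 2 (w' - t) * dw'/dw,
            # where dw'/dw = 1 - 2 * inner_lr for this quadratic loss.
            meta_grad += 2.0 * (w_adapted - t) * (1.0 - 2.0 * inner_lr)
        w -= meta_lr * meta_grad / len(targets)
    return w


if __name__ == "__main__":
    # Two toy tasks with targets 1 and 3; the meta-learned initialization
    # should settle between them so that one inner step adapts well to either.
    print(maml_train([1.0, 3.0], inner_lr=0.1, meta_lr=0.1, steps=200))
```

In practice the gradients would come from an automatic differentiation library and the tasks would be the NLU datasets named in the abstract; the toy loss is used only so the chain rule through the inner step is visible.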

Auto-KWS 2021 Challenge: Task, Datasets, and Baselines

Mar 31, 2021

Jingsong Wang, Yuxuan He, Chunyu Zhao, Qijie Shao, Wei-Wei Tu, Tom Ko, Hung-yi Lee, Lei Xie

Abstract:The Auto-KWS 2021 challenge calls for automated machine learning (AutoML) solutions to automate the process of applying machine learning to a customized keyword spotting task. Compared with other keyword spotting tasks, the Auto-KWS challenge has the following three characteristics: 1) The challenge focuses on the problem of customized keyword spotting, where the target device can only be awakened by an enrolled speaker with his specified keyword. The speaker can use any language and accent to define his keyword. 2) All data in the challenge are recorded in realistic environments to simulate different user scenarios. 3) Auto-KWS is a "code competition", where participants need to submit AutoML solutions, then the platform automatically runs the enrollment and prediction steps with the submitted code. This challenge aims at promoting the development of a more personalized and flexible keyword spotting system. Two baseline systems are provided to all participants as references.

* 5 pages, 2 figures

FairPrep: Promoting Data to a First-Class Citizen in Studies on Fairness-Enhancing Interventions

Nov 28, 2019

Sebastian Schelter, Yuxuan He, Jatin Khilnani, Julia Stoyanovich

Abstract:The importance of incorporating ethics and legal compliance into machine-assisted decision-making is broadly recognized. Further, several lines of recent work have argued that critical opportunities for improving data quality and representativeness, controlling for bias, and allowing humans to oversee and impact computational processes are missed if we do not consider the lifecycle stages upstream from model training and deployment. Yet, very little has been done to date to provide system-level support to data scientists who wish to develop and deploy responsible machine learning methods. We aim to fill this gap and present FairPrep, a design and evaluation framework for fairness-enhancing interventions. FairPrep is based on a developer-centered design, and helps data scientists follow best practices in software engineering and machine learning. As part of our contribution, we identify shortcomings in existing empirical studies for analyzing fairness-enhancing interventions. We then show how FairPrep can be used to measure the impact of sound best practices, such as hyperparameter tuning and feature scaling. In particular, our results suggest that the high variability of the outcomes of fairness-enhancing interventions observed in previous studies is often an artifact of a lack of hyperparameter tuning. Further, we show that the choice of a data cleaning method can impact the effectiveness of fairness-enhancing interventions.
