Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jiale Han

DialogueAgents: A Hybrid Agent-Based Speech Synthesis Framework for Multi-Party Dialogue

Apr 20, 2025

Xiang Li, Duyi Pan, Hongru Xiao, Jiale Han, Jing Tang, Jiabao Ma, Wei Wang, Bo Cheng

Abstract:Speech synthesis is crucial for human-computer interaction, enabling natural and intuitive communication. However, existing datasets involve high construction costs due to manual annotation and suffer from limited character diversity, contextual scenarios, and emotional expressiveness. To address these issues, we propose DialogueAgents, a novel hybrid agent-based speech synthesis framework, which integrates three specialized agents -- a script writer, a speech synthesizer, and a dialogue critic -- to collaboratively generate dialogues. Grounded in a diverse character pool, the framework iteratively refines dialogue scripts and synthesizes speech based on speech review, boosting emotional expressiveness and paralinguistic features of the synthesized dialogues. Using DialogueAgent, we contribute MultiTalk, a bilingual, multi-party, multi-turn speech dialogue dataset covering diverse topics. Extensive experiments demonstrate the effectiveness of our framework and the high quality of the MultiTalk dataset. We release the dataset and code https://github.com/uirlx/DialogueAgents to facilitate future research on advanced speech synthesis models and customized data generation.

* Accepted by ICME 2025. Dataset and code are publicly available: [https://github.com/uirlx/DialogueAgents](https://github.com/uirlx/DialogueAgents)

Via

Access Paper or Ask Questions

CEFW: A Comprehensive Evaluation Framework for Watermark in Large Language Models

Mar 24, 2025

Shuhao Zhang, Bo Cheng, Jiale Han, Yuli Chen, Zhixuan Wu, Changbao Li, Pingli Gu

Abstract:Text watermarking provides an effective solution for identifying synthetic text generated by large language models. However, existing techniques often focus on satisfying specific criteria while ignoring other key aspects, lacking a unified evaluation. To fill this gap, we propose the Comprehensive Evaluation Framework for Watermark (CEFW), a unified framework that comprehensively evaluates watermarking methods across five key dimensions: ease of detection, fidelity of text quality, minimal embedding cost, robustness to adversarial attacks, and imperceptibility to prevent imitation or forgery. By assessing watermarks according to all these key criteria, CEFW offers a thorough evaluation of their practicality and effectiveness. Moreover, we introduce a simple and effective watermarking method called Balanced Watermark (BW), which guarantees robustness and imperceptibility through balancing the way watermark information is added. Extensive experiments show that BW outperforms existing methods in overall performance across all evaluation dimensions. We release our code to the community for future research. https://github.com/DrankXs/BalancedWatermark.

Via

Access Paper or Ask Questions

An Immediate Update Strategy of Multi-State Constraint Kalman Filter

Nov 04, 2024

Qingchao Zhang, Wei Ouyang, Jiale Han, Qi Cai, Maoran Zhu, Yuanxin Wu

Figure 1 for An Immediate Update Strategy of Multi-State Constraint Kalman Filter

Figure 2 for An Immediate Update Strategy of Multi-State Constraint Kalman Filter

Figure 3 for An Immediate Update Strategy of Multi-State Constraint Kalman Filter

Figure 4 for An Immediate Update Strategy of Multi-State Constraint Kalman Filter

Abstract:The lightweight Multi-state Constraint Kalman Filter (MSCKF) has been well-known for its high efficiency, in which the delayed update has been usually adopted since its proposal. This work investigates the immediate update strategy of MSCKF based on timely reconstructed 3D feature points and measurement constraints. The differences between the delayed update and the immediate update are theoretically analyzed in detail. It is found that the immediate update helps construct more observation constraints and employ more filtering updates than the delayed update, which improves the linearization point of the measurement model and therefore enhances the estimation accuracy. Numerical simulations and experiments show that the immediate update strategy significantly enhances MSCKF even with a small amount of feature observations.

* 8 pages, 5 figures

Via

Access Paper or Ask Questions

Conformal Online Auction Design

May 11, 2024

Jiale Han, Xiaowu Dai

Abstract:This paper proposes the conformal online auction design (COAD), a novel mechanism for maximizing revenue in online auctions by quantifying the uncertainty in bidders' values without relying on assumptions about value distributions. COAD incorporates both the bidder and item features and leverages historical data to provide an incentive-compatible mechanism for online auctions. Unlike traditional methods for online auctions, COAD employs a distribution-free, prediction interval-based approach using conformal prediction techniques. This novel approach ensures that the expected revenue from our mechanism can achieve at least a constant fraction of the revenue generated by the optimal mechanism. Additionally, COAD admits the use of a broad array of modern machine-learning methods, including random forests, kernel methods, and deep neural nets, for predicting bidders' values. It ensures revenue performance under any finite sample of historical data. Moreover, COAD introduces bidder-specific reserve prices based on the lower confidence bounds of bidders' valuations, which is different from the uniform reserve prices commonly used in the literature. We validate our theoretical predictions through extensive simulations and a real-data application. All code for using COAD and reproducing results is made available on GitHub.

Via

Access Paper or Ask Questions

CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models

Mar 31, 2024

Xiang Li, Fan Bu, Ambuj Mehrish, Yingting Li, Jiale Han, Bo Cheng, Soujanya Poria

Abstract:Neural Text-to-Speech (TTS) systems find broad applications in voice assistants, e-learning, and audiobook creation. The pursuit of modern models, like Diffusion Models (DMs), holds promise for achieving high-fidelity, real-time speech synthesis. Yet, the efficiency of multi-step sampling in Diffusion Models presents challenges. Efforts have been made to integrate GANs with DMs, speeding up inference by approximating denoising distributions, but this introduces issues with model convergence due to adversarial training. To overcome this, we introduce CM-TTS, a novel architecture grounded in consistency models (CMs). Drawing inspiration from continuous-time diffusion models, CM-TTS achieves top-quality speech synthesis in fewer steps without adversarial training or pre-trained model dependencies. We further design weighted samplers to incorporate different sampling positions into model training with dynamic probabilities, ensuring unbiased learning throughout the entire training process. We present a real-time mel-spectrogram generation consistency model, validated through comprehensive evaluations. Experimental results underscore CM-TTS's superiority over existing single-step speech synthesis systems, representing a significant advancement in the field.

* Accepted by Findings of NAACL 2024. Code is available at https://github.com/XiangLi2022/CM-TTS

Via

Access Paper or Ask Questions

Making Pre-trained Language Models Better Continual Few-Shot Relation Extractors

Feb 24, 2024

Shengkun Ma, Jiale Han, Yi Liang, Bo Cheng

Figure 1 for Making Pre-trained Language Models Better Continual Few-Shot Relation Extractors

Figure 2 for Making Pre-trained Language Models Better Continual Few-Shot Relation Extractors

Figure 3 for Making Pre-trained Language Models Better Continual Few-Shot Relation Extractors

Figure 4 for Making Pre-trained Language Models Better Continual Few-Shot Relation Extractors

Abstract:Continual Few-shot Relation Extraction (CFRE) is a practical problem that requires the model to continuously learn novel relations while avoiding forgetting old ones with few labeled training data. The primary challenges are catastrophic forgetting and overfitting. This paper harnesses prompt learning to explore the implicit capabilities of pre-trained language models to address the above two challenges, thereby making language models better continual few-shot relation extractors. Specifically, we propose a Contrastive Prompt Learning framework, which designs prompt representation to acquire more generalized knowledge that can be easily adapted to old and new categories, and margin-based contrastive learning to focus more on hard samples, therefore alleviating catastrophic forgetting and overfitting issues. To further remedy overfitting in low-resource scenarios, we introduce an effective memory augmentation strategy that employs well-crafted prompts to guide ChatGPT in generating diverse samples. Extensive experiments demonstrate that our method outperforms state-of-the-art methods by a large margin and significantly mitigates catastrophic forgetting and overfitting in low-resource scenarios.

* Accepted as COLING2024

Via

Access Paper or Ask Questions

Robust multi-item auction design using statistical learning: Overcoming uncertainty in bidders' types distributions

Feb 10, 2023

Jiale Han, Xiaowu Dai

Abstract:This paper presents a novel mechanism design for multi-item auction settings with uncertain bidders' type distributions. Our proposed approach utilizes nonparametric density estimation to accurately estimate bidders' types from historical bids, and is built upon the Vickrey-Clarke-Groves (VCG) mechanism, ensuring satisfaction of Bayesian incentive compatibility (BIC) and $\delta$-individual rationality (IR). To further enhance the efficiency of our mechanism, we introduce two novel strategies for query reduction: a filtering method that screens potential winners' value regions within the confidence intervals generated by our estimated distribution, and a classification strategy that designates the lower bound of an interval as the estimated type when the length is below a threshold value. Simulation experiments conducted on both small-scale and large-scale data demonstrate that our mechanism consistently outperforms existing methods in terms of revenue maximization and query reduction, particularly in large-scale scenarios. This makes our proposed mechanism a highly desirable and effective option for sellers in the realm of multi-item auctions.

Via

Access Paper or Ask Questions

Exploring Task Difficulty for Few-Shot Relation Extraction

Sep 28, 2021

Jiale Han, Bo Cheng, Wei Lu

Figure 1 for Exploring Task Difficulty for Few-Shot Relation Extraction

Figure 2 for Exploring Task Difficulty for Few-Shot Relation Extraction

Figure 3 for Exploring Task Difficulty for Few-Shot Relation Extraction

Figure 4 for Exploring Task Difficulty for Few-Shot Relation Extraction

Abstract:Few-shot relation extraction (FSRE) focuses on recognizing novel relations by learning with merely a handful of annotated instances. Meta-learning has been widely adopted for such a task, which trains on randomly generated few-shot tasks to learn generic data representations. Despite impressive results achieved, existing models still perform suboptimally when handling hard FSRE tasks, where the relations are fine-grained and similar to each other. We argue this is largely because existing models do not distinguish hard tasks from easy ones in the learning process. In this paper, we introduce a novel approach based on contrastive learning that learns better representations by exploiting relation label information. We further design a method that allows the model to adaptively learn how to focus on hard tasks. Experiments on two standard datasets demonstrate the effectiveness of our method.

Via

Access Paper or Ask Questions

Integrating Subgraph-aware Relation and DirectionReasoning for Question Answering

Apr 01, 2021

Xu Wang, Shuai Zhao, Bo Cheng, Jiale Han, Yingting Li, Hao Yang, Ivan Sekulic, Guoshun Nan

Figure 1 for Integrating Subgraph-aware Relation and DirectionReasoning for Question Answering

Figure 2 for Integrating Subgraph-aware Relation and DirectionReasoning for Question Answering

Figure 3 for Integrating Subgraph-aware Relation and DirectionReasoning for Question Answering

Figure 4 for Integrating Subgraph-aware Relation and DirectionReasoning for Question Answering

Abstract:Question Answering (QA) models over Knowledge Bases (KBs) are capable of providing more precise answers by utilizing relation information among entities. Although effective, most of these models solely rely on fixed relation representations to obtain answers for different question-related KB subgraphs. Hence, the rich structured information of these subgraphs may be overlooked by the relation representation vectors. Meanwhile, the direction information of reasoning, which has been proven effective for the answer prediction on graphs, has not been fully explored in existing work. To address these challenges, we propose a novel neural model, Relation-updated Direction-guided Answer Selector (RDAS), which converts relations in each subgraph to additional nodes to learn structure information. Additionally, we utilize direction information to enhance the reasoning ability. Experimental results show that our model yields substantial improvements on two widely used datasets.

* Accepted by ICASSP 2021

Via

Access Paper or Ask Questions