
Baoxun Wang

RAIDEN-R1: Improving Role-awareness of LLMs via GRPO with Verifiable Reward

May 15, 2025

Zongsheng Wang, Kaili Sun, Bowen Wu, Qun Yu, Ying Li, Baoxun Wang

Abstract: Role-playing conversational agents (RPCAs) face persistent challenges in maintaining role consistency. To address this, we propose RAIDEN-R1, a novel reinforcement learning framework that integrates a Verifiable Role-Awareness Reward (VRAR). The method introduces both singular and multi-term mining strategies to generate quantifiable rewards by assessing role-specific keys. Additionally, we construct a high-quality, role-aware Chain-of-Thought dataset through multi-LLM collaboration and conduct experiments to enhance reasoning coherence. Experiments on the RAIDEN benchmark demonstrate RAIDEN-R1's superiority: our 14B-GRPO model achieves 88.04% and 88.65% accuracy on the Script-Based Knowledge and Conversation Memory metrics, respectively, outperforming baseline models while maintaining robustness. Case analyses further reveal the model's enhanced ability to resolve conflicting contextual cues and sustain first-person narrative consistency. This work bridges the non-quantifiability gap in RPCA training, provides insights into role-aware reasoning patterns, and advances the development of RPCAs.
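
The abstract does not spell out how role-specific keys are scored, so the following is only a minimal sketch under assumptions: a hypothetical `singular_key_reward` returns 1.0 when a single key string appears in a response, a hypothetical `multi_term_reward` returns the fraction of several keys that appear, and `group_relative_advantages` applies the usual GRPO normalization of each reward against the mean and standard deviation of its sampled group. Plain substring matching stands in for whatever verification RAIDEN-R1 actually performs, and the example keys are invented.

```python
import statistics


def singular_key_reward(response: str, key: str) -> float:
    """Hypothetical singular-term check: 1.0 if the role-specific key appears."""
    return 1.0 if key.lower() in response.lower() else 0.0


def multi_term_reward(response: str, keys: list[str]) -> float:
    """Hypothetical multi-term check: fraction of role-specific keys that appear."""
    if not keys:
        return 0.0
    text = response.lower()
    return sum(key.lower() in text for key in keys) / len(keys)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: (r_i - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations differ on this choice
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers to one role-play prompt; the keys are invented for illustration.
    responses = [
        "I grew up in Kyoto and still train at my father's dojo.",
        "I was raised in Osaka.",
        "Kyoto is my hometown.",
        "I don't remember.",
    ]
    rewards = [multi_term_reward(r, ["Kyoto", "dojo"]) for r in responses]
    print(rewards)  # [1.0, 0.0, 0.5, 0.0]
    print([round(a, 2) for a in group_relative_advantages(rewards)])
```

Because advantages are centered within each group, answers that hit more keys than their siblings are reinforced and the others are pushed down, without a learned value model.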

Improving Generalization in Intent Detection: GRPO with Reward-Based Curriculum Sampling

Apr 21, 2025

Zihao Feng, Xiaoxue Wang, Ziwei Bai, Donghang Su, Bowen Wu, Qun Yu, Baoxun Wang

Abstract: Intent detection, a critical component of task-oriented dialogue (TOD) systems, faces significant challenges in adapting to the rapid influx of integrable tools with complex interrelationships. Existing approaches, such as zero-shot reformulations and LLM-based dynamic recognition, suffer performance degradation when encountering unseen intents, leading to erroneous task routing. To enhance the model's generalization to unseen tasks, we apply Reinforcement Learning (RL) combined with Reward-based Curriculum Sampling (RCS) during Group Relative Policy Optimization (GRPO) training for intent detection. Experiments demonstrate that RL-trained models substantially outperform supervised fine-tuning (SFT) baselines in generalization. In addition, introducing RCS significantly bolsters the effectiveness of RL in intent detection by focusing the model on challenging cases during training. Moreover, incorporating Chain-of-Thought (CoT) reasoning in RL notably improves generalization on complex intent detection tasks, underscoring the importance of explicit reasoning in challenging scenarios. This work advances the generalization of intent detection and offers practical insights for deploying adaptable dialogue systems.
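
The abstract describes RCS only as "focusing the model on challenging cases". One plausible reading, sketched here purely as an assumption, is to log the rewards each training example receives across its GRPO rollouts and resample the examples whose average reward stays low. The function name `select_hard_samples`, the 0.5 threshold, and the hardest-first ordering are hypothetical choices, not details from the paper.

```python
from statistics import fmean


def select_hard_samples(reward_log: dict[str, list[float]], threshold: float = 0.5) -> list[str]:
    """Return IDs of examples whose mean rollout reward is below `threshold`, hardest first."""
    hard = [sid for sid, rewards in reward_log.items() if rewards and fmean(rewards) < threshold]
    # Sorting by mean reward spends a limited batch budget on the most difficult cases first.
    return sorted(hard, key=lambda sid: fmean(reward_log[sid]))


if __name__ == "__main__":
    # 1.0 = predicted intent matched the gold intent on that rollout, 0.0 = it did not.
    log = {
        "q1": [1.0, 1.0, 1.0, 1.0],
        "q2": [0.0, 0.0, 1.0, 0.0],
        "q3": [0.0, 0.0, 0.0, 0.0],
        "q4": [1.0, 0.0, 1.0, 1.0],
    }
    print(select_hard_samples(log))  # ['q3', 'q2']
```

Note that a group in which every rollout receives the same reward, such as q1 (or q3), produces zero GRPO advantage, which is one reason per-example reward statistics are a natural signal for deciding what to resample.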

F5R-TTS: Improving Flow Matching based Text-to-Speech with Group Relative Policy Optimization

Apr 03, 2025

Xiaohui Sun, Ruitong Xiao, Jianye Mo, Bowen Wu, Qun Yu, Baoxun Wang

Abstract: We present F5R-TTS, a novel text-to-speech (TTS) system that integrates Group Relative Policy Optimization (GRPO) into a flow-matching based architecture. By reformulating the deterministic outputs of flow-matching TTS as probabilistic Gaussian distributions, our approach enables seamless integration of reinforcement learning algorithms. During pretraining, we train a probabilistically reformulated flow-matching model, derived from F5-TTS, on an open-source dataset. In the subsequent reinforcement learning (RL) phase, we employ a GRPO-driven enhancement stage that leverages dual reward metrics: word error rate (WER) computed via automatic speech recognition and speaker similarity (SIM) assessed by speaker verification models. Experimental results on zero-shot voice cloning demonstrate that F5R-TTS achieves significant improvements in both speech intelligibility (a 29.5% relative WER reduction) and speaker similarity (a 4.6% relative increase in SIM score) compared to conventional flow-matching based TTS systems. Audio samples are available at https://frontierlabs.github.io/F5R.
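
WER, one of the two rewards named above, is the word-level edit distance between a reference transcript and an ASR transcript of the generated speech, divided by the reference length. The sketch below computes it with standard dynamic programming and then folds it together with a speaker-similarity score in a hypothetical `dual_reward`. The equal weighting and the clipping of WER at 1.0 are assumptions, since the abstract does not say how the two metrics are combined; the ASR system and speaker-verification model that would produce the inputs are not shown.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j] (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,          # deletion of a reference word
                         cur[j - 1] + 1,       # insertion of a hypothesis word
                         prev[j - 1] + cost)   # match or substitution
        prev = cur
    return prev[-1] / max(len(ref), 1)


def dual_reward(wer: float, sim: float, w_wer: float = 0.5, w_sim: float = 0.5) -> float:
    """Hypothetical combination: lower WER and higher speaker similarity both raise the reward."""
    return w_wer * (1.0 - min(wer, 1.0)) + w_sim * sim


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(round(wer, 3))  # 0.333 (one substitution + one deletion over six words)
    print(round(dual_reward(wer, sim=0.8), 3))  # 0.733
```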

Interpersonal Memory Matters: A New Task for Proactive Dialogue Utilizing Conversational History

Mar 07, 2025

Bowen Wu, Wenqing Wang, Haoran Li, Ying Li, Jingsong Yu, Baoxun Wang

Abstract: Proactive dialogue systems aim to empower chatbots with the capability of leading conversations towards specific targets, thereby enhancing user engagement and service autonomy. Existing systems typically target predefined keywords or entities and neglect the user attributes and preferences implicit in dialogue history, which hinders the development of long-term user intimacy. To address these challenges, we take a radical step towards a more human-like conversational agent by integrating proactive dialogue systems with long-term memory in a unified framework. Specifically, we define a novel task named Memory-aware Proactive Dialogue (MapDia). By decomposing the task, we propose an automatic data construction method and create the first Chinese Memory-aware Proactive Dataset (ChMapData). Furthermore, we introduce a joint framework based on Retrieval-Augmented Generation (RAG) featuring three modules (Topic Summarization, Topic Retrieval, and Proactive Topic-shifting Detection and Generation) designed to steer dialogues towards relevant historical topics at the right time. The effectiveness of our dataset and models is validated through both automatic and human evaluations. We release the open-source framework and dataset at https://github.com/FrontierLabs/MapDia.
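
The Topic Retrieval module is described only at a high level. As an illustration, assuming past sessions have already been condensed into short topic summaries, retrieval can be sketched as scoring each stored summary against the current utterance and returning the best one only when its score clears a threshold, which crudely stands in for deciding whether a topic shift is timely. Bag-of-words cosine similarity is used here purely to stay self-contained; a RAG system would normally use a dense embedding model. The names `retrieve_topic` and `min_score` and the example memories are hypothetical.

```python
import math
from collections import Counter
from typing import Optional


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_topic(utterance: str, memory: list[str], min_score: float = 0.2) -> Optional[str]:
    """Return the stored topic summary most similar to `utterance`, or None if none is close enough."""
    query = Counter(utterance.lower().split())
    best_score, best_topic = 0.0, None
    for summary in memory:
        score = cosine(query, Counter(summary.lower().split()))
        if score > best_score:
            best_score, best_topic = score, summary
    return best_topic if best_score >= min_score else None


if __name__ == "__main__":
    memory = [
        "user plans a hiking trip to yunnan",
        "user adopted a cat named mochi",
        "user is learning to play guitar",
    ]
    print(retrieve_topic("my cat mochi keeps waking me up", memory))  # user adopted a cat named mochi
    print(retrieve_topic("how was your weekend", memory))  # None
```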

Spatial-Aware Efficient Projector for MLLMs via Multi-Layer Feature Aggregation

Oct 14, 2024

Shun Qian, Bingquan Liu, Chengjie Sun, Zhen Xu, Baoxun Wang

Abstract: The projector plays a crucial role in multimodal large language models (MLLMs). The number of visual tokens it outputs affects the efficiency of the MLLM, while the quality of those tokens influences the MLLM's visual understanding capabilities. Current work on projectors focuses on reducing the number of visual tokens to improve efficiency, often overlooking the inherent spatial discrepancy between serialized 2-dimensional visual token sequences and natural language token sequences. We propose a Spatial-Aware Efficient Projector (SAEP) to address this issue. Specifically, SAEP employs a modified separable depthwise convolution module on multi-layer visual features to enhance the spatial information of visual tokens. As a result, SAEP not only reduces the number of visual tokens by 75% but also significantly improves the multimodal spatial understanding capability of MLLMs. Moreover, compared to existing projectors, SAEP achieves the best performance on a wide range of multimodal evaluation benchmarks, demonstrating its effectiveness in bridging the modality gap.

* 10 pages, 3 figures
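
A 75% reduction is what spatial downsampling by 2 in each direction gives: a stride-2 convolution over a 2-D token grid halves both height and width, leaving a quarter of the tokens (for example, a 24 x 24 grid of 576 tokens becomes 12 x 12 = 144). The pure-Python sketch below illustrates that mechanism with a plain depthwise separable convolution (a per-channel 2 x 2 stride-2 depthwise filter followed by a 1 x 1 pointwise channel mix) applied to features concatenated from two encoder layers. The channel-wise concatenation, the kernel size and stride, and all function names are assumptions; SAEP's specific modifications to the convolution are not reproduced.

```python
Grid = list[list[list[float]]]  # H x W x C nested lists


def concat_layers(*layers: Grid) -> Grid:
    """Aggregate multi-layer features by channel-wise concatenation at each position (an assumption)."""
    height, width = len(layers[0]), len(layers[0][0])
    return [[[v for layer in layers for v in layer[i][j]] for j in range(width)] for i in range(height)]


def depthwise_stride2(grid: Grid, kernels: list[list[list[float]]]) -> Grid:
    """Depthwise 2x2 convolution with stride 2 and no padding: one kernel per channel."""
    height, width, channels = len(grid), len(grid[0]), len(grid[0][0])
    return [
        [
            [
                sum(kernels[c][di][dj] * grid[i + di][j + dj][c] for di in range(2) for dj in range(2))
                for c in range(channels)
            ]
            for j in range(0, width - 1, 2)
        ]
        for i in range(0, height - 1, 2)
    ]


def pointwise(grid: Grid, weights: list[list[float]]) -> Grid:
    """1x1 convolution: weights is C_out x C_in and mixes channels at every position."""
    return [[[sum(w * x for w, x in zip(row, pixel)) for row in weights] for pixel in line] for line in grid]


if __name__ == "__main__":
    layer_a = [[[1.0, 0.0] for _ in range(24)] for _ in range(24)]  # shallow-layer features
    layer_b = [[[0.0, 1.0] for _ in range(24)] for _ in range(24)]  # deep-layer features
    x = concat_layers(layer_a, layer_b)  # 24 x 24 x 4
    avg = [[[0.25, 0.25], [0.25, 0.25]] for _ in range(4)]  # averaging kernel per channel
    y = pointwise(depthwise_stride2(x, avg), [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]])
    print(len(x) * len(x[0]), "->", len(y) * len(y[0]))  # 576 -> 144
```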

ConKI: Contrastive Knowledge Injection for Multimodal Sentiment Analysis

Jun 27, 2023

Yakun Yu, Mingjun Zhao, Shi-ang Qi, Feiran Sun, Baoxun Wang, Weidong Guo, Xiaoli Wang, Lei Yang, Di Niu

Abstract: Multimodal Sentiment Analysis leverages multimodal signals to detect the sentiment of a speaker. Previous approaches concentrate on multimodal fusion and representation learning based on general knowledge obtained from pretrained models, neglecting the effect of domain-specific knowledge. In this paper, we propose Contrastive Knowledge Injection (ConKI) for multimodal sentiment analysis, in which specific-knowledge representations for each modality are learned together with general-knowledge representations via knowledge injection based on an adapter architecture. In addition, ConKI uses a hierarchical contrastive learning procedure performed between knowledge types within each modality, across modalities within each sample, and across samples, which facilitates effective learning of the proposed representations and hence improves multimodal sentiment prediction. Experiments on three popular multimodal sentiment analysis benchmarks show that ConKI outperforms all prior methods on a variety of performance metrics.

* Accepted by ACL Findings 2023
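
ConKI's hierarchical procedure applies contrastive learning at three levels, but the abstract does not give the exact loss or how positives and negatives are chosen at each level. As a hedged illustration, the sketch below implements a generic InfoNCE loss with cosine similarity and a temperature, a standard building block for such objectives. At each level of the hierarchy one would pick the anchor, positive, and negatives differently (for instance, two modalities of the same sample as a positive pair and other samples as negatives); that pairing scheme, the temperature value, and the example vectors are assumptions.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchor: list[float], positive: list[float], negatives: list[list[float]],
             temperature: float = 0.1) -> float:
    """-log( exp(s_pos / t) / sum of exp(s / t) over the positive and all negatives )."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


if __name__ == "__main__":
    text_vec, audio_vec = [0.9, 0.1, 0.0], [0.8, 0.2, 0.1]  # same sample, two modalities
    other_samples = [[0.0, 1.0, 0.0], [0.1, 0.0, 1.0]]
    print(round(info_nce(text_vec, audio_vec, other_samples), 4))
```

A well-aligned positive yields a loss near zero, while swapping an unrelated vector in as the "positive" raises the loss sharply.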

Towards Non-task-specific Distillation of BERT via Sentence Representation Approximation

Apr 07, 2020

Bowen Wu, Huan Zhang, Mengyuan Li, Zongsheng Wang, Qihang Feng, Junhong Huang, Baoxun Wang

Abstract: Recently, BERT has become an essential ingredient of various deep NLP models due to its effectiveness and universal usability. However, online deployment of BERT is often hindered by its large number of parameters and high computational cost. Many studies have shown that knowledge distillation is effective at transferring knowledge from BERT into a model with far fewer parameters. Nevertheless, current BERT distillation approaches mainly focus on task-specific distillation, and such methods lose the general semantic knowledge that gives BERT its universal usability. In this paper, we propose a distillation framework oriented toward sentence representation approximation that distills the pre-trained BERT into a simple LSTM-based model without specifying tasks. Like BERT, our distilled model can perform transfer learning via fine-tuning to adapt to any sentence-level downstream task. Moreover, our model can be further combined with task-specific distillation procedures. Experimental results on multiple NLP tasks from the GLUE benchmark show that our approach outperforms other task-specific distillation methods and even much larger models such as ELMo, with substantially improved efficiency.
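
The core idea of non-task-specific distillation is to train the student to reproduce the teacher's sentence representation rather than any task label. A minimal sketch, assuming a mean-pooled student sentence vector, a linear projection into the teacher's dimension, and a mean-squared-error objective (the paper's exact pooling, projection, and loss may differ), is shown below. The BERT teacher and LSTM encoder are replaced by fixed example vectors, and the hypothetical `sgd_step` updates only the projection.

```python
def mean_pool(token_vectors: list[list[float]]) -> list[float]:
    """Average the student's per-token hidden states into one sentence vector."""
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]


def project(weights: list[list[float]], x: list[float]) -> list[float]:
    """Linear map from the student dimension to the teacher dimension."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def approximation_loss(student: list[float], teacher: list[float]) -> float:
    """Mean squared error between projected student and teacher sentence vectors."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(teacher)


def sgd_step(weights: list[list[float]], x: list[float], teacher: list[float],
             lr: float = 0.1) -> list[list[float]]:
    """One gradient-descent step on the projection: dL/dW[k][j] = 2/D * (y_k - t_k) * x_j."""
    y = project(weights, x)
    scale = 2.0 / len(teacher)
    return [[w - lr * scale * (y[k] - teacher[k]) * x[j] for j, w in enumerate(row)]
            for k, row in enumerate(weights)]


if __name__ == "__main__":
    student_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 tokens, hidden size 2
    teacher_vec = [0.5, -0.25, 1.0]  # stand-in for a BERT sentence vector
    x = mean_pool(student_states)
    W = [[0.0, 0.0] for _ in teacher_vec]
    for step in range(3):
        print(step, round(approximation_loss(project(W, x), teacher_vec), 4))
        W = sgd_step(W, x, teacher_vec)
```

No task label appears anywhere in the objective, which is what lets the distilled student be fine-tuned later for arbitrary sentence-level tasks.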

Guiding Variational Response Generator to Exploit Persona

Nov 06, 2019

Bowen Wu, Mengyuan Li, Zongsheng Wang, Yifu Chen, Derek Wong, Qihang Feng, Junhong Huang, Baoxun Wang

Abstract: Leveraging users' persona information in Neural Response Generators (NRG) to conduct personalized conversations has been considered an attractive and important topic in conversational agent research over the past few years. Despite the promising progress achieved by recent studies in this field, persona information tends to be incorporated into neural networks in the form of user embeddings, with the expectation that the persona will be captured through end-to-end learning. This paper proposes to incorporate the personality-related characteristics of human conversations into variational response generators by designing a conditional variational autoencoder based deep model with two new regularization terms added to the loss function, so as to guide optimization towards generating responses that are both persona-aware and relevant. In addition, to reasonably evaluate the performance of various persona modeling approaches, this paper presents three direct persona-oriented metrics from different perspectives. Experimental results show that our proposed method notably improves the performance of persona-aware response generation, and that the metrics are reasonable for evaluating the results.
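
In a conditional variational autoencoder for response generation, the training loss is typically the reconstruction negative log-likelihood plus the KL divergence between a recognition (posterior) network's diagonal Gaussian over the latent variable and a prior network's diagonal Gaussian, with the latent sampled via the reparameterization trick. The two persona-oriented regularization terms proposed in the paper are not described in the abstract and are not reproduced here; this sketch only shows the standard CVAE pieces they would be added to. The log-variance parameterization, the function names, and the placeholder reconstruction loss of 2.3 are assumptions.

```python
import math
import random


def gaussian_kl(mu_q, log_var_q, mu_p, log_var_p) -> float:
    """KL( N(mu_q, diag(exp(log_var_q))) || N(mu_p, diag(exp(log_var_p))) ), summed over dims."""
    return 0.5 * sum(
        lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0
        for mq, lq, mp, lp in zip(mu_q, log_var_q, mu_p, log_var_p)
    )


def reparameterize(mu, log_var, rng: random.Random) -> list[float]:
    """z = mu + sigma * eps with eps ~ N(0, 1); in an autodiff framework gradients reach mu and sigma."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def cvae_loss(reconstruction_nll: float, mu_q, log_var_q, mu_p, log_var_p) -> float:
    """Negative ELBO: reconstruction term plus KL(posterior || prior)."""
    return reconstruction_nll + gaussian_kl(mu_q, log_var_q, mu_p, log_var_p)


if __name__ == "__main__":
    rng = random.Random(0)
    mu_q, lv_q = [0.5, -0.2], [-1.0, -0.5]  # recognition network output (posterior)
    mu_p, lv_p = [0.0, 0.0], [0.0, 0.0]     # prior network output
    z = reparameterize(mu_q, lv_q, rng)     # latent that would be fed to the response decoder
    print([round(v, 3) for v in z])
    print(round(cvae_loss(2.3, mu_q, lv_q, mu_p, lv_p), 4))
```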

MemeFaceGenerator: Adversarial Synthesis of Chinese Meme-face from Natural Sentences

Aug 14, 2019

Yifu Chen, Zongsheng Wang, Bowen Wu, Mengyuan Li, Huan Zhang, Lin Ma, Feng Liu, Qihang Feng, Baoxun Wang

Abstract: Chinese meme-faces are a special kind of internet subculture widely spread across Chinese social networks. A meme-face usually consists of a template image modified with some amusing details and a text caption. In this paper, we present MemeFaceGenerator, a Generative Adversarial Network that uses an attention module and template information as supplementary signals to automatically generate meme-faces from text inputs. We also develop a web service as a system demonstration of meme-face synthesis. MemeFaceGenerator is shown to be capable of generating high-quality meme-faces from random text inputs.

Learning to Generate Structured Queries from Natural Language with Indirect Supervision

Sep 10, 2018

Ziwei Bai, Bo Yu, Bowen Wu, Zhuoran Wang, Baoxun Wang

Abstract: Generating structured query language (SQL) from natural language is an emerging research topic. This paper presents a new learning paradigm based on indirect supervision from the answers to natural language questions, instead of from SQL queries. This paradigm eases the acquisition of training data, owing to the abundance of question-answer pairs for various domains on the Internet, and eliminates the difficult job of SQL annotation. An end-to-end neural model integrated with reinforcement learning is proposed to learn the SQL generation policy within this answer-driven learning paradigm. The model is evaluated on datasets from different domains, including movies and academic publications. Experimental results show that our model outperforms the baseline models.

* 11 pages, 4 figures
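
Under answer-driven supervision the model never sees gold SQL; a generated query is judged only by whether executing it returns the known answer. A minimal sketch of such a reward, using Python's built-in `sqlite3` module and a toy movie table invented for illustration, is below. Scoring by exact set match of the returned values and giving unexecutable queries zero reward are assumptions; the paper's actual reward design and its policy-gradient update are not shown.

```python
import sqlite3


def answer_reward(conn: sqlite3.Connection, predicted_sql: str, gold_answer: list) -> float:
    """1.0 if executing the predicted SQL yields exactly the gold answer set, else 0.0."""
    try:
        rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return 0.0  # malformed or invalid SQL earns no reward
    returned = {value for row in rows for value in row}
    return 1.0 if returned == set(gold_answer) else 0.0


def build_demo_db() -> sqlite3.Connection:
    """Toy in-memory table standing in for a real domain database (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE movie (title TEXT, director TEXT, year INTEGER)")
    conn.executemany(
        "INSERT INTO movie VALUES (?, ?, ?)",
        [("Alien", "Ridley Scott", 1979), ("Heat", "Michael Mann", 1995),
         ("Collateral", "Michael Mann", 2004)],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    gold = ["Heat", "Collateral"]  # answer to "Which movies did Michael Mann direct?"
    for sql in ("SELECT title FROM movie WHERE director = 'Michael Mann'",  # reward 1.0
                "SELECT title FROM movie WHERE year > 2000",                # reward 0.0
                "SELEC title FROM movie"):                                  # reward 0.0 (syntax error)
        print(answer_reward(conn, sql, gold), sql)
```

Because only the returned values are checked, a query can earn the reward with the right answer but the wrong logic (a "spurious" program), which is a known difficulty when learning from answers alone.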
