Abstract:Intent detection, a critical component in task-oriented dialogue (TOD) systems, faces significant challenges in adapting to the rapid influx of integrable tools with complex interrelationships. Existing approaches, such as zero-shot reformulations and LLM-based dynamic recognition, struggle with performance degradation when encountering unseen intents, leading to erroneous task routing. To enhance the model's generalization performance on unseen tasks, we employ Reinforcement Learning (RL) combined with a Reward-based Curriculum Sampling (RCS) during Group Relative Policy Optimization (GRPO) training in intent detection tasks. Experiments demonstrate that RL-trained models substantially outperform supervised fine-tuning (SFT) baselines in generalization. Besides, the introduction of the RCS, significantly bolsters the effectiveness of RL in intent detection by focusing the model on challenging cases during training. Moreover, incorporating Chain-of-Thought (COT) processes in RL notably improves generalization in complex intent detection tasks, underscoring the importance of thought in challenging scenarios. This work advances the generalization of intent detection tasks, offering practical insights for deploying adaptable dialogue systems.
Abstract:We present F5R-TTS, a novel text-to-speech (TTS) system that integrates Gradient Reward Policy Optimization (GRPO) into a flow-matching based architecture. By reformulating the deterministic outputs of flow-matching TTS into probabilistic Gaussian distributions, our approach enables seamless integration of reinforcement learning algorithms. During pretraining, we train a probabilistically reformulated flow-matching based model which is derived from F5-TTS with an open-source dataset. In the subsequent reinforcement learning (RL) phase, we employ a GRPO-driven enhancement stage that leverages dual reward metrics: word error rate (WER) computed via automatic speech recognition and speaker similarity (SIM) assessed by verification models. Experimental results on zero-shot voice cloning demonstrate that F5R-TTS achieves significant improvements in both speech intelligibility (relatively 29.5\% WER reduction) and speaker similarity (relatively 4.6\% SIM score increase) compared to conventional flow-matching based TTS systems. Audio samples are available at https://frontierlabs.github.io/F5R.
Abstract:Automatic counting soybean pods and seeds in outdoor fields allows for rapid yield estimation before harvesting, while indoor laboratory counting offers greater accuracy. Both methods can significantly accelerate the breeding process. However, it remains challenging for accurately counting pods and seeds in outdoor fields, and there are still no accurate enough tools for counting pods and seeds in laboratories. In this study, we developed efficient deep learning models for counting soybean pods and seeds in both outdoor fields and indoor laboratories. For outdoor fields, annotating not only visible seeds but also occluded seeds makes YOLO have the ability to estimate the number of soybean seeds that are occluded. Moreover, we enhanced YOLO architecture by integrating it with HQ-SAM (YOLO-SAM), and domain adaptation techniques (YOLO-DA), to improve model robustness and generalization across soybean images taken in outdoor fields. Testing on soybean images from the outdoor field, we achieved a mean absolute error (MAE) of 6.13 for pod counting and 10.05 for seed counting. For the indoor setting, we utilized Mask-RCNN supplemented with a Swin Transformer module (Mask-RCNN-Swin), models were trained exclusively on synthetic training images generated from a small set of labeled data. This approach resulted in near-perfect accuracy, with an MAE of 1.07 for pod counting and 1.33 for seed counting across actual laboratory images from two distinct studies.