Jiahao Zhao

R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning

May 22, 2025

Huatong Song, Jinhao Jiang, Wenqing Tian, Zhipeng Chen, Yuhuan Wu, Jiahao Zhao, Yingqian Min, Wayne Xin Zhao, Lei Fang, Ji-Rong Wen

Abstract: Large Language Models (LLMs) are powerful but prone to hallucinations due to their static knowledge. Retrieval-Augmented Generation (RAG) helps by injecting external information, but current methods are often costly, generalize poorly, or ignore the model's internal knowledge. In this paper, we introduce R1-Searcher++, a novel framework designed to train LLMs to adaptively leverage both internal and external knowledge sources. R1-Searcher++ employs a two-stage training strategy: an initial SFT cold-start phase for preliminary format learning, followed by RL for dynamic knowledge acquisition. The RL stage uses outcome supervision to encourage exploration, incorporates a reward mechanism for internal knowledge utilization, and integrates a memorization mechanism to continuously assimilate retrieved information, thereby enriching the model's internal knowledge. By leveraging both its internal knowledge and an external search engine, the model continuously improves its capabilities, enabling efficient retrieval-augmented reasoning. Our experiments demonstrate that R1-Searcher++ outperforms previous RAG and reasoning methods and achieves efficient retrieval. The code is available at https://github.com/RUCAIBox/R1-Searcher-plus.

Tight Gap-Dependent Memory-Regret Trade-Off for Single-Pass Streaming Stochastic Multi-Armed Bandits

Mar 04, 2025

Zichun Ye, Chihao Zhang, Jiahao Zhao

Abstract: We study the problem of minimizing gap-dependent regret for single-pass streaming stochastic multi-armed bandits (MAB). In this problem, the $n$ arms arrive in a stream, and at most $m<n$ arms and their statistics can be stored in memory. We establish tight non-asymptotic regret bounds with respect to all relevant parameters, including the number of arms $n$, the memory size $m$, the number of rounds $T$, and $(\Delta_i)_{i\in [n]}$, where $\Delta_i$ is the gap between the reward means of the best arm and the $i$-th arm. These gaps are not known to the player in advance. Specifically, for any constant $\alpha \ge 1$, we present two algorithms: one applicable for $m\ge \frac{2}{3}n$ with regret at most $O_\alpha\Big(\frac{(n-m)T^{\frac{1}{\alpha + 1}}}{n^{1 + \frac{1}{\alpha + 1}}}\displaystyle\sum_{i:\Delta_i > 0}\Delta_i^{1 - 2\alpha}\Big)$ and another applicable for $m<\frac{2}{3}n$ with regret at most $O_\alpha\Big(\frac{T^{\frac{1}{\alpha+1}}}{m^{\frac{1}{\alpha+1}}}\displaystyle\sum_{i:\Delta_i > 0}\Delta_i^{1 - 2\alpha}\Big)$. We also prove matching lower bounds for both cases by showing that for any constant $\alpha\ge 1$ and any $m\leq k < n$, there exists a set of hard instances on which the regret of any algorithm is $\Omega_\alpha\Big(\frac{(k-m+1) T^{\frac{1}{\alpha+1}}}{k^{1 + \frac{1}{\alpha+1}}} \sum_{i:\Delta_i > 0}\Delta_i^{1-2\alpha}\Big)$. This is the first tight gap-dependent regret bound for streaming MAB. Prior to our work, an $O\Big(\sum_{i\colon\Delta_i>0} \frac{\sqrt{T}\log T}{\Delta_i}\Big)$ upper bound for the special case of $\alpha=1$ and $m=O(1)$ was established by Agarwal, Khanna and Patil (COLT'22). In contrast, our results establish that the correct order of regret is $\Theta\Big(\frac{1}{\sqrt{m}}\sum_{i\colon\Delta_i>0}\frac{\sqrt{T}}{\Delta_i}\Big)$.
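
To make the two upper bounds concrete, the following is a minimal sketch that evaluates the expressions inside $O_\alpha(\cdot)$ numerically, dropping the unspecified constant that $O_\alpha$ hides. The function name `regret_bound_order` is hypothetical; it only computes the formulas stated in the abstract and does not implement either algorithm.

```python
from typing import Sequence


def regret_bound_order(n: int, m: int, T: int, gaps: Sequence[float],
                       alpha: float = 1.0) -> float:
    """Evaluate the upper-bound expression inside O_alpha(.) (constant omitted).

    gaps[i] is Delta_i, the mean-reward gap between the best arm and arm i
    (0 for the best arm); arms with Delta_i = 0 are excluded from the sum.
    """
    if alpha < 1 or not 0 < m < n:
        raise ValueError("requires alpha >= 1 and 0 < m < n")
    p = 1.0 / (alpha + 1)  # exponent 1/(alpha+1) applied to T
    # Sum of Delta_i^(1 - 2*alpha) over suboptimal arms.
    gap_sum = sum(d ** (1 - 2 * alpha) for d in gaps if d > 0)
    if m >= 2 * n / 3:
        # Large-memory regime: (n - m) T^p / n^(1 + p) * gap_sum
        return (n - m) * T ** p / n ** (1 + p) * gap_sum
    # Small-memory regime: T^p / m^p * gap_sum
    return T ** p / m ** p * gap_sum
```

With $\alpha = 1$ and $m < \frac{2}{3}n$, the second expression reduces to $\sqrt{T/m}\sum_{i:\Delta_i>0}\Delta_i^{-1}$, which matches the $\Theta$ order at the end of the abstract; for example, `regret_bound_order(10, 4, 10_000, [0.0, 0.5, 0.25])` evaluates to 300 up to floating-point rounding.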

Streaming Piano Transcription Based on Consistent Onset and Offset Decoding with Sustain Pedal Detection

Mar 03, 2025

Weixing Wei, Jiahao Zhao, Yulun Wu, Kazuyoshi Yoshii

Abstract: This paper describes a streaming audio-to-MIDI piano transcription approach that aims to sequentially translate a music signal into a sequence of note onset and offset events. The sequence-to-sequence nature of this task may call for the computationally intensive Transformer model for better performance; such a model has recently been used for offline transcription benchmarks and could be extended to streaming transcription with causal attention mechanisms. We hypothesize that the performance limitation of this naive approach lies in the decoder. Although the time-frequency features useful for onset detection differ considerably from those useful for offset detection, the single decoder is trained to output a mixed sequence of onset and offset events with no guarantee of correspondence between the onset and offset events of the same note. To overcome this limitation, we propose a streaming encoder-decoder model that uses a convolutional encoder aggregating local acoustic features, followed by an autoregressive Transformer decoder that detects a variable number of onset events and another decoder that detects the offset events for the active pitches while validating the sustain pedal at each time frame. Experiments using the MAESTRO dataset showed that the proposed streaming method performed comparably to, or even better than, state-of-the-art offline methods while significantly reducing the computational cost.

* Accepted to ISMIR 2024

Intelligence Test

Feb 26, 2025

Jingtao Zhan, Jiahao Zhao, Jiayu Li, Yiqun Liu, Bo Zhang, Qingyao Ai, Jiaxin Mao, Hongning Wang, Min Zhang, Shaoping Ma

Abstract: How does intelligence emerge? We propose that intelligence is not a sudden gift or a random occurrence, but rather a necessary trait for species to survive through natural selection. If a species passes the test of natural selection, it demonstrates the intelligence to survive in nature. Extending this perspective, we introduce Intelligence Test, a method to quantify the intelligence of any subject on any task. Just as species evolve by trial and error, Intelligence Test quantifies intelligence by the number of failed attempts before success: fewer failures correspond to higher intelligence. When both the expectation and the variance of the failure count are finite, this signals that the subject has achieved an autonomous level of intelligence. Using Intelligence Test, we comprehensively evaluate existing AI systems. Our results show that while AI systems achieve a level of autonomy in simple tasks, they are still far from autonomous in more complex tasks, such as vision, search, recommendation, and language. While scaling model size might help, it would come at an astronomical cost. Projections suggest that achieving general autonomy would require an unimaginable $10^{26}$ parameters. Even if Moore's Law continues to hold, reaching such a parameter scale would take $70$ years. This staggering cost highlights the complexity of human tasks and the inadequacies of current AI. To further understand this phenomenon, we conduct a theoretical analysis. Our simulations suggest that human tasks possess a criticality property. As a result, autonomy requires a deep understanding of a task's underlying mechanisms. Current AI, however, does not fully grasp these mechanisms and instead relies on superficial mimicry, making it difficult to reach an autonomous level. We believe Intelligence Test can not only guide the future development of AI but also offer profound insights into human intelligence itself.
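
The failure-count measure can be sketched as follows, assuming a subject is modelled as a callable that returns `True` on a successful attempt. The helper names and the `max_attempts` cap are hypothetical and not taken from the paper. Note that a finite sample can only suggest, never prove, that the expectation and variance of the failure count are finite.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple


def failures_before_success(attempt: Callable[[], bool], max_attempts: int = 10_000) -> int:
    """Count failed attempts before the first success.

    max_attempts is a hypothetical cap: a run that hits it is censored, so the
    returned count underestimates that run's true number of failures.
    """
    failures = 0
    while failures < max_attempts:
        if attempt():
            return failures
        failures += 1
    return failures


def failure_statistics(counts: Sequence[int]) -> Tuple[float, float]:
    """Sample mean and population variance of observed failure counts."""
    return statistics.mean(counts), statistics.pvariance(counts)


if __name__ == "__main__":
    rng = random.Random(0)
    p_success = 0.25  # hypothetical subject that succeeds with probability 0.25 per attempt
    counts = [failures_before_success(lambda: rng.random() < p_success) for _ in range(1_000)]
    mean, var = failure_statistics(counts)
    # For this geometric subject the true values are (1-p)/p = 3 and (1-p)/p**2 = 12.
    print(f"mean={mean:.2f} variance={var:.2f}")
```

A subject with a constant per-attempt success probability yields geometrically distributed failure counts, whose mean and variance are both finite; the paper's point is that complex tasks may not behave this way.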

Generating Model Parameters for Controlling: Parameter Diffusion for Controllable Multi-Task Recommendation

Oct 14, 2024

Chenglei Shen, Jiahao Zhao, Xiao Zhang, Weijie Yu, Ming He, Jianping Fan

Abstract: Commercial recommender systems face the challenge that task requirements from platforms or users often change dynamically (e.g., varying preferences for accuracy or diversity). Ideally, the model should be retrained with a new objective function to adapt to these changes in task requirements. However, in practice, the high computational cost of retraining makes this impractical for models already deployed in online environments. This raises a new and challenging problem: how to efficiently adapt the learned model to different task requirements by controlling its parameters after deployment, without retraining. To address this issue, we propose a novel controllable learning approach via Parameter Diffusion for controllable multi-task Recommendation (PaDiRec), which allows recommendation model parameters to be customized and adapted to new task requirements without retraining. Specifically, we first obtain optimized model parameters through adapter tuning under feasible task requirements. Then, we use a diffusion model as a parameter generator, employing classifier-free guidance in conditional training to learn the distribution of optimized model parameters under various task requirements. Finally, the diffusion model is applied to generate model parameters for given task requirements in a test-time adaptation manner. As a model-agnostic approach, PaDiRec can use existing recommendation models as backbones to enhance their controllability. Extensive experiments on public datasets and a dataset from a commercial app indicate that PaDiRec can effectively enhance controllability through efficient model parameter generation. The code is released at https://anonymous.4open.science/r/PaDiRec-DD13.
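
Classifier-free guidance, which the abstract names but does not detail, is commonly implemented in two parts: randomly dropping the condition during training, and mixing conditional and unconditional noise predictions at sampling time. The pure-Python sketch below shows only that generic mechanism. The denoiser interface, the use of `None` as the null condition, and the guidance weight `w` are assumptions for illustration, not PaDiRec's actual code.

```python
import random
from typing import Callable, List, Optional

Vector = List[float]
# Hypothetical denoiser signature:
# (noisy parameter vector x_t, timestep t, condition or None) -> predicted noise
Denoiser = Callable[[Vector, int, Optional[Vector]], Vector]


def drop_condition(cond: Vector, p_uncond: float, rng: random.Random) -> Optional[Vector]:
    """Training time: replace the task-requirement condition with the null
    condition (None) with probability p_uncond, so one network learns both modes."""
    return None if rng.random() < p_uncond else cond


def guided_noise(denoiser: Denoiser, x_t: Vector, t: int, cond: Vector, w: float) -> Vector:
    """Sampling time: eps_hat = eps(x_t, None) + w * (eps(x_t, cond) - eps(x_t, None))."""
    eps_uncond = denoiser(x_t, t, None)
    eps_cond = denoiser(x_t, t, cond)
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]
```

With `w = 1` this reduces to the purely conditional prediction and with `w = 0` to the unconditional one; `w > 1` pushes the generated parameters further toward the requested task requirements. Some references write the same rule as $(1+w)\,\epsilon_c - w\,\epsilon_u$, which is equivalent with the guidance weight shifted by one.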

Low-probability of Intercept/Detect (LPI/LPD) Secure Communications Using Antenna Arrays Employing Rapid Sidelobe Time Modulation

Jun 17, 2024

Jiahao Zhao, Shichen Qiao, John H. Booske, Nader Behdad

Abstract: We present an electronically reconfigurable antenna array that simultaneously offers low probability of intercept/detect (LPI/LPD) and secure communications capabilities at the physical layer. This antenna array is designed to provide rapidly time-varying sidelobes and a stationary main lobe. By performing rapid sidelobe time modulation (SLTM), the signal transmitted in undesired directions (i.e., through the sidelobes) undergoes spread-spectrum distortion, making it more difficult to detect, intercept, and decipher, while the signal transmitted in the desired direction (i.e., through the main lobe) is unaffected. Therefore, the intended receiver does not need additional modifications (e.g., encryption keys) to detect and recover the signal. We describe the operating principles of this SLTM array and validate its spread-spectrum SLTM sequence generation in undesired directions through theory, simulations, and experiments. Using a fabricated SLTM prototype operating at X band, we conducted system-level measurements to demonstrate its LPI/LPD, secure communications, and jamming resilience capabilities. The presented method is a physical-layer technique that can bring LPI/LPD capabilities to existing communications systems simply by replacing their antennas with SLTM arrays. This technique can be used independently or in combination with additional coding and signal-processing techniques to further enhance LPI/LPD and secure communications.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling

May 26, 2024

Chenhao Zhang, Renhao Li, Minghuan Tan, Min Yang, Jingwei Zhu, Di Yang, Jiahao Zhao, Guancheng Ye, Chengming Li, Xiping Hu(+1 more)

Abstract: Using large language models (LLMs) to assist psychological counseling is a significant but challenging task. Attempts have been made to improve empathetic conversations or to use LLMs as effective assistants in treatment. However, existing datasets lack consulting knowledge, so LLMs lack professional consulting competence. Moreover, how to automatically evaluate multi-turn dialogues within the counseling process remains understudied. To bridge this gap, we propose CPsyCoun, a report-based multi-turn dialogue reconstruction and evaluation framework for Chinese psychological counseling. To fully exploit psychological counseling reports, a two-phase approach is devised to construct high-quality dialogues, and a comprehensive evaluation benchmark is developed for the effective automatic evaluation of multi-turn psychological consultations. Competitive experimental results demonstrate the effectiveness of our proposed framework in psychological counseling. We open-source the datasets and model for future research at https://github.com/CAS-SIAT-XinHai/CPsyCoun

* Accepted to Findings of ACL 2024

CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations

May 16, 2024

Jiahao Zhao, Jingwei Zhu, Minghuan Tan, Min Yang, Di Yang, Chenhao Zhang, Guancheng Ye, Chengming Li, Xiping Hu

Abstract: In this paper, we introduce a novel psychological benchmark, CPsyExam, constructed from questions sourced from Chinese-language examinations. CPsyExam is designed to emphasize psychological knowledge and case analysis separately, recognizing the significance of applying psychological knowledge to real-world scenarios. From a pool of 22k questions, we use 4k to create a benchmark that offers balanced coverage of subjects and incorporates a diverse range of case analysis techniques. Furthermore, we evaluate a range of existing large language models (LLMs), from open-source to API-based models. Our experiments and analysis demonstrate that CPsyExam serves as an effective benchmark for enhancing the understanding of psychology within LLMs and enables the comparison of LLMs across various granularities.

Disentangled Text Representation Learning with Information-Theoretic Perspective for Adversarial Robustness

Oct 26, 2022

Jiahao Zhao, Wenji Mao

Abstract: Adversarial vulnerability remains a major obstacle to constructing reliable NLP systems. When imperceptible perturbations are added to raw input text, the performance of a deep learning model may drop dramatically under attack. Recent work argues that the adversarial vulnerability of a model is caused by the non-robust features learned in supervised training. Thus, in this paper, we tackle the adversarial robustness challenge from the perspective of disentangled representation learning, which can explicitly disentangle robust and non-robust features in text. Specifically, inspired by the variation of information (VI) in information theory, we derive a disentangled learning objective composed of mutual information terms that represent both the semantic representativeness of latent embeddings and the differentiation of robust and non-robust features. On this basis, we design a disentangled learning network to estimate these mutual information terms. Experiments on text classification and entailment tasks show that our method significantly outperforms representative methods under adversarial attacks, indicating that discarding non-robust features is critical for improving adversarial robustness.
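
For reference, the variation of information (VI) that inspires this objective has the following standard equivalent forms, where $H$ denotes entropy and $I$ mutual information. The identities show how VI decomposes into entropy and mutual-information terms; the paper's exact objective is not reproduced here.

```latex
\begin{aligned}
\mathrm{VI}(X;Y) &= H(X) + H(Y) - 2\,I(X;Y) \\
                 &= H(X \mid Y) + H(Y \mid X) \\
                 &= H(X, Y) - I(X;Y)
\end{aligned}
```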

Playing Technique Detection by Fusing Note Onset Information in Guzheng Performance

Sep 19, 2022

Dichucheng Li, Yulun Wu, Qinyu Li, Jiahao Zhao, Yi Yu, Fan Xia, Wei Li

Abstract: The Guzheng is a traditional Chinese instrument with diverse playing techniques. Instrument playing techniques (IPT) play an important role in musical performance. However, most existing work on IPT detection is inefficient for variable-length audio and offers no assurance of generalization, because it relies on a single sound bank for training and testing. In this study, we propose an end-to-end Guzheng playing technique detection system using Fully Convolutional Networks that can be applied to variable-length audio. Because each Guzheng playing technique is applied to a note, a dedicated onset detector is trained to divide the audio into notes, and its predictions are fused with frame-wise IPT predictions. During fusion, we sum the frame-wise IPT predictions within each note and take the IPT with the highest probability within that note as its final output. We create a new dataset named GZ_IsoTech from multiple sound banks and real-world recordings for Guzheng performance analysis. Our approach achieves 87.97% frame-level accuracy and 80.76% note-level F1-score, outperforming existing works by a large margin, which indicates the effectiveness of our proposed method in IPT detection.

* Accepted to ISMIR 2022
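
The note-level fusion step in this abstract (summing frame-wise IPT probabilities inside each note and taking the most probable technique) can be sketched as below. It assumes, as a simplification not stated in the abstract, that each note spans from its detected onset to the next onset or to the end of the audio. The function name and the index-based technique labels are hypothetical.

```python
from typing import List, Sequence


def fuse_note_techniques(frame_probs: Sequence[Sequence[float]],
                         onsets: Sequence[int]) -> List[int]:
    """Assign one playing-technique index to each note.

    frame_probs[f][k]: frame-wise probability of technique k at frame f.
    onsets: frame indices of detected note onsets. Each note is assumed to last
    until the next onset (or the final frame); frames before the first onset are ignored.
    """
    n_frames = len(frame_probs)
    # Deduplicate, sort, and drop out-of-range onsets.
    starts = sorted({o for o in onsets if 0 <= o < n_frames})
    labels = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else n_frames
        # Sum each technique's probability over the frames of this note.
        totals = [sum(column) for column in zip(*frame_probs[start:end])]
        # The technique with the largest accumulated probability labels the note.
        labels.append(max(range(len(totals)), key=totals.__getitem__))
    return labels
```

Summing rather than taking a per-frame majority vote lets confident frames outweigh uncertain ones within the same note.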
