Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Kai Song

MRO: Enhancing Reasoning in Diffusion Language Models via Multi-Reward Optimization

Oct 24, 2025

Chenglong Wang, Yang Gan, Hang Zhou, Chi Hu, Yongyu Mu, Kai Song, Murun Yang, Bei Li, Chunliang Zhang, Tongran Liu(+3 more)

Abstract:Recent advances in diffusion language models (DLMs) have presented a promising alternative to traditional autoregressive large language models (LLMs). However, DLMs still lag behind LLMs in reasoning performance, especially as the number of denoising steps decreases. Our analysis reveals that this shortcoming arises primarily from the independent generation of masked tokens across denoising steps, which fails to capture the token correlation. In this paper, we define two types of token correlation: intra-sequence correlation and inter-sequence correlation, and demonstrate that enhancing these correlations improves reasoning performance. To this end, we propose a Multi-Reward Optimization (MRO) approach, which encourages DLMs to consider the token correlation during the denoising process. More specifically, our MRO approach leverages test-time scaling, reject sampling, and reinforcement learning to directly optimize the token correlation with multiple elaborate rewards. Additionally, we introduce group step and importance sampling strategies to mitigate reward variance and enhance sampling efficiency. Through extensive experiments, we demonstrate that MRO not only improves reasoning performance but also achieves significant sampling speedups while maintaining high performance on reasoning benchmarks.

* Accepted by NeurIPS 2025

Via

Access Paper or Ask Questions

ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning

May 21, 2025

Changtai Zhu, Siyin Wang, Ruijun Feng, Kai Song, Xipeng Qiu

Abstract:Conversational search systems require effective handling of context-dependent queries that often contain ambiguity, omission, and coreference. Conversational Query Reformulation (CQR) addresses this challenge by transforming these queries into self-contained forms suitable for off-the-shelf retrievers. However, existing CQR approaches suffer from two critical constraints: high dependency on costly external supervision from human annotations or large language models, and insufficient alignment between the rewriting model and downstream retrievers. We present ConvSearch-R1, the first self-driven framework that completely eliminates dependency on external rewrite supervision by leveraging reinforcement learning to optimize reformulation directly through retrieval signals. Our novel two-stage approach combines Self-Driven Policy Warm-Up to address the cold-start problem through retrieval-guided self-distillation, followed by Retrieval-Guided Reinforcement Learning with a specially designed rank-incentive reward shaping mechanism that addresses the sparsity issue in conventional retrieval metrics. Extensive experiments on TopiOCQA and QReCC datasets demonstrate that ConvSearch-R1 significantly outperforms previous state-of-the-art methods, achieving over 10% improvement on the challenging TopiOCQA dataset while using smaller 3B parameter models without any external supervision.

Via

Access Paper or Ask Questions

Semformer: Transformer Language Models with Semantic Planning

Sep 17, 2024

Yongjing Yin, Junran Ding, Kai Song, Yue Zhang

Figure 1 for Semformer: Transformer Language Models with Semantic Planning

Figure 2 for Semformer: Transformer Language Models with Semantic Planning

Figure 3 for Semformer: Transformer Language Models with Semantic Planning

Figure 4 for Semformer: Transformer Language Models with Semantic Planning

Abstract:Next-token prediction serves as the dominant component in current neural language models. During the training phase, the model employs teacher forcing, which predicts tokens based on all preceding ground truth tokens. However, this approach has been found to create shortcuts, utilizing the revealed prefix to spuriously fit future tokens, potentially compromising the accuracy of the next-token predictor. In this paper, we introduce Semformer, a novel method of training a Transformer language model that explicitly models the semantic planning of response. Specifically, we incorporate a sequence of planning tokens into the prefix, guiding the planning token representations to predict the latent semantic representations of the response, which are induced by an autoencoder. In a minimal planning task (i.e., graph path-finding), our model exhibits near-perfect performance and effectively mitigates shortcut learning, a feat that standard training methods and baseline models have been unable to accomplish. Furthermore, we pretrain Semformer from scratch with 125M parameters, demonstrating its efficacy through measures of perplexity, in-context learning, and fine-tuning on summarization tasks.

Via

Access Paper or Ask Questions

Large Language Models are Parallel Multilingual Learners

Mar 14, 2024

Yongyu Mu, Peinan Feng, Zhiquan Cao, Yuzhang Wu, Bei Li, Chenglong Wang, Tong Xiao, Kai Song, Tongran Liu, Chunliang Zhang(+1 more)

Abstract:In this study, we reveal an in-context learning (ICL) capability of multilingual large language models (LLMs): by translating the input to several languages, we provide Parallel Input in Multiple Languages (PiM) to LLMs, which significantly enhances their comprehension abilities. To test this capability, we design extensive experiments encompassing 8 typical datasets, 7 languages and 8 state-of-the-art multilingual LLMs. Experimental results show that (1) incorporating more languages help PiM surpass the conventional ICL further; (2) even combining with the translations that are inferior to baseline performance can also help. Moreover, by examining the activated neurons in LLMs, we discover a counterintuitive but interesting phenomenon. Contrary to the common thought that PiM would activate more neurons than monolingual input to leverage knowledge learned from diverse languages, PiM actually inhibits neurons and promotes more precise neuron activation especially when more languages are added. This phenomenon aligns with the neuroscience insight about synaptic pruning, which removes less used neural connections, strengthens remainders, and then enhances brain intelligence.

* Working in process

Via

Access Paper or Ask Questions

Adaptive Hybrid Masking Strategy for Privacy-Preserving Face Recognition Against Model Inversion Attack

Mar 14, 2024

Yuanqing Huang, Yinggui Wang, Jianshu Li, Le Yang, Kai Song, Lei Wang

Figure 1 for Adaptive Hybrid Masking Strategy for Privacy-Preserving Face Recognition Against Model Inversion Attack

Figure 2 for Adaptive Hybrid Masking Strategy for Privacy-Preserving Face Recognition Against Model Inversion Attack

Figure 3 for Adaptive Hybrid Masking Strategy for Privacy-Preserving Face Recognition Against Model Inversion Attack

Figure 4 for Adaptive Hybrid Masking Strategy for Privacy-Preserving Face Recognition Against Model Inversion Attack

Abstract:The utilization of personal sensitive data in training face recognition (FR) models poses significant privacy concerns, as adversaries can employ model inversion attacks (MIA) to infer the original training data. Existing defense methods, such as data augmentation and differential privacy, have been employed to mitigate this issue. However, these methods often fail to strike an optimal balance between privacy and accuracy. To address this limitation, this paper introduces an adaptive hybrid masking algorithm against MIA. Specifically, face images are masked in the frequency domain using an adaptive MixUp strategy. Unlike the traditional MixUp algorithm, which is predominantly used for data augmentation, our modified approach incorporates frequency domain mixing. Previous studies have shown that increasing the number of images mixed in MixUp can enhance privacy preservation but at the expense of reduced face recognition accuracy. To overcome this trade-off, we develop an enhanced adaptive MixUp strategy based on reinforcement learning, which enables us to mix a larger number of images while maintaining satisfactory recognition accuracy. To optimize privacy protection, we propose maximizing the reward function (i.e., the loss function of the FR system) during the training of the strategy network. While the loss function of the FR network is minimized in the phase of training the FR network. The strategy network and the face recognition network can be viewed as antagonistic entities in the training process, ultimately reaching a more balanced trade-off. Experimental results demonstrate that our proposed hybrid masking scheme outperforms existing defense algorithms in terms of privacy preservation and recognition accuracy against MIA.

Via

Access Paper or Ask Questions

Power Transformer Fault Prediction Based on Knowledge Graphs

Feb 11, 2024

Chao Wang, Zhuo Chen, Ziyan Zhang, Chiyi Li, Kai Song

Abstract:In this paper, we address the challenge of learning with limited fault data for power transformers. Traditional operation and maintenance tools lack effective predictive capabilities for potential faults. The scarcity of extensive fault data makes it difficult to apply machine learning techniques effectively. To solve this problem, we propose a novel approach that leverages the knowledge graph (KG) technology in combination with gradient boosting decision trees (GBDT). This method is designed to efficiently learn from a small set of high-dimensional data, integrating various factors influencing transformer faults and historical operational data. Our approach enables accurate safe state assessments and fault analyses of power transformers despite the limited fault characteristic data. Experimental results demonstrate that this method outperforms other learning approaches in prediction accuracy, such as artificial neural networks (ANN) and logistic regression (LR). Furthermore, it offers significant improvements in progressiveness, practicality, and potential for widespread application.

Via

Access Paper or Ask Questions

Single-pixel imaging based on deep learning

Oct 25, 2023

Kai Song, Yaoxing Bian, Ku Wu, Hongrui Liu, Shuangping Han, Jiaming Li, Jiazhao Tian, Chengbin Qin, Jianyong Hu, Liantuan Xiao

Figure 1 for Single-pixel imaging based on deep learning

Figure 2 for Single-pixel imaging based on deep learning

Figure 3 for Single-pixel imaging based on deep learning

Figure 4 for Single-pixel imaging based on deep learning

Abstract:Since the advent of single-pixel imaging and machine learning, both fields have flourished, but followed parallel tracks. Until recently, machine learning, especially deep learning, has demonstrated effectiveness in delivering high-quality solutions across various application domains of single-pixel imaging. This article comprehensively reviews the research of single-pixel imaging technology based on deep learning. From the basic principles of single-pixel imaging and deep learning, the principles and implementation methods of single-pixel imaging based on deep learning are described in detail. Then, the research status and development trend of single-pixel imaging based on deep learning in various domains are analyzed, including super-resolution single-pixel imaging, single-pixel imaging through scattering media, photon-level single-pixel imaging, optical encryption based on single-pixel imaging, color single-pixel imaging, and image-free sensing. Finally, this review explores the limitations in the ongoing research, while offers the delivering insights into prospective avenues for future research.

Via

Access Paper or Ask Questions

Audience Expansion for Multi-show Release Based on an Edge-prompted Heterogeneous Graph Network

Apr 08, 2023

Kai Song, Shaofeng Wang, Ziwei Xie, Shanyu Wang, Jiahong Li, Yongqiang Yang

Figure 1 for Audience Expansion for Multi-show Release Based on an Edge-prompted Heterogeneous Graph Network

Figure 2 for Audience Expansion for Multi-show Release Based on an Edge-prompted Heterogeneous Graph Network

Figure 3 for Audience Expansion for Multi-show Release Based on an Edge-prompted Heterogeneous Graph Network

Figure 4 for Audience Expansion for Multi-show Release Based on an Edge-prompted Heterogeneous Graph Network

Abstract:In the user targeting and expanding of new shows on a video platform, the key point is how their embeddings are generated. It's supposed to be personalized from the perspective of both users and shows. Furthermore, the pursue of both instant (click) and long-time (view time) rewards, and the cold-start problem for new shows bring additional challenges. Such a problem is suitable for processing by heterogeneous graph models, because of the natural graph structure of data. But real-world networks usually have billions of nodes and various types of edges. Few existing methods focus on handling large-scale data and exploiting different types of edges, especially the latter. In this paper, we propose a two-stage audience expansion scheme based on an edge-prompted heterogeneous graph network which can take different double-sided interactions and features into account. In the offline stage, to construct the graph, user IDs and specific side information combinations of the shows are chosen to be the nodes, and click/co-click relations and view time are used to build the edges. Embeddings and clustered user groups are then calculated. When new shows arrive, their embeddings and subsequent matching users can be produced within a consistent space. In the online stage, posterior data including click/view users are employed as seeds to look for similar users. The results on the public datasets and our billion-scale data demonstrate the accuracy and efficiency of our approach.

Via

Access Paper or Ask Questions

Enhanced Preamble Based MAC Mechanism for IIoT-oriented PLC Network

Mar 22, 2022

Kai Song, Biqian Feng, Yongpeng Wu, Wenjun Zhang

Figure 1 for Enhanced Preamble Based MAC Mechanism for IIoT-oriented PLC Network

Figure 2 for Enhanced Preamble Based MAC Mechanism for IIoT-oriented PLC Network

Figure 3 for Enhanced Preamble Based MAC Mechanism for IIoT-oriented PLC Network

Figure 4 for Enhanced Preamble Based MAC Mechanism for IIoT-oriented PLC Network

Abstract:In this paper, we propose an enhanced preamble based media access control mechanism (E-PMAC), which can be applied in power line communication (PLC) network for Industrial Internet of Things (IIoT). We introduce detailed technologies used in E-PMAC, including delay calibration mechanism, preamble design, and slot allocation algorithm. With these technologies, E-PMAC is more robust than existing preamble based MAC mechanism (P-MAC). Besides, we analyze the disadvantage of P-MAC in multi-layer networking and design the networking process of E-PMAC to accelerate networking process. We analyze the complexity of networking process in P-MAC and E-PMAC and prove that E-PMAC has lower complexity than P-MAC. Finally, we simulate the single-layer networking and multi-layer networking of E-PMAC, P-MAC, and existing PLC protocol, i.e. , IEEE1901.1. The simulation results indicate that E-PMAC spends much less time in networking than IEEE1901.1 and P-MAC. Finally, with our work, a PLC network based on E-PMAC mechanism can be realized.

* 7 pages, 12 figures, to appeal in The 2022 IEEE 95th Vehicular Technology Conference (VTC2022-Spring)

Via

Access Paper or Ask Questions

Arithmetic addition of two integers by deep image classification networks: experiments to quantify their autonomous reasoning ability

Dec 10, 2019

Shuaicheng Liu, Zehao Zhang, Kai Song, Bing Zeng

Figure 1 for Arithmetic addition of two integers by deep image classification networks: experiments to quantify their autonomous reasoning ability

Figure 2 for Arithmetic addition of two integers by deep image classification networks: experiments to quantify their autonomous reasoning ability

Figure 3 for Arithmetic addition of two integers by deep image classification networks: experiments to quantify their autonomous reasoning ability

Figure 4 for Arithmetic addition of two integers by deep image classification networks: experiments to quantify their autonomous reasoning ability

Abstract:The unprecedented performance achieved by deep convolutional neural networks for image classification is linked primarily to their ability of capturing rich structural features at various layers within networks. Here we design a series of experiments, inspired by children's learning of the arithmetic addition of two integers, to showcase that such deep networks can go beyond the structural features to learn deeper knowledge. In our experiments, a set of images is constructed, each image containing an arithmetic addition $n+m$ in its central area, and several classification networks are then trained over a subset of images, using the sum as the label. Tests on the excluded images show that, as the image set gets larger, the networks have well learnt the law of arithmetic additions so as to build up their autonomous reasoning ability strongly. For instance, networks trained over a small percentage of images can classify a big majority of the remaining images correctly, and many arithmetic additions involving some integers that have never been seen during the training can also be solved correctly by the trained networks.

* 6 pages, 6 figures

Via

Access Paper or Ask Questions