Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xue Zhang

Think Natively: Unlocking Multilingual Reasoning with Consistency-Enhanced Reinforcement Learning

Oct 08, 2025

Xue Zhang, Yunlong Liang, Fandong Meng, Songming Zhang, Kaiyu Huang, Yufeng Chen, Jinan Xu, Jie Zhou

Figure 1 for Think Natively: Unlocking Multilingual Reasoning with Consistency-Enhanced Reinforcement Learning

Figure 2 for Think Natively: Unlocking Multilingual Reasoning with Consistency-Enhanced Reinforcement Learning

Figure 3 for Think Natively: Unlocking Multilingual Reasoning with Consistency-Enhanced Reinforcement Learning

Figure 4 for Think Natively: Unlocking Multilingual Reasoning with Consistency-Enhanced Reinforcement Learning

Abstract:Large Reasoning Models (LRMs) have achieved remarkable performance on complex reasoning tasks by adopting the "think-then-answer" paradigm, which enhances both accuracy and interpretability. However, current LRMs exhibit two critical limitations when processing non-English languages: (1) They often struggle to maintain input-output language consistency; (2) They generally perform poorly with wrong reasoning paths and lower answer accuracy compared to English. These limitations significantly degrade the user experience for non-English speakers and hinder the global deployment of LRMs. To address these limitations, we propose M-Thinker, which is trained by the GRPO algorithm that involves a Language Consistency (LC) reward and a novel Cross-lingual Thinking Alignment (CTA) reward. Specifically, the LC reward defines a strict constraint on the language consistency between the input, thought, and answer. Besides, the CTA reward compares the model's non-English reasoning paths with its English reasoning path to transfer its own reasoning capability from English to non-English languages. Through an iterative RL procedure, our M-Thinker-1.5B/7B models not only achieve nearly 100% language consistency and superior performance on two multilingual benchmarks (MMATH and PolyMath), but also exhibit excellent generalization on out-of-domain languages.

* 13 pages, 8 tables, 4 figures

Via

Access Paper or Ask Questions

CM-Align: Consistency-based Multilingual Alignment for Large Language Models

Sep 10, 2025

Xue Zhang, Yunlong Liang, Fandong Meng, Songming Zhang, Yufeng Chen, Jinan Xu, Jie Zhou

Abstract:Current large language models (LLMs) generally show a significant performance gap in alignment between English and other languages. To bridge this gap, existing research typically leverages the model's responses in English as a reference to select the best/worst responses in other languages, which are then used for Direct Preference Optimization (DPO) training. However, we argue that there are two limitations in the current methods that result in noisy multilingual preference data and further limited alignment performance: 1) Not all English responses are of high quality, and using a response with low quality may mislead the alignment for other languages. 2) Current methods usually use biased or heuristic approaches to construct multilingual preference pairs. To address these limitations, we design a consistency-based data selection method to construct high-quality multilingual preference data for improving multilingual alignment (CM-Align). Specifically, our method includes two parts: consistency-guided English reference selection and cross-lingual consistency-based multilingual preference data construction. Experimental results on three LLMs and three common tasks demonstrate the effectiveness and superiority of our method, which further indicates the necessity of constructing high-quality preference data.

* EMNLP 2025 Findings

Via

Access Paper or Ask Questions

Design of 3D Beamforming and Deployment Strategies for ISAC-based HAPS Systems

Jun 12, 2025

Xue Zhang, Bang Huang, Mohamed-Slim Alouini

Abstract:This paper explores high-altitude platform station (HAPS) systems enabled by integrated sensing and communication (ISAC), in which a HAPS simultaneously transmits communication signals and synthetic aperture radar (SAR) imaging signals to support multi-user communication while performing ground target sensing. Taking into account the operational characteristics of SAR imaging, we consider two HAPS deployment strategies: (i) a quasi-stationary HAPS that remains fixed at an optimized location during SAR operation, following the stop-and-go scanning model; and (ii) a dynamic HAPS that continuously adjusts its flight trajectory along a circular path. For each strategy, we aim at maximizing the weighted sum-rate throughput for communication users while ensuring that SAR imaging requirements, such as beampattern gain and signal-to-noise ratio (SNR), are satisfied. This is achieved by jointly optimizing the HAPS deployment strategy, i.e., its placement or trajectory, along with three-dimensional (3D) transmit beamforming, under practical constraints including transmit power limits, energy consumption, and flight dynamics. Nevertheless, the formulated optimization problems corresponding to the two deployment strategies are inherently non-convex. To address the issue, we propose efficient algorithms that leverage both convex and non-convex optimization techniques to obtain high-quality suboptimal solutions. Numerical results demonstrate the effectiveness and advantages of the proposed approaches over benchmark schemes.

Via

Access Paper or Ask Questions

Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts

May 28, 2025

Xue Zhang, Yunlong Liang, Fandong Meng, Songming Zhang, Yufeng Chen, Jinan Xu, Jie Zhou

Abstract:Continually expanding new languages for existing large language models (LLMs) is a promising yet challenging approach to building powerful multilingual LLMs. The biggest challenge is to make the model continuously learn new languages while preserving the proficient ability of old languages. To achieve this, recent work utilizes the Mixture-of-Experts (MoE) architecture to expand new languages by adding new experts and avoid catastrophic forgetting of old languages by routing corresponding tokens to the original model backbone (old experts). Although intuitive, this kind of method is parameter-costly when expanding new languages and still inevitably impacts the performance of old languages. To address these limitations, we analyze the language characteristics of different layers in LLMs and propose a layer-wise expert allocation algorithm (LayerMoE) to determine the appropriate number of new experts for each layer. Specifically, we find different layers in LLMs exhibit different representation similarities between languages and then utilize the similarity as the indicator to allocate experts for each layer, i.e., the higher similarity, the fewer experts. Additionally, to further mitigate the forgetting of old languages, we add a classifier in front of the router network on the layers with higher similarity to guide the routing of old language tokens. Experimental results show that our method outperforms the previous state-of-the-art baseline with 60% fewer experts in the single-expansion setting and with 33.3% fewer experts in the lifelong-expansion setting, demonstrating the effectiveness of our method.

* ACL 2025 (Main), 16 pages, 5 figures, 11 tables

Via

Access Paper or Ask Questions

A Dual-Space Framework for General Knowledge Distillation of Large Language Models

Apr 15, 2025

Xue Zhang, Songming Zhang, Yunlong Liang, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou

Abstract:Knowledge distillation (KD) is a promising solution to compress large language models (LLMs) by transferring their knowledge to smaller models. During this process, white-box KD methods usually minimize the distance between the output distributions of the teacher model and the student model to transfer more information. However, we reveal that the current white-box KD framework exhibits two limitations: a) bridging probability distributions from different output spaces will limit the similarity between the teacher model and the student model; b) this framework cannot be applied to LLMs with different vocabularies. One of the root causes for these limitations is that the distributions from the teacher and the student for KD are output by different prediction heads, which yield distributions in different output spaces and dimensions. Therefore, in this paper, we propose a dual-space knowledge distillation (DSKD) framework that unifies the prediction heads of the teacher and the student models for KD. Specifically, we first introduce two projectors with ideal initialization to project the teacher/student hidden states into the student/teacher representation spaces. After this, the hidden states from different models can share the same head and unify the output spaces of the distributions. Furthermore, we develop an exact token alignment (ETA) algorithm to align the same tokens in two differently-tokenized sequences. Based on the above, our DSKD framework is a general KD framework that supports both off-policy and on-policy KD, and KD between any two LLMs regardless of their vocabularies. Extensive experiments on instruction-following, mathematical reasoning, and code generation benchmarks show that DSKD significantly outperforms existing methods based on the current white-box KD framework and surpasses other cross-tokenizer KD methods for LLMs with different vocabularies.

* 19 pages, 9 figures, 11 tables, under review. Code is available at: https://github.com/songmzhang/DSKDv2. arXiv admin note: text overlap with arXiv:2406.17328

Via

Access Paper or Ask Questions

Constitution or Collapse? Exploring Constitutional AI with Llama 3-8B

Apr 07, 2025

Xue Zhang

Figure 1 for Constitution or Collapse? Exploring Constitutional AI with Llama 3-8B

Figure 2 for Constitution or Collapse? Exploring Constitutional AI with Llama 3-8B

Figure 3 for Constitution or Collapse? Exploring Constitutional AI with Llama 3-8B

Figure 4 for Constitution or Collapse? Exploring Constitutional AI with Llama 3-8B

Abstract:As language models continue to grow larger, the cost of acquiring high-quality training data has increased significantly. Collecting human feedback is both expensive and time-consuming, and manual labels can be noisy, leading to an imbalance between helpfulness and harmfulness. Constitutional AI, introduced by Anthropic in December 2022, uses AI to provide feedback to another AI, greatly reducing the need for human labeling. However, the original implementation was designed for a model with around 52 billion parameters, and there is limited information on how well Constitutional AI performs with smaller models, such as LLaMA 3-8B. In this paper, we replicated the Constitutional AI workflow using the smaller LLaMA 3-8B model. Our results show that Constitutional AI can effectively increase the harmlessness of the model, reducing the Attack Success Rate in MT-Bench by 40.8%. However, similar to the original study, increasing harmlessness comes at the cost of helpfulness. The helpfulness metrics, which are an average of the Turn 1 and Turn 2 scores, dropped by 9.8% compared to the baseline. Additionally, we observed clear signs of model collapse in the final DPO-CAI model, indicating that smaller models may struggle with self-improvement due to insufficient output quality, making effective fine-tuning more challenging. Our study suggests that, like reasoning and math ability, self-improvement is an emergent property.

* 6 pages, 2 figures. Conducted as part of research on alignment techniques for language models

Via

Access Paper or Ask Questions

Multi-Granularity Vision Fastformer with Fusion Mechanism for Skin Lesion Segmentation

Apr 04, 2025

Xuanyu Liu, Huiyun Yao, Jinggui Gao, Zhongyi Guo, Xue Zhang, Yulin Dong

Figure 1 for Multi-Granularity Vision Fastformer with Fusion Mechanism for Skin Lesion Segmentation

Figure 2 for Multi-Granularity Vision Fastformer with Fusion Mechanism for Skin Lesion Segmentation

Figure 3 for Multi-Granularity Vision Fastformer with Fusion Mechanism for Skin Lesion Segmentation

Figure 4 for Multi-Granularity Vision Fastformer with Fusion Mechanism for Skin Lesion Segmentation

Abstract:Background:Convolutional Neural Networks(CNN) and Vision Transformers(ViT) are the main techniques used in Medical image segmentation. However, CNN is limited to local contextual information, and ViT's quadratic complexity results in significant computational costs. At the same time, equipping the model to distinguish lesion boundaries with varying degrees of severity is also a challenge encountered in skin lesion segmentation. Purpose:This research aims to optimize the balance between computational costs and long-range dependency modelling and achieve excellent generalization across lesions with different degrees of severity. Methods:we propose a lightweight U-shape network that utilizes Vision Fastformer with Fusion Mechanism (VFFM-UNet). We inherit the advantages of Fastformer's additive attention mechanism, combining element-wise product and matrix product for comprehensive feature extraction and channel reduction to save computational costs. In order to accurately identify the lesion boundaries with varying degrees of severity, we designed Fusion Mechanism including Multi-Granularity Fusion and Channel Fusion, which can process the feature maps in the granularity and channel levels to obtain different contextual information. Results:Comprehensive experiments on the ISIC2017, ISIC2018 and PH2 datasets demonstrate that VFFM-UNet outperforms existing state-of-the-art models regarding parameter numbers, computational complexity and segmentation performance. In short, compared to MISSFormer, our model achieves superior segmentation performance while reducing parameter and computation costs by 101x and 15x, respectively. Conclusions:Both quantitative and qualitative analyses show that VFFM-UNet sets a new benchmark by reaching an ideal balance between parameter numbers, computational complexity, and segmentation performance compared to existing state-of-the-art models.

Via

Access Paper or Ask Questions

Symbolic Neural Ordinary Differential Equations

Mar 11, 2025

Xin Li, Chengli Zhao, Xue Zhang, Xiaojun Duan

Abstract:Differential equations are widely used to describe complex dynamical systems with evolving parameters in nature and engineering. Effectively learning a family of maps from the parameter function to the system dynamics is of great significance. In this study, we propose a novel learning framework of symbolic continuous-depth neural networks, termed Symbolic Neural Ordinary Differential Equations (SNODEs), to effectively and accurately learn the underlying dynamics of complex systems. Specifically, our learning framework comprises three stages: initially, pre-training a predefined symbolic neural network via a gradient flow matching strategy; subsequently, fine-tuning this network using Neural ODEs; and finally, constructing a general neural network to capture residuals. In this process, we apply the SNODEs framework to partial differential equation systems through Fourier analysis, achieving resolution-invariant modeling. Moreover, this framework integrates the strengths of symbolism and connectionism, boasting a universal approximation theorem while significantly enhancing interpretability and extrapolation capabilities relative to state-of-the-art baseline methods. We demonstrate this through experiments on several representative complex systems. Therefore, our framework can be further applied to a wide range of scientific problems, such as system bifurcation and control, reconstruction and forecasting, as well as the discovery of new equations.

* Accepted in AAAI 2025

Via

Access Paper or Ask Questions

AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation

Mar 04, 2025

Songming Zhang, Xue Zhang, Tong Zhang, Bojie Hu, Yufeng Chen, Jinan Xu

Abstract:In modern large language models (LLMs), LLM alignment is of crucial importance and is typically achieved through methods such as reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO). However, in most existing methods for LLM alignment, all tokens in the response are optimized using a sparse, response-level reward or preference annotation. The ignorance of token-level rewards may erroneously punish high-quality tokens or encourage low-quality tokens, resulting in suboptimal performance and slow convergence speed. To address this issue, we propose AlignDistil, an RLHF-equivalent distillation method for token-level reward optimization. Specifically, we introduce the reward learned by DPO into the RLHF objective and theoretically prove the equivalence between this objective and a token-level distillation process, where the teacher distribution linearly combines the logits from the DPO model and a reference model. On this basis, we further bridge the accuracy gap between the reward from the DPO model and the pure reward model, by building a contrastive DPO reward with a normal and a reverse DPO model. Moreover, to avoid under- and over-optimization on different tokens, we design a token adaptive logit extrapolation mechanism to construct an appropriate teacher distribution for each token. Experimental results demonstrate the superiority of our AlignDistil over existing methods and showcase fast convergence due to its token-level distributional reward optimization.

* 15 pages, 2 figures

Via

Access Paper or Ask Questions

Dual-Space Knowledge Distillation for Large Language Models

Jun 25, 2024

Songming Zhang, Xue Zhang, Zengkui Sun, Yufeng Chen, Jinan Xu

Figure 1 for Dual-Space Knowledge Distillation for Large Language Models

Figure 2 for Dual-Space Knowledge Distillation for Large Language Models

Figure 3 for Dual-Space Knowledge Distillation for Large Language Models

Figure 4 for Dual-Space Knowledge Distillation for Large Language Models

Abstract:Knowledge distillation (KD) is known as a promising solution to compress large language models (LLMs) via transferring their knowledge to smaller models. During this process, white-box KD methods usually minimize the distance between the output distributions of the two models so that more knowledge can be transferred. However, in the current white-box KD framework, the output distributions are from the respective output spaces of the two models, using their own prediction heads. We argue that the space discrepancy will lead to low similarity between the teacher model and the student model on both representation and distribution levels. Furthermore, this discrepancy also hinders the KD process between models with different vocabularies, which is common for current LLMs. To address these issues, we propose a dual-space knowledge distillation (DSKD) framework that unifies the output spaces of the two models for KD. On the basis of DSKD, we further develop a cross-model attention mechanism, which can automatically align the representations of the two models with different vocabularies. Thus, our framework is not only compatible with various distance functions for KD (e.g., KL divergence) like the current framework, but also supports KD between any two LLMs regardless of their vocabularies. Experiments on task-agnostic instruction-following benchmarks show that DSKD significantly outperforms the current white-box KD framework with various distance functions, and also surpasses existing KD methods for LLMs with different vocabularies.

* 17 pages, 11 figures, code available at: https://github.com/songmzhang/DSKD

Via

Access Paper or Ask Questions