Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shen Gao

Explainable Deep Learning Based Adversarial Defense for Automatic Modulation Classification

Sep 19, 2025

Peihao Dong, Jingchun Wang, Shen Gao, Fuhui Zhou, Qihui Wu

Abstract:Deep learning (DL) has been widely applied to enhance automatic modulation classification (AMC). However, the elaborate AMC neural networks are susceptible to various adversarial attacks, which are challenging to handle due to the generalization capability and computational cost. In this article, an explainable DL based defense scheme, called SHapley Additive exPlanation enhanced Adversarial Fine-Tuning (SHAP-AFT), is developed in the perspective of disclosing the attacking impact on the AMC network. By introducing the concept of cognitive negative information, the motivation of using SHAP for defense is theoretically analyzed first. The proposed scheme includes three stages, i.e., the attack detection, the information importance evaluation, and the AFT. The first stage indicates the existence of the attack. The second stage evaluates contributions of the received data and removes those data positions using negative Shapley values corresponding to the dominating negative information caused by the attack. Then the AMC network is fine-tuned based on adversarial adaptation samples using the refined received data pattern. Simulation results show the effectiveness of the Shapley value as the key indicator as well as the superior defense performance of the proposed SHAP-AFT scheme in face of different attack types and intensities.

* Accepted by IEEE Internet of Things Journal

Via

Access Paper or Ask Questions

CESRec: Constructing Pseudo Interactions for Sequential Recommendation via Conversational Feedback

Sep 11, 2025

Yifan Wang, Shen Gao, Jiabao Fang, Rui Yan, Billy Chiu, Shuo Shang

Abstract:Sequential Recommendation Systems (SRS) have become essential in many real-world applications. However, existing SRS methods often rely on collaborative filtering signals and fail to capture real-time user preferences, while Conversational Recommendation Systems (CRS) excel at eliciting immediate interests through natural language interactions but neglect historical behavior. To bridge this gap, we propose CESRec, a novel framework that integrates the long-term preference modeling of SRS with the real-time preference elicitation of CRS. We introduce semantic-based pseudo interaction construction, which dynamically updates users'historical interaction sequences by analyzing conversational feedback, generating a pseudo-interaction sequence that seamlessly combines long-term and real-time preferences. Additionally, we reduce the impact of outliers in historical items that deviate from users'core preferences by proposing dual alignment outlier items masking, which identifies and masks such items using semantic-collaborative aligned representations. Extensive experiments demonstrate that CESRec achieves state-of-the-art performance by boosting strong SRS models, validating its effectiveness in integrating conversational feedback into SRS.

Via

Access Paper or Ask Questions

TTPA: Token-level Tool-use Preference Alignment Training Framework with Fine-grained Evaluation

May 26, 2025

Chengrui Huang, Shen Gao, Zhengliang Shi, Dongsheng Wang, Shuo Shang

Abstract:Existing tool-learning methods usually rely on supervised fine-tuning, they often overlook fine-grained optimization of internal tool call details, leading to limitations in preference alignment and error discrimination. To overcome these challenges, we propose Token-level Tool-use Preference Alignment Training Framework (TTPA), a training paradigm for constructing token-level tool-use preference datasets that align LLMs with fine-grained preferences using a novel error-oriented scoring mechanism. TTPA first introduces reversed dataset construction, a method for creating high-quality, multi-turn tool-use datasets by reversing the generation flow. Additionally, we propose Token-level Preference Sampling (TPS) to capture fine-grained preferences by modeling token-level differences during generation. To address biases in scoring, we introduce the Error-oriented Scoring Mechanism (ESM), which quantifies tool-call errors and can be used as a training signal. Extensive experiments on three diverse benchmark datasets demonstrate that TTPA significantly improves tool-using performance while showing strong generalization ability across models and datasets.

* 16 pages, 5 figures

Via

Access Paper or Ask Questions

CulFiT: A Fine-grained Cultural-aware LLM Training Paradigm via Multilingual Critique Data Synthesis

May 26, 2025

Ruixiang Feng, Shen Gao, Xiuying Chen, Lisi Chen, Shuo Shang

Figure 1 for CulFiT: A Fine-grained Cultural-aware LLM Training Paradigm via Multilingual Critique Data Synthesis

Figure 2 for CulFiT: A Fine-grained Cultural-aware LLM Training Paradigm via Multilingual Critique Data Synthesis

Figure 3 for CulFiT: A Fine-grained Cultural-aware LLM Training Paradigm via Multilingual Critique Data Synthesis

Figure 4 for CulFiT: A Fine-grained Cultural-aware LLM Training Paradigm via Multilingual Critique Data Synthesis

Abstract:Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, yet they often exhibit a specific cultural biases, neglecting the values and linguistic diversity of low-resource regions. This cultural bias not only undermines universal equality, but also risks reinforcing stereotypes and perpetuating discrimination. To address this, we propose CulFiT, a novel culturally-aware training paradigm that leverages multilingual data and fine-grained reward modeling to enhance cultural sensitivity and inclusivity. Our approach synthesizes diverse cultural-related questions, constructs critique data in culturally relevant languages, and employs fine-grained rewards to decompose cultural texts into verifiable knowledge units for interpretable evaluation. We also introduce GlobalCultureQA, a multilingual open-ended question-answering dataset designed to evaluate culturally-aware responses in a global context. Extensive experiments on three existing benchmarks and our GlobalCultureQA demonstrate that CulFiT achieves state-of-the-art open-source model performance in cultural alignment and general reasoning.

Via

Access Paper or Ask Questions

Beyond Profile: From Surface-Level Facts to Deep Persona Simulation in LLMs

Feb 18, 2025

Zixiao Wang, Duzhen Zhang, Ishita Agrawal, Shen Gao, Le Song, Xiuying Chen

Abstract:Previous approaches to persona simulation large language models (LLMs) have typically relied on learning basic biographical information, or using limited role-play dialogue datasets to capture a character's responses. However, a holistic representation of an individual goes beyond surface-level facts or conversations to deeper thoughts and thinking. In this work, we introduce CharacterBot, a model designed to replicate both the linguistic patterns and distinctive thought processes of a character. Using Lu Xun, a renowned Chinese writer, as a case study, we propose four training tasks derived from his 17 essay collections. These include a pre-training task focused on mastering external linguistic structures and knowledge, as well as three fine-tuning tasks: multiple-choice question answering, generative question answering, and style transfer, each aligning the LLM with Lu Xun's internal ideation and writing style. To optimize learning across these tasks, we introduce a CharLoRA parameter updating mechanism, where a general linguistic style expert collaborates with other task-specific experts to better study both the language style and the understanding of deeper thoughts. We evaluate CharacterBot on three tasks for linguistic accuracy and opinion comprehension, demonstrating that it significantly outperforms the baselines on our adapted metrics. We hope that this work inspires future research on deep character persona simulation LLM.

* 19 pages, 3 figures

Via

Access Paper or Ask Questions

LLMs are Also Effective Embedding Models: An In-depth Overview

Dec 17, 2024

Chongyang Tao, Tao Shen, Shen Gao, Junshuo Zhang, Zhen Li, Zhengwei Tao, Shuai Ma

Figure 1 for LLMs are Also Effective Embedding Models: An In-depth Overview

Figure 2 for LLMs are Also Effective Embedding Models: An In-depth Overview

Figure 3 for LLMs are Also Effective Embedding Models: An In-depth Overview

Figure 4 for LLMs are Also Effective Embedding Models: An In-depth Overview

Abstract:Large language models (LLMs) have revolutionized natural language processing by achieving state-of-the-art performance across various tasks. Recently, their effectiveness as embedding models has gained attention, marking a paradigm shift from traditional encoder-only models like ELMo and BERT to decoder-only, large-scale LLMs such as GPT, LLaMA, and Mistral. This survey provides an in-depth overview of this transition, beginning with foundational techniques before the LLM era, followed by LLM-based embedding models through two main strategies to derive embeddings from LLMs. 1) Direct prompting: We mainly discuss the prompt designs and the underlying rationale for deriving competitive embeddings. 2) Data-centric tuning: We cover extensive aspects that affect tuning an embedding model, including model architecture, training objectives, data constructions, etc. Upon the above, we also cover advanced methods, such as handling longer texts, and multilingual and cross-modal data. Furthermore, we discuss factors affecting choices of embedding models, such as performance/efficiency comparisons, dense vs sparse embeddings, pooling strategies, and scaling law. Lastly, the survey highlights the limitations and challenges in adapting LLMs for embeddings, including cross-task embedding quality, trade-offs between efficiency and accuracy, low-resource, long-context, data bias, robustness, etc. This survey serves as a valuable resource for researchers and practitioners by synthesizing current advancements, highlighting key challenges, and offering a comprehensive framework for future work aimed at enhancing the effectiveness and efficiency of LLMs as embedding models.

* 32 pages

Via

Access Paper or Ask Questions

Pruned Convolutional Attention Network Based Wideband Spectrum Sensing with Sub-Nyquist Sampling

Nov 30, 2024

Peihao Dong, Jibin Jia, Shen Gao, Fuhui Zhou, Qihui Wu

Figure 1 for Pruned Convolutional Attention Network Based Wideband Spectrum Sensing with Sub-Nyquist Sampling

Figure 2 for Pruned Convolutional Attention Network Based Wideband Spectrum Sensing with Sub-Nyquist Sampling

Figure 3 for Pruned Convolutional Attention Network Based Wideband Spectrum Sensing with Sub-Nyquist Sampling

Figure 4 for Pruned Convolutional Attention Network Based Wideband Spectrum Sensing with Sub-Nyquist Sampling

Abstract:Wideband spectrum sensing (WSS) is critical for orchestrating multitudinous wireless transmissions via spectrum sharing, but may incur excessive costs of hardware, power and computation due to the high sampling rate. In this article, a deep learning based WSS framework embedding the multicoset preprocessing is proposed to enable the low-cost sub-Nyquist sampling. A pruned convolutional attention WSS network (PCA-WSSNet) is designed to organically integrate the multicoset preprocessing and the convolutional attention mechanism as well as to reduce the model complexity remarkably via the selective weight pruning without the performance loss. Furthermore, a transfer learning (TL) strategy benefiting from the model pruning is developed to improve the robustness of PCA-WSSNet with few adaptation samples of new scenarios. Simulation results show the performance superiority of PCA-WSSNet over the state of the art. Compared with direct TL, the pruned TL strategy can simultaneously improve the prediction accuracy in unseen scenarios, reduce the model size, and accelerate the model inference.

* Accepted by IEEE Transactions on Vehicular Technology

Via

Access Paper or Ask Questions

Edge Learning Based Collaborative Automatic Modulation Classification for Hierarchical Cognitive Radio Networks

Jul 30, 2024

Peihao Dong, Chaowei He, Shen Gao, Fuhui Zhou, Qihui Wu

Abstract:In hierarchical cognitive radio networks, edge or cloud servers utilize the data collected by edge devices for modulation classification, which, however, is faced with problems of the computation load, transmission overhead, and data privacy. In this article, an edge learning (EL) based framework jointly mobilizing the edge device and the edge server for intelligent co-inference is proposed to realize the collaborative automatic modulation classification (C-AMC) between them. A spectrum semantic compression neural network is designed for the edge device to compress the collected raw data into a compact semantic embedding that is then sent to the edge server via the wireless channel. On the edge server side, a modulation classification neural network combining the bidirectional long-short term memory and attention structures is elaborated to determine the modulation type from the noisy semantic embedding. The C-AMC framework decently balances the computation resources of both sides while avoiding the high transmission overhead and data privacy leakage. Both the offline and online training procedures of the C-AMC framework are elaborated. The compression strategy of the C-AMC framework is also developed to further facilitate the deployment, especially for the resource-constrained edge device. Simulation results show the superiority of the EL-based C-AMC framework in terms of the classification accuracy, computational complexity, and the data compression rate as well as reveal useful insights paving the practical implementation.

* Accepted by IEEE Internet of Things Journal

Via

Access Paper or Ask Questions

What Affects the Stability of Tool Learning? An Empirical Study on the Robustness of Tool Learning Frameworks

Jul 03, 2024

Chengrui Huang, Zhengliang Shi, Yuntao Wen, Xiuying Chen, Peng Han, Shen Gao, Shuo Shang

Figure 1 for What Affects the Stability of Tool Learning? An Empirical Study on the Robustness of Tool Learning Frameworks

Figure 2 for What Affects the Stability of Tool Learning? An Empirical Study on the Robustness of Tool Learning Frameworks

Figure 3 for What Affects the Stability of Tool Learning? An Empirical Study on the Robustness of Tool Learning Frameworks

Figure 4 for What Affects the Stability of Tool Learning? An Empirical Study on the Robustness of Tool Learning Frameworks

Abstract:Tool learning methods have enhanced the ability of large language models (LLMs) to interact with real-world applications. Many existing works fine-tune LLMs or design prompts to enable LLMs to select appropriate tools and correctly invoke them to meet user requirements. However, it is observed in previous works that the performance of tool learning varies from tasks, datasets, training settings, and algorithms. Without understanding the impact of these factors, it can lead to inconsistent results, inefficient model deployment, and suboptimal tool utilization, ultimately hindering the practical integration and scalability of LLMs in real-world scenarios. Therefore, in this paper, we explore the impact of both internal and external factors on the performance of tool learning frameworks. Through extensive experiments on two benchmark datasets, we find several insightful conclusions for future work, including the observation that LLMs can benefit significantly from increased trial and exploration. We believe our empirical study provides a new perspective for future tool learning research.

* 19 pages, 9 figures

Via

Access Paper or Ask Questions

Simulating Financial Market via Large Language Model based Agents

Jun 28, 2024

Shen Gao, Yuntao Wen, Minghang Zhu, Jianing Wei, Yuhan Cheng, Qunzi Zhang, Shuo Shang

Figure 1 for Simulating Financial Market via Large Language Model based Agents

Figure 2 for Simulating Financial Market via Large Language Model based Agents

Figure 3 for Simulating Financial Market via Large Language Model based Agents

Figure 4 for Simulating Financial Market via Large Language Model based Agents

Abstract:Most economic theories typically assume that financial market participants are fully rational individuals and use mathematical models to simulate human behavior in financial markets. However, human behavior is often not entirely rational and is challenging to predict accurately with mathematical models. In this paper, we propose \textbf{A}gent-based \textbf{S}imulated \textbf{F}inancial \textbf{M}arket (ASFM), which first constructs a simulated stock market with a real order matching system. Then, we propose a large language model based agent as the stock trader, which contains the profile, observation, and tool-learning based action module. The trading agent can comprehensively understand current market dynamics and financial policy information, and make decisions that align with their trading strategy. In the experiments, we first verify that the reactions of our ASFM are consistent with the real stock market in two controllable scenarios. In addition, we also conduct experiments in two popular economics research directions, and we find that conclusions drawn in our \model align with the preliminary findings in economics research. Based on these observations, we believe our proposed ASFM provides a new paradigm for economic research.

Via

Access Paper or Ask Questions