Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shuai Ma

Modeling and Performance Analysis for Semantic Communications Based on Empirical Results

Apr 29, 2025

Shuai Ma, Bin Shen, Chuanhui Zhang, Youlong Wu, Hang Li, Shiyin Li, Guangming Shi, Naofal Al-Dhahir

Abstract:Due to the black-box characteristics of deep learning based semantic encoders and decoders, finding a tractable method for the performance analysis of semantic communications is a challenging problem. In this paper, we propose an Alpha-Beta-Gamma (ABG) formula to model the relationship between the end-to-end measurement and SNR, which can be applied for both image reconstruction tasks and inference tasks. Specifically, for image reconstruction tasks, the proposed ABG formula can well fit the commonly used DL networks, such as SCUNet, and Vision Transformer, for semantic encoding with the multi scale-structural similarity index measure (MS-SSIM) measurement. Furthermore, we find that the upper bound of the MS-SSIM depends on the number of quantized output bits of semantic encoders, and we also propose a closed-form expression to fit the relationship between the MS-SSIM and quantized output bits. To the best of our knowledge, this is the first theoretical expression between end-to-end performance metrics and SNR for semantic communications. Based on the proposed ABG formula, we investigate an adaptive power control scheme for semantic communications over random fading channels, which can effectively guarantee quality of service (QoS) for semantic communications, and then design the optimal power allocation scheme to maximize the energy efficiency of the semantic communication system. Furthermore, by exploiting the bisection algorithm, we develop the power allocation scheme to maximize the minimum QoS of multiple users for OFDMA downlink semantic communication Extensive simulations verify the effectiveness and superiority of the proposed ABG formula and power allocation schemes.

Via

Access Paper or Ask Questions

Task-Oriented Semantic Communication with Importance-Aware Rate Control

Apr 29, 2025

Zhiye Sun, Shuai Ma, Shiyin Li

Abstract:Semantic communication is recognized for its high compression efficiency and robust resistance to noise. However, utilizing a fixed transmission rate in environments with dynamic signal-to-noise ratios (SNR) often results in inefficient use of communication resources. To address this challenge, this letter proposes an importance-aware rate control semantic communication (IRCSC) scheme, which dynamically adjusts transmission rates in response to both channel conditions and semantic importance. The scheme employs a contribution-based importance analyzer to rank semantic importance. Additionaly, a novel metric, the semantic transmission integrity index (STII), is proposed to quantify the amount of correctly transmitted information and to correlate it with inference performance. Simulations indicate that, with low computational complexity, IRCSC guarantees a controllable trade-off between performance and rate, delivering higher compression efficiency and improved task performance in high-SNR scenarios.

* 5 pages, 4 figures

Via

Access Paper or Ask Questions

Leveraging Labelled Data Knowledge: A Cooperative Rectification Learning Network for Semi-supervised 3D Medical Image Segmentation

Feb 17, 2025

Yanyan Wang, Kechen Song, Yuyuan Liu, Shuai Ma, Yunhui Yan, Gustavo Carneiro

Abstract:Semi-supervised 3D medical image segmentation aims to achieve accurate segmentation using few labelled data and numerous unlabelled data. The main challenge in the design of semi-supervised learning methods consists in the effective use of the unlabelled data for training. A promising solution consists of ensuring consistent predictions across different views of the data, where the efficacy of this strategy depends on the accuracy of the pseudo-labels generated by the model for this consistency learning strategy. In this paper, we introduce a new methodology to produce high-quality pseudo-labels for a consistency learning strategy to address semi-supervised 3D medical image segmentation. The methodology has three important contributions. The first contribution is the Cooperative Rectification Learning Network (CRLN) that learns multiple prototypes per class to be used as external knowledge priors to adaptively rectify pseudo-labels at the voxel level. The second contribution consists of the Dynamic Interaction Module (DIM) to facilitate pairwise and cross-class interactions between prototypes and multi-resolution image features, enabling the production of accurate voxel-level clues for pseudo-label rectification. The third contribution is the Cooperative Positive Supervision (CPS), which optimises uncertain representations to align with unassertive representations of their class distributions, improving the model's accuracy in classifying uncertain regions. Extensive experiments on three public 3D medical segmentation datasets demonstrate the effectiveness and superiority of our semi-supervised learning method.

* Medical Image Analysis

Via

Access Paper or Ask Questions

Separate Motion from Appearance: Customizing Motion via Customizing Text-to-Video Diffusion Models

Jan 28, 2025

Huijie Liu, Jingyun Wang, Shuai Ma, Jie Hu, Xiaoming Wei, Guoliang Kang

Abstract:Motion customization aims to adapt the diffusion model (DM) to generate videos with the motion specified by a set of video clips with the same motion concept. To realize this goal, the adaptation of DM should be possible to model the specified motion concept, without compromising the ability to generate diverse appearances. Thus, the key to solving this problem lies in how to separate the motion concept from the appearance in the adaptation process of DM. Typical previous works explore different ways to represent and insert a motion concept into large-scale pretrained text-to-video diffusion models, e.g., learning a motion LoRA, using latent noise residuals, etc. While those methods can encode the motion concept, they also inevitably encode the appearance in the reference videos, resulting in weakened appearance generation capability. In this paper, we follow the typical way to learn a motion LoRA to encode the motion concept, but propose two novel strategies to enhance motion-appearance separation, including temporal attention purification (TAP) and appearance highway (AH). Specifically, we assume that in the temporal attention module, the pretrained Value embeddings are sufficient to serve as basic components needed by producing a new motion. Thus, in TAP, we choose only to reshape the temporal attention with motion LoRAs so that Value embeddings can be reorganized to produce a new motion. Further, in AH, we alter the starting point of each skip connection in U-Net from the output of each temporal attention module to the output of each spatial attention module. Extensive experiments demonstrate that compared to previous works, our method can generate videos with appearance more aligned with the text descriptions and motion more consistent with the reference videos.

* 8 pages,6 figures

Via

Access Paper or Ask Questions

TAPFed: Threshold Secure Aggregation for Privacy-Preserving Federated Learning

Jan 09, 2025

Runhua Xu, Bo Li, Chao Li, James B. D. Joshi, Shuai Ma, Jianxin Li

Figure 1 for TAPFed: Threshold Secure Aggregation for Privacy-Preserving Federated Learning

Figure 2 for TAPFed: Threshold Secure Aggregation for Privacy-Preserving Federated Learning

Figure 3 for TAPFed: Threshold Secure Aggregation for Privacy-Preserving Federated Learning

Figure 4 for TAPFed: Threshold Secure Aggregation for Privacy-Preserving Federated Learning

Abstract:Federated learning is a computing paradigm that enhances privacy by enabling multiple parties to collaboratively train a machine learning model without revealing personal data. However, current research indicates that traditional federated learning platforms are unable to ensure privacy due to privacy leaks caused by the interchange of gradients. To achieve privacy-preserving federated learning, integrating secure aggregation mechanisms is essential. Unfortunately, existing solutions are vulnerable to recently demonstrated inference attacks such as the disaggregation attack. This paper proposes TAPFed, an approach for achieving privacy-preserving federated learning in the context of multiple decentralized aggregators with malicious actors. TAPFed uses a proposed threshold functional encryption scheme and allows for a certain number of malicious aggregators while maintaining security and privacy. We provide formal security and privacy analyses of TAPFed and compare it to various baselines through experimental evaluation. Our results show that TAPFed offers equivalent performance in terms of model quality compared to state-of-the-art approaches while reducing transmission overhead by 29%-45% across different model training scenarios. Most importantly, TAPFed can defend against recently demonstrated inference attacks caused by curious aggregators, which the majority of existing approaches are susceptible to.

* in IEEE Transactions on Dependable and Secure Computing, vol. 21, no. 5, pp. 4309-4323, Sept.-Oct. 2024
* The paper has been published in IEEE TDSC

Via

Access Paper or Ask Questions

LLMs are Also Effective Embedding Models: An In-depth Overview

Dec 17, 2024

Chongyang Tao, Tao Shen, Shen Gao, Junshuo Zhang, Zhen Li, Zhengwei Tao, Shuai Ma

Abstract:Large language models (LLMs) have revolutionized natural language processing by achieving state-of-the-art performance across various tasks. Recently, their effectiveness as embedding models has gained attention, marking a paradigm shift from traditional encoder-only models like ELMo and BERT to decoder-only, large-scale LLMs such as GPT, LLaMA, and Mistral. This survey provides an in-depth overview of this transition, beginning with foundational techniques before the LLM era, followed by LLM-based embedding models through two main strategies to derive embeddings from LLMs. 1) Direct prompting: We mainly discuss the prompt designs and the underlying rationale for deriving competitive embeddings. 2) Data-centric tuning: We cover extensive aspects that affect tuning an embedding model, including model architecture, training objectives, data constructions, etc. Upon the above, we also cover advanced methods, such as handling longer texts, and multilingual and cross-modal data. Furthermore, we discuss factors affecting choices of embedding models, such as performance/efficiency comparisons, dense vs sparse embeddings, pooling strategies, and scaling law. Lastly, the survey highlights the limitations and challenges in adapting LLMs for embeddings, including cross-task embedding quality, trade-offs between efficiency and accuracy, low-resource, long-context, data bias, robustness, etc. This survey serves as a valuable resource for researchers and practitioners by synthesizing current advancements, highlighting key challenges, and offering a comprehensive framework for future work aimed at enhancing the effectiveness and efficiency of LLMs as embedding models.

* 32 pages

Via

Access Paper or Ask Questions

Robust Max-Min Fair Beamforming Design for Rate Splitting Multiple Access-aided Visible Light Communications

Nov 26, 2024

Zhengqing Qiu, Yijie Mao, Shuai Ma, Bruno Clerckx

Abstract:This paper addresses the robust beamforming design for rate splitting multiple access (RSMA)-aided visible light communication (VLC) networks with imperfect channel state information at the transmitter (CSIT). In particular, we first derive the theoretical lower bound for the channel capacity of RSMA-aided VLC networks.Then we investigate the beamforming design to solve the max-min fairness (MMF) problem of RSMA-aided VLC networks under the practical optical power constraint and electrical power constraint while considering the practical imperfect CSIT scenario.To address the problem, we propose a constrained-concave-convex programming (CCCP)-based beamforming design algorithm which exploits semidefinite relaxation (SDR) technique and a penalty method to deal with the rank-one constraint caused by SDR.Numerical results show that the proposed robust beamforming design algorithm for RSMA-aided VLC network achieves a superior performance over the existing ones for space-division multiple access (SDMA) and non-orthogonal multiple access (NOMA).

Via

Access Paper or Ask Questions

Semantic Feature Division Multiple Access for Multi-user Digital Interference Networks

Jul 11, 2024

Shuai Ma, Chuanhui Zhang, Bin Shen, Youlong Wu, Hang Li, Shiyin Li, Guangming Shi, Naofal Al-Dhahir

Abstract:With the ever-increasing user density and quality of service (QoS) demand,5G networks with limited spectrum resources are facing massive access challenges. To address these challenges, in this paper, we propose a novel discrete semantic feature division multiple access (SFDMA) paradigm for multi-user digital interference networks. Specifically, by utilizing deep learning technology, SFDMA extracts multi-user semantic information into discrete representations in distinguishable semantic subspaces, which enables multiple users to transmit simultaneously over the same time-frequency resources. Furthermore, based on a robust information bottleneck, we design a SFDMA based multi-user digital semantic interference network for inference tasks, which can achieve approximate orthogonal transmission. Moreover, we propose a SFDMA based multi-user digital semantic interference network for image reconstruction tasks, where the discrete outputs of the semantic encoders of the users are approximately orthogonal, which significantly reduces multi-user interference. Furthermore, we propose an Alpha-Beta-Gamma (ABG) formula for semantic communications, which is the first theoretical relationship between inference accuracy and transmission power. Then, we derive adaptive power control methods with closed-form expressions for inference tasks. Extensive simulations verify the effectiveness and superiority of the proposed SFDMA.

Via

Access Paper or Ask Questions

Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making

Mar 25, 2024

Shuai Ma, Qiaoyi Chen, Xinru Wang, Chengbo Zheng, Zhenhui Peng, Ming Yin, Xiaojuan Ma

Figure 1 for Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making

Figure 2 for Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making

Figure 3 for Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making

Figure 4 for Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making

Abstract:In AI-assisted decision-making, humans often passively review AI's suggestion and decide whether to accept or reject it as a whole. In such a paradigm, humans are found to rarely trigger analytical thinking and face difficulties in communicating the nuances of conflicting opinions to the AI when disagreements occur. To tackle this challenge, we propose Human-AI Deliberation, a novel framework to promote human reflection and discussion on conflicting human-AI opinions in decision-making. Based on theories in human deliberation, this framework engages humans and AI in dimension-level opinion elicitation, deliberative discussion, and decision updates. To empower AI with deliberative capabilities, we designed Deliberative AI, which leverages large language models (LLMs) as a bridge between humans and domain-specific models to enable flexible conversational interactions and faithful information provision. An exploratory evaluation on a graduate admissions task shows that Deliberative AI outperforms conventional explainable AI (XAI) assistants in improving humans' appropriate reliance and task performance. Based on a mixed-methods analysis of participant behavior, perception, user experience, and open-ended feedback, we draw implications for future AI-assisted decision tool design.

Via

Access Paper or Ask Questions

Beyond Recommender: An Exploratory Study of the Effects of Different AI Roles in AI-Assisted Decision Making

Mar 04, 2024

Shuai Ma, Chenyi Zhang, Xinru Wang, Xiaojuan Ma, Ming Yin

Figure 1 for Beyond Recommender: An Exploratory Study of the Effects of Different AI Roles in AI-Assisted Decision Making

Figure 2 for Beyond Recommender: An Exploratory Study of the Effects of Different AI Roles in AI-Assisted Decision Making

Figure 3 for Beyond Recommender: An Exploratory Study of the Effects of Different AI Roles in AI-Assisted Decision Making

Figure 4 for Beyond Recommender: An Exploratory Study of the Effects of Different AI Roles in AI-Assisted Decision Making

Abstract:Artificial Intelligence (AI) is increasingly employed in various decision-making tasks, typically as a Recommender, providing recommendations that the AI deems correct. However, recent studies suggest this may diminish human analytical thinking and lead to humans' inappropriate reliance on AI, impairing the synergy in human-AI teams. In contrast, human advisors in group decision-making perform various roles, such as analyzing alternative options or criticizing decision-makers to encourage their critical thinking. This diversity of roles has not yet been empirically explored in AI assistance. In this paper, we examine three AI roles: Recommender, Analyzer, and Devil's Advocate, and evaluate their effects across two AI performance levels. Our results show each role's distinct strengths and limitations in task performance, reliance appropriateness, and user experience. Notably, the Recommender role is not always the most effective, especially if the AI performance level is low, the Analyzer role may be preferable. These insights offer valuable implications for designing AI assistants with adaptive functional roles according to different situations.

Via

Access Paper or Ask Questions