Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xuanyu Liu

DSConv: Dynamic Splitting Convolution for Pansharpening

Aug 08, 2025

Xuanyu Liu, Bonan An

Abstract:Aiming to obtain a high-resolution image, pansharpening involves the fusion of a multi-spectral image (MS) and a panchromatic image (PAN), the low-level vision task remaining significant and challenging in contemporary research. Most existing approaches rely predominantly on standard convolutions, few making the effort to adaptive convolutions, which are effective owing to the inter-pixel correlations of remote sensing images. In this paper, we propose a novel strategy for dynamically splitting convolution kernels in conjunction with attention, selecting positions of interest, and splitting the original convolution kernel into multiple smaller kernels, named DSConv. The proposed DSConv more effectively extracts features of different positions within the receptive field, enhancing the network's generalization, optimization, and feature representation capabilities. Furthermore, we innovate and enrich concepts of dynamic splitting convolution and provide a novel network architecture for pansharpening capable of achieving the tasks more efficiently, building upon this methodology. Adequate fair experiments illustrate the effectiveness and the state-of-the-art performance attained by DSConv.Comprehensive and rigorous discussions proved the superiority and optimal usage conditions of DSConv.

Via

Access Paper or Ask Questions

Foundation Model Empowered Synesthesia of Machines (SoM): AI-native Intelligent Multi-Modal Sensing-Communication Integration

Jun 09, 2025

Xiang Cheng, Boxun Liu, Xuanyu Liu, Ensong Liu, Ziwei Huang

Abstract:To support future intelligent multifunctional sixth-generation (6G) wireless communication networks, Synesthesia of Machines (SoM) is proposed as a novel paradigm for artificial intelligence (AI)-native intelligent multi-modal sensing-communication integration. However, existing SoM system designs rely on task-specific AI models and face challenges such as scarcity of massive high-quality datasets, constrained modeling capability, poor generalization, and limited universality. Recently, foundation models (FMs) have emerged as a new deep learning paradigm and have been preliminarily applied to SoM-related tasks, but a systematic design framework is still lacking. In this paper, we for the first time present a systematic categorization of FMs for SoM system design, dividing them into general-purpose FMs, specifically large language models (LLMs), and SoM domain-specific FMs, referred to as wireless foundation models. Furthermore, we derive key characteristics of FMs in addressing existing challenges in SoM systems and propose two corresponding roadmaps, i.e., LLM-based and wireless foundation model-based design. For each roadmap, we provide a framework containing key design steps as a guiding pipeline and several representative case studies of FM-empowered SoM system design. Specifically, we propose LLM-based path loss generation (LLM4PG) and scatterer generation (LLM4SG) schemes, and wireless channel foundation model (WiCo) for SoM mechanism exploration, LLM-based wireless multi-task SoM transceiver (LLM4WM) and wireless foundation model (WiFo) for SoM-enhanced transceiver design, and wireless cooperative perception foundation model (WiPo) for SoM-enhanced cooperative perception, demonstrating the significant superiority of FMs over task-specific models. Finally, we summarize and highlight potential directions for future research.

Via

Access Paper or Ask Questions

Multi-Granularity Vision Fastformer with Fusion Mechanism for Skin Lesion Segmentation

Apr 04, 2025

Xuanyu Liu, Huiyun Yao, Jinggui Gao, Zhongyi Guo, Xue Zhang, Yulin Dong

Figure 1 for Multi-Granularity Vision Fastformer with Fusion Mechanism for Skin Lesion Segmentation

Figure 2 for Multi-Granularity Vision Fastformer with Fusion Mechanism for Skin Lesion Segmentation

Figure 3 for Multi-Granularity Vision Fastformer with Fusion Mechanism for Skin Lesion Segmentation

Figure 4 for Multi-Granularity Vision Fastformer with Fusion Mechanism for Skin Lesion Segmentation

Abstract:Background:Convolutional Neural Networks(CNN) and Vision Transformers(ViT) are the main techniques used in Medical image segmentation. However, CNN is limited to local contextual information, and ViT's quadratic complexity results in significant computational costs. At the same time, equipping the model to distinguish lesion boundaries with varying degrees of severity is also a challenge encountered in skin lesion segmentation. Purpose:This research aims to optimize the balance between computational costs and long-range dependency modelling and achieve excellent generalization across lesions with different degrees of severity. Methods:we propose a lightweight U-shape network that utilizes Vision Fastformer with Fusion Mechanism (VFFM-UNet). We inherit the advantages of Fastformer's additive attention mechanism, combining element-wise product and matrix product for comprehensive feature extraction and channel reduction to save computational costs. In order to accurately identify the lesion boundaries with varying degrees of severity, we designed Fusion Mechanism including Multi-Granularity Fusion and Channel Fusion, which can process the feature maps in the granularity and channel levels to obtain different contextual information. Results:Comprehensive experiments on the ISIC2017, ISIC2018 and PH2 datasets demonstrate that VFFM-UNet outperforms existing state-of-the-art models regarding parameter numbers, computational complexity and segmentation performance. In short, compared to MISSFormer, our model achieves superior segmentation performance while reducing parameter and computation costs by 101x and 15x, respectively. Conclusions:Both quantitative and qualitative analyses show that VFFM-UNet sets a new benchmark by reaching an ideal balance between parameter numbers, computational complexity, and segmentation performance compared to existing state-of-the-art models.

Via

Access Paper or Ask Questions

Static Batching of Irregular Workloads on GPUs: Framework and Application to Efficient MoE Model Inference

Jan 27, 2025

Yinghan Li, Yifei Li, Jiejing Zhang, Bujiao Chen, Xiaotong Chen, Lian Duan, Yejun Jin, Zheng Li, Xuanyu Liu, Haoyu Wang(+6 more)

Figure 1 for Static Batching of Irregular Workloads on GPUs: Framework and Application to Efficient MoE Model Inference

Abstract:It has long been a problem to arrange and execute irregular workloads on massively parallel devices. We propose a general framework for statically batching irregular workloads into a single kernel with a runtime task mapping mechanism on GPUs. We further apply this framework to Mixture-of-Experts (MoE) model inference and implement an optimized and efficient CUDA kernel. Our MoE kernel achieves up to 91% of the peak Tensor Core throughput on NVIDIA H800 GPU and 95% on NVIDIA H20 GPU.

* 11 pages

Via

Access Paper or Ask Questions

WiFo: Wireless Foundation Model for Channel Prediction

Dec 12, 2024

Boxun Liu, Shijian Gao, Xuanyu Liu, Xiang Cheng, Liuqing Yang

Figure 1 for WiFo: Wireless Foundation Model for Channel Prediction

Figure 2 for WiFo: Wireless Foundation Model for Channel Prediction

Figure 3 for WiFo: Wireless Foundation Model for Channel Prediction

Figure 4 for WiFo: Wireless Foundation Model for Channel Prediction

Abstract:Channel prediction permits to acquire channel state information (CSI) without signaling overhead. However, almost all existing channel prediction methods necessitate the deployment of a dedicated model to accommodate a specific configuration. Leveraging the powerful modeling and multi-task learning capabilities of foundation models, we propose the first space-time-frequency (STF) wireless foundation model (WiFo) to address time-frequency channel prediction tasks in a one-for-all manner. Specifically, WiFo is initially pre-trained over massive and extensive diverse CSI datasets. Then, the model will be instantly used for channel prediction under various CSI configurations without any fine-tuning. We propose a masked autoencoder (MAE)-based network structure for WiFo to handle heterogeneous STF CSI data, and design several mask reconstruction tasks for self-supervised pre-training to capture the inherent 3D variations of CSI. To fully unleash its predictive power, we build a large-scale heterogeneous simulated CSI dataset consisting of 160K CSI samples for pre-training. Simulations validate its superior unified learning performance across multiple datasets and demonstrate its state-of-the-art (SOTA) zero-shot generalization performance via comparisons with other full-shot baselines.

Via

Access Paper or Ask Questions

Atrial Fibrillation Detection System via Acoustic Sensing for Mobile Phones

Oct 28, 2024

Xuanyu Liu, Jiao Li, Haoxian Liu, Zongqi Yang, Yi Huang, Jin Zhang

Figure 1 for Atrial Fibrillation Detection System via Acoustic Sensing for Mobile Phones

Figure 2 for Atrial Fibrillation Detection System via Acoustic Sensing for Mobile Phones

Figure 3 for Atrial Fibrillation Detection System via Acoustic Sensing for Mobile Phones

Figure 4 for Atrial Fibrillation Detection System via Acoustic Sensing for Mobile Phones

Abstract:Atrial fibrillation (AF) is characterized by irregular electrical impulses originating in the atria, which can lead to severe complications and even death. Due to the intermittent nature of the AF, early and timely monitoring of AF is critical for patients to prevent further exacerbation of the condition. Although ambulatory ECG Holter monitors provide accurate monitoring, the high cost of these devices hinders their wider adoption. Current mobile-based AF detection systems offer a portable solution, however, these systems have various applicability issues such as being easily affected by environmental factors and requiring significant user effort. To overcome the above limitations, we present MobileAF, a novel smartphone-based AF detection system using speakers and microphones. In order to capture minute cardiac activities, we propose a multi-channel pulse wave probing method. In addition, we enhance the signal quality by introducing a three-stage pulse wave purification pipeline. What's more, a ResNet-based network model is built to implement accurate and reliable AF detection. We collect data from 23 participants utilizing our data collection application on the smartphone. Extensive experimental results demonstrate the superior performance of our system, with 97.9% accuracy, 96.8% precision, 97.2% recall, 98.3% specificity, and 97.0% F1 score.

* This paper has been submitted to ACM Transactions on Sensor Networks (TOSN)

Via

Access Paper or Ask Questions

AcousAF: Acoustic Sensing-Based Atrial Fibrillation Detection System for Mobile Phones

Aug 09, 2024

Xuanyu Liu, Haoxian Liu, Jiao Li, Zongqi Yang, Yi Huang, Jin Zhang

Figure 1 for AcousAF: Acoustic Sensing-Based Atrial Fibrillation Detection System for Mobile Phones

Figure 2 for AcousAF: Acoustic Sensing-Based Atrial Fibrillation Detection System for Mobile Phones

Figure 3 for AcousAF: Acoustic Sensing-Based Atrial Fibrillation Detection System for Mobile Phones

Figure 4 for AcousAF: Acoustic Sensing-Based Atrial Fibrillation Detection System for Mobile Phones

Abstract:Atrial fibrillation (AF) is characterized by irregular electrical impulses originating in the atria, which can lead to severe complications and even death. Due to the intermittent nature of the AF, early and timely monitoring of AF is critical for patients to prevent further exacerbation of the condition. Although ambulatory ECG Holter monitors provide accurate monitoring, the high cost of these devices hinders their wider adoption. Current mobile-based AF detection systems offer a portable solution. However, these systems have various applicability issues, such as being easily affected by environmental factors and requiring significant user effort. To overcome the above limitations, we present AcousAF, a novel AF detection system based on acoustic sensors of smartphones. Particularly, we explore the potential of pulse wave acquisition from the wrist using smartphone speakers and microphones. In addition, we propose a well-designed framework comprised of pulse wave probing, pulse wave extraction, and AF detection to ensure accurate and reliable AF detection. We collect data from 20 participants utilizing our custom data collection application on the smartphone. Extensive experimental results demonstrate the high performance of our system, with 92.8% accuracy, 86.9% precision, 87.4% recall, and 87.1% F1 Score.

* Accepted for publication in Companion of the 2024 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp Companion '24)

Via

Access Paper or Ask Questions

LLM4CP: Adapting Large Language Models for Channel Prediction

Jun 20, 2024

Boxun Liu, Xuanyu Liu, Shijian Gao, Xiang Cheng, Liuqing Yang

Figure 1 for LLM4CP: Adapting Large Language Models for Channel Prediction

Figure 2 for LLM4CP: Adapting Large Language Models for Channel Prediction

Figure 3 for LLM4CP: Adapting Large Language Models for Channel Prediction

Figure 4 for LLM4CP: Adapting Large Language Models for Channel Prediction

Abstract:Channel prediction is an effective approach for reducing the feedback or estimation overhead in massive multi-input multi-output (m-MIMO) systems. However, existing channel prediction methods lack precision due to model mismatch errors or network generalization issues. Large language models (LLMs) have demonstrated powerful modeling and generalization abilities, and have been successfully applied to cross-modal tasks, including the time series analysis. Leveraging the expressive power of LLMs, we propose a pre-trained LLM-empowered channel prediction method (LLM4CP) to predict the future downlink channel state information (CSI) sequence based on the historical uplink CSI sequence. We fine-tune the network while freezing most of the parameters of the pre-trained LLM for better cross-modality knowledge transfer. To bridge the gap between the channel data and the feature space of the LLM, preprocessor, embedding, and output modules are specifically tailored by taking into account unique channel characteristics. Simulations validate that the proposed method achieves SOTA prediction performance on full-sample, few-shot, and generalization tests with low training and inference costs.

Via

Access Paper or Ask Questions

Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation

Mar 18, 2023

Lingting Zhu, Xian Liu, Xuanyu Liu, Rui Qian, Ziwei Liu, Lequan Yu

Abstract:Animating virtual avatars to make co-speech gestures facilitates various applications in human-machine interaction. The existing methods mainly rely on generative adversarial networks (GANs), which typically suffer from notorious mode collapse and unstable training, thus making it difficult to learn accurate audio-gesture joint distributions. In this work, we propose a novel diffusion-based framework, named Diffusion Co-Speech Gesture (DiffGesture), to effectively capture the cross-modal audio-to-gesture associations and preserve temporal coherence for high-fidelity audio-driven co-speech gesture generation. Specifically, we first establish the diffusion-conditional generation process on clips of skeleton sequences and audio to enable the whole framework. Then, a novel Diffusion Audio-Gesture Transformer is devised to better attend to the information from multiple modalities and model the long-term temporal dependency. Moreover, to eliminate temporal inconsistency, we propose an effective Diffusion Gesture Stabilizer with an annealed noise sampling strategy. Benefiting from the architectural advantages of diffusion models, we further incorporate implicit classifier-free guidance to trade off between diversity and gesture quality. Extensive experiments demonstrate that DiffGesture achieves state-of-theart performance, which renders coherent gestures with better mode coverage and stronger audio correlations. Code is available at https://github.com/Advocate99/DiffGesture.

* Accepted by IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023. 10 pages, 3 figures

Via

Access Paper or Ask Questions

RepBNN: towards a precise Binary Neural Network with Enhanced Feature Map via Repeating

Jul 19, 2022

Xulong Shi, Zhi Qi, Jiaxuan Cai, Keqi Fu, Yaru Zhao, Zan Li, Xuanyu Liu, Hao Liu

Figure 1 for RepBNN: towards a precise Binary Neural Network with Enhanced Feature Map via Repeating

Figure 2 for RepBNN: towards a precise Binary Neural Network with Enhanced Feature Map via Repeating

Figure 3 for RepBNN: towards a precise Binary Neural Network with Enhanced Feature Map via Repeating

Figure 4 for RepBNN: towards a precise Binary Neural Network with Enhanced Feature Map via Repeating

Abstract:Binary neural network (BNN) is an extreme quantization version of convolutional neural networks (CNNs) with all features and weights mapped to just 1-bit. Although BNN saves a lot of memory and computation demand to make CNN applicable on edge or mobile devices, BNN suffers the drop of network performance due to the reduced representation capability after binarization. In this paper, we propose a new replaceable and easy-to-use convolution module RepConv, which enhances feature maps through replicating input or output along channel dimension by $\beta$ times without extra cost on the number of parameters and convolutional computation. We also define a set of RepTran rules to use RepConv throughout BNN modules like binary convolution, fully connected layer and batch normalization. Experiments demonstrate that after the RepTran transformation, a set of highly cited BNNs have achieved universally better performance than the original BNN versions. For example, the Top-1 accuracy of Rep-ReCU-ResNet-20, i.e., a RepBconv enhanced ReCU-ResNet-20, reaches 88.97% on CIFAR-10, which is 1.47% higher than that of the original network. And Rep-AdamBNN-ReActNet-A achieves 71.342% Top-1 accuracy on ImageNet, a fresh state-of-the-art result of BNNs. Code and models are available at:https://github.com/imfinethanks/Rep_AdamBNN.

* This paper has absolutely nothing to do with repvgg, rep means repeating

Via

Access Paper or Ask Questions