Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yu Zhu

Heterogeneous-IRS-Assisted MIMO Systems: Channel Estimation and Beamforming

Jun 12, 2025

Weibiao Zhao, Qiucen Wu, Yuanqi Tang, Yu Zhu

Abstract:Intelligent reflecting surface (IRS) has gained great attention for its ability to create favorable propagation environments. However, the power consumption of conventional IRSs cannot be ignored due to the large number of reflecting elements and control circuits. To balance performance and power consumption, we previously proposed a heterogeneous-IRS (HE-IRS), a green IRS structure integrating dynamically tunable elements (DTEs) and statically tunable elements (STEs). Compared to conventional IRSs with only DTEs, the unique DTE-STE integrated structure introduces new challenges in both channel estimation and beamforming. In this paper, we investigate the channel estimation and beamforming problems in HE-IRS-assisted multi-user multiple-input multiple-output systems. Unlike the overall cascaded channel estimated in conventional IRSs, we show that the HE-IRS channel to be estimated is decomposed into a DTE-based cascaded channel and an STE-based equivalent channel. Leveraging it along with the inherent sparsity of DTE- and STE-based channels and manifold optimization, we propose an efficient channel estimation scheme. To address the rank mismatch problem in the imperfect channel sparsity information, a robust rank selection rule is developed. For beamforming, we propose an offline algorithm to optimize the STE phase shifts for wide beam coverage, and an online algorithm to optimize the BS precoder and the DTE phase shifts using the estimated HE-IRS channel. Simulation results show that the HE-IRS requires less pilot overhead than conventional IRSs with the same number of elements. With the proposed channel estimation and beamforming schemes, the green HE-IRS achieves competitive sum rate performance with significantly reduced power consumption.

* 30 pages, 8 figures

Via

Access Paper or Ask Questions

Token-level Accept or Reject: A Micro Alignment Approach for Large Language Models

May 26, 2025

Yang Zhang, Yu Yu, Bo Tang, Yu Zhu, Chuxiong Sun, Wenqiang Wei, Jie Hu, Zipeng Xie, Zhiyu Li, Feiyu Xiong(+1 more)

Abstract:With the rapid development of Large Language Models (LLMs), aligning these models with human preferences and values is critical to ensuring ethical and safe applications. However, existing alignment techniques such as RLHF or DPO often require direct fine-tuning on LLMs with billions of parameters, resulting in substantial computational costs and inefficiencies. To address this, we propose Micro token-level Accept-Reject Aligning (MARA) approach designed to operate independently of the language models. MARA simplifies the alignment process by decomposing sentence-level preference learning into token-level binary classification, where a compact three-layer fully-connected network determines whether candidate tokens are "Accepted" or "Rejected" as part of the response. Extensive experiments across seven different LLMs and three open-source datasets show that MARA achieves significant improvements in alignment performance while reducing computational costs.

* Accepted to 34th International Joint Conference on Artificial Intelligence (IJCAI 2025)

Via

Access Paper or Ask Questions

Argus: Federated Non-convex Bilevel Learning over 6G Space-Air-Ground Integrated Network

May 14, 2025

Ya Liu, Kai Yang, Yu Zhu, Keying Yang, Haibo Zhao

Abstract:The space-air-ground integrated network (SAGIN) has recently emerged as a core element in the 6G networks. However, traditional centralized and synchronous optimization algorithms are unsuitable for SAGIN due to infrastructureless and time-varying environments. This paper aims to develop a novel Asynchronous algorithm a.k.a. Argus for tackling non-convex and non-smooth decentralized federated bilevel learning over SAGIN. The proposed algorithm allows networked agents (e.g. autonomous aerial vehicles) to tackle bilevel learning problems in time-varying networks asynchronously, thereby averting stragglers from impeding the overall training speed. We provide a theoretical analysis of the iteration complexity, communication complexity, and computational complexity of Argus. Its effectiveness is further demonstrated through numerical experiments.

* 17 pages, 11 figures

Via

Access Paper or Ask Questions

Bayesian Federated Cause-of-Death Classification and Quantification Under Distribution Shift

May 04, 2025

Yu Zhu, Zehang Richard Li

Abstract:In regions lacking medically certified causes of death, verbal autopsy (VA) is a critical and widely used tool to ascertain the cause of death through interviews with caregivers. Data collected by VAs are often analyzed using probabilistic algorithms. The performance of these algorithms often degrades due to distributional shift across populations. Most existing VA algorithms rely on centralized training, requiring full access to training data for joint modeling. This is often infeasible due to privacy and logistical constraints. In this paper, we propose a novel Bayesian Federated Learning (BFL) framework that avoids data sharing across multiple training sources. Our method enables reliable individual-level cause-of-death classification and population-level quantification of cause-specific mortality fractions (CSMFs), in a target domain with limited or no local labeled data. The proposed framework is modular, computationally efficient, and compatible with a wide range of existing VA algorithms as candidate models, facilitating flexible deployment in real-world mortality surveillance systems. We validate the performance of BFL through extensive experiments on two real-world VA datasets under varying levels of distribution shift. Our results show that BFL significantly outperforms the base models built on a single domain and achieves comparable or better performance compared to joint modeling.

* 11 figures

Via

Access Paper or Ask Questions

Sparse2DGS: Geometry-Prioritized Gaussian Splatting for Surface Reconstruction from Sparse Views

Apr 29, 2025

Jiang Wu, Rui Li, Yu Zhu, Rong Guo, Jinqiu Sun, Yanning Zhang

Abstract:We present a Gaussian Splatting method for surface reconstruction using sparse input views. Previous methods relying on dense views struggle with extremely sparse Structure-from-Motion points for initialization. While learning-based Multi-view Stereo (MVS) provides dense 3D points, directly combining it with Gaussian Splatting leads to suboptimal results due to the ill-posed nature of sparse-view geometric optimization. We propose Sparse2DGS, an MVS-initialized Gaussian Splatting pipeline for complete and accurate reconstruction. Our key insight is to incorporate the geometric-prioritized enhancement schemes, allowing for direct and robust geometric learning under ill-posed conditions. Sparse2DGS outperforms existing methods by notable margins while being ${2}\times$ faster than the NeRF-based fine-tuning approach.

* CVPR 2025

Via

Access Paper or Ask Questions

C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation

Feb 27, 2025

Yuhao Li, Mirana Claire Angel, Salman Khan, Yu Zhu, Jinqiu Sun, Yanning Zhang, Fahad Shahbaz Khan

Abstract:Trajectory-based motion control has emerged as an intuitive and efficient approach for controllable video generation. However, the existing trajectory-based approaches are usually limited to only generating the motion trajectory of the controlled object and ignoring the dynamic interactions between the controlled object and its surroundings. To address this limitation, we propose a Chain-of-Thought-based motion controller for controllable video generation, named C-Drag. Instead of directly generating the motion of some objects, our C-Drag first performs object perception and then reasons the dynamic interactions between different objects according to the given motion control of the objects. Specifically, our method includes an object perception module and a Chain-of-Thought-based motion reasoning module. The object perception module employs visual language models to capture the position and category information of various objects within the image. The Chain-of-Thought-based motion reasoning module takes this information as input and conducts a stage-wise reasoning process to generate motion trajectories for each of the affected objects, which are subsequently fed to the diffusion model for video synthesis. Furthermore, we introduce a new video object interaction (VOI) dataset to evaluate the generation quality of motion controlled video generation methods. Our VOI dataset contains three typical types of interactions and provides the motion trajectories of objects that can be used for accurate performance evaluation. Experimental results show that C-Drag achieves promising performance across multiple metrics, excelling in object motion control. Our benchmark, codes, and models will be available at https://github.com/WesLee88524/C-Drag-Official-Repo.

Via

Access Paper or Ask Questions

In-Network Preprocessing of Recommender Systems on Multi-Tenant SmartNICs

Jan 21, 2025

Yu Zhu, Wenqi Jiang, Gustavo Alonso

Abstract:Keeping ML-based recommender models up-to-date as data drifts and evolves is essential to maintain accuracy. As a result, online data preprocessing plays an increasingly important role in serving recommender systems. Existing solutions employ multiple CPU workers to saturate the input bandwidth of a single training node. Such an approach results in high deployment costs and energy consumption. For instance, a recent report from industrial deployments shows that data storage and ingestion pipelines can account for over 60\% of the power consumption in a recommender system. In this paper, we tackle the issue from a hardware perspective by introducing Piper, a flexible and network-attached accelerator that executes data loading and preprocessing pipelines in a streaming fashion. As part of the design, we define MiniPipe, the smallest pipeline unit enabling multi-pipeline implementation by executing various data preprocessing tasks across the single board, giving Piper the ability to be reconfigured at runtime. Our results, using publicly released commercial pipelines, show that Piper, prototyped on a power-efficient FPGA, achieves a 39$\sim$105$\times$ speedup over a server-grade, 128-core CPU and 3$\sim$17$\times$ speedup over GPUs like RTX 3090 and A100 in multiple pipelines. The experimental analysis demonstrates that Piper provides advantages in both latency and energy efficiency for preprocessing tasks in recommender systems, providing an alternative design point for systems that today are in very high demand.

Via

Access Paper or Ask Questions

UIR-LoRA: Achieving Universal Image Restoration through Multiple Low-Rank Adaptation

Sep 30, 2024

Cheng Zhang, Dong Gong, Jiumei He, Yu Zhu, Jinqiu Sun, Yanning Zhang

Figure 1 for UIR-LoRA: Achieving Universal Image Restoration through Multiple Low-Rank Adaptation

Figure 2 for UIR-LoRA: Achieving Universal Image Restoration through Multiple Low-Rank Adaptation

Figure 3 for UIR-LoRA: Achieving Universal Image Restoration through Multiple Low-Rank Adaptation

Figure 4 for UIR-LoRA: Achieving Universal Image Restoration through Multiple Low-Rank Adaptation

Abstract:Existing unified methods typically treat multi-degradation image restoration as a multi-task learning problem. Despite performing effectively compared to single degradation restoration methods, they overlook the utilization of commonalities and specificities within multi-task restoration, thereby impeding the model's performance. Inspired by the success of deep generative models and fine-tuning techniques, we proposed a universal image restoration framework based on multiple low-rank adapters (LoRA) from multi-domain transfer learning. Our framework leverages the pre-trained generative model as the shared component for multi-degradation restoration and transfers it to specific degradation image restoration tasks using low-rank adaptation. Additionally, we introduce a LoRA composing strategy based on the degradation similarity, which adaptively combines trained LoRAs and enables our model to be applicable for mixed degradation restoration. Extensive experiments on multiple and mixed degradations demonstrate that the proposed universal image restoration method not only achieves higher fidelity and perceptual image quality but also has better generalization ability than other unified image restoration models. Our code is available at https://github.com/Justones/UIR-LoRA.

Via

Access Paper or Ask Questions

Beamforming for PIN Diode-Based IRS-Assisted Systems Under a Phase Shift-Dependent Power Consumption Model

Aug 03, 2024

Qiucen Wu, Tian Lin, Xianghao Yu, Yu Zhu, Robert Schober

Figure 1 for Beamforming for PIN Diode-Based IRS-Assisted Systems Under a Phase Shift-Dependent Power Consumption Model

Figure 2 for Beamforming for PIN Diode-Based IRS-Assisted Systems Under a Phase Shift-Dependent Power Consumption Model

Figure 3 for Beamforming for PIN Diode-Based IRS-Assisted Systems Under a Phase Shift-Dependent Power Consumption Model

Figure 4 for Beamforming for PIN Diode-Based IRS-Assisted Systems Under a Phase Shift-Dependent Power Consumption Model

Abstract:Intelligent reflecting surfaces (IRSs) have been regarded as a promising enabler for future wireless communication systems. In the literature, IRSs have been considered power-free or assumed to have constant power consumption. However, recent experimental results have shown that for positive-intrinsic-negative (PIN) diode-based IRSs, the power consumption dynamically changes with the phase shift configuration. This phase shift-dependent power consumption (PS-DPC) introduces a challenging power allocation problem between base station (BS) and IRS. To tackle this issue, in this paper, we investigate a rate maximization problem for IRS-assisted systems under a practical PS-DPC model. For the single-user case, we propose a generalized Benders decomposition-based beamforming method to maximize the achievable rate while satisfying a total system power consumption constraint. Moreover, we propose a low-complexity beamforming design, where the powers allocated to BS and IRS are optimized offline based on statistical channel state information. Furthermore, for the multi-user case, we solve an equivalent weighted mean square error minimization problem with two different joint power allocation and phase shift optimization methods. Simulation results indicate that compared to baseline schemes, our proposed methods can flexibly optimize the power allocation between BS and IRS, thus achieving better performance. The optimized power allocation strategy strongly depends on the system power budget. When the system power budget is high, the PS-DPC is not the dominant factor in the system power consumption, allowing the IRS to turn on as many PIN diodes as needed to achieve high beamforming quality. When the system power budget is limited, however, more power tends to be allocated to the BS to enhance the transmit power, resulting in a lower beamforming quality at the IRS due to the reduced PS-DPC budget.

Via

Access Paper or Ask Questions

Multi-Granularity Language-Guided Multi-Object Tracking

Jun 07, 2024

Yuhao Li, Muzammal Naseer, Jiale Cao, Yu Zhu, Jinqiu Sun, Yanning Zhang, Fahad Shahbaz Khan

Figure 1 for Multi-Granularity Language-Guided Multi-Object Tracking

Figure 2 for Multi-Granularity Language-Guided Multi-Object Tracking

Figure 3 for Multi-Granularity Language-Guided Multi-Object Tracking

Figure 4 for Multi-Granularity Language-Guided Multi-Object Tracking

Abstract:Most existing multi-object tracking methods typically learn visual tracking features via maximizing dis-similarities of different instances and minimizing similarities of the same instance. While such a feature learning scheme achieves promising performance, learning discriminative features solely based on visual information is challenging especially in case of environmental interference such as occlusion, blur and domain variance. In this work, we argue that multi-modal language-driven features provide complementary information to classical visual features, thereby aiding in improving the robustness to such environmental interference. To this end, we propose a new multi-object tracking framework, named LG-MOT, that explicitly leverages language information at different levels of granularity (scene-and instance-level) and combines it with standard visual features to obtain discriminative representations. To develop LG-MOT, we annotate existing MOT datasets with scene-and instance-level language descriptions. We then encode both instance-and scene-level language information into high-dimensional embeddings, which are utilized to guide the visual features during training. At inference, our LG-MOT uses the standard visual features without relying on annotated language descriptions. Extensive experiments on three benchmarks, MOT17, DanceTrack and SportsMOT, reveal the merits of the proposed contributions leading to state-of-the-art performance. On the DanceTrack test set, our LG-MOT achieves an absolute gain of 2.2\% in terms of target object association (IDF1 score), compared to the baseline using only visual features. Further, our LG-MOT exhibits strong cross-domain generalizability. The dataset and code will be available at ~\url{https://github.com/WesLee88524/LG-MOT}.

Via

Access Paper or Ask Questions