Luyuan Zhang

Two-Timescale Joint Transmit and Pinching Beamforming for Pinching-Antenna Systems

Apr 13, 2025

Luyuan Zhang, Xidong Mu, An Liu, Yuanwei Liu

Abstract: Pinching-antenna systems (PASS) have been proposed as a revolutionary flexible-antenna technology that facilitates line-of-sight links via numerous low-cost pinching antennas with adjustable activation positions over waveguides. This letter proposes a two-timescale joint transmit and pinching beamforming design that maximizes the sum rate of a PASS-based downlink multi-user multiple-input single-output system. A primal-dual decomposition method is developed to decouple the two-timescale problem into two sub-problems: 1) the short-term transmit beamforming design sub-problem is solved by a Karush-Kuhn-Tucker-guided dual learning-based approach; 2) the long-term pinching beamforming design sub-problem is tackled with a stochastic successive convex approximation method. Simulation results demonstrate that the proposed two-timescale algorithm achieves a significant performance gain over the baselines.

* 5 pages, 4 figures, letter
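
The sum-rate objective named in the abstract above can be illustrated with a minimal sketch. It uses the standard downlink multi-user MISO formula, where user k's rate is log2(1 + SINR_k), and does not model the pinching-antenna channel or the paper's two-timescale algorithm. The function name `sum_rate` and the toy channel and beamformer values are assumptions made for illustration only.

```python
import math

def sum_rate(H, W, noise_power):
    """Sum rate (bits/s/Hz) of a downlink multi-user MISO link.

    H[k] is user k's channel vector and W[k] is the beamformer aimed at
    user k; both are equal-length lists of complex numbers.
    These inputs are hypothetical and not taken from the paper.
    """
    def inner(h, w):
        # h^H w: conjugate the channel entries before the dot product
        return sum(hi.conjugate() * wi for hi, wi in zip(h, w))

    total = 0.0
    for k, h in enumerate(H):
        signal = abs(inner(h, W[k])) ** 2
        # Power leaking into user k from beams intended for other users
        interference = sum(abs(inner(h, W[j])) ** 2
                           for j in range(len(W)) if j != k)
        total += math.log2(1.0 + signal / (interference + noise_power))
    return total

# Toy case: two users, two antennas, orthogonal channels, matched beams
H = [[1 + 0j, 0j], [0j, 1 + 0j]]
W = [[1 + 0j, 0j], [0j, 1 + 0j]]
rate = sum_rate(H, W, noise_power=1.0)
```

In this toy case each user sees unit signal power, no interference, and unit noise, so each contributes log2(2) = 1 bit/s/Hz.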

MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization

Apr 01, 2025

Siyuan Li, Luyuan Zhang, Zedong Wang, Juanxi Tian, Cheng Tan, Zicheng Liu, Chang Yu, Qingsong Xie, Haonan Lu, Haoqian Wang(+1 more)

Abstract: Masked Image Modeling (MIM) with Vector Quantization (VQ) has achieved great success in both self-supervised pre-training and image generation. However, most existing methods struggle to address the trade-off in a shared latent space between generation quality and representation learning and efficiency. To push the limits of this paradigm, we propose MergeVQ, which incorporates token merging techniques into VQ-based generative models to bridge the gap between image generation and visual representation learning in a unified architecture. During pre-training, MergeVQ decouples top-k semantics from the latent space with the token merge module after self-attention blocks in the encoder for subsequent Look-up Free Quantization (LFQ) and global alignment, and recovers their fine-grained details through cross-attention in the decoder for reconstruction. For the second-stage generation, we introduce MergeAR, which performs KV cache compression for efficient raster-order prediction. Extensive experiments on ImageNet verify that MergeVQ as an AR generative model achieves competitive performance in both visual representation learning and image generation tasks while maintaining favorable token efficiency and inference speed. The code and model will be available at https://apexgen-x.github.io/MergeVQ.

* CVPR2025 (in process for more analysis and extension)
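
The abstract above mentions Look-up Free Quantization (LFQ) without describing it. As a rough sketch of the general idea as it is commonly described (not MergeVQ's implementation, which the abstract does not detail), each latent dimension is quantized independently to -1 or +1 by its sign, and the resulting bit pattern serves directly as the token index, so no codebook search is needed. The function name `lfq_quantize` and the choice to map exactly zero to -1 are assumptions.

```python
def lfq_quantize(z):
    """Quantize a latent vector by sign and derive its token index.

    z is a list of floats (a hypothetical latent vector). Values > 0 map
    to +1, everything else to -1 (the tie rule at zero is an assumption).
    The token index reads the positive dimensions as bits, with
    dimension i contributing 2**i.
    """
    codes = [1 if v > 0 else -1 for v in z]
    index = sum(1 << i for i, v in enumerate(z) if v > 0)
    return codes, index

codes, index = lfq_quantize([0.3, -1.2, 0.7])
```

With d latent dimensions this yields an implicit vocabulary of 2**d tokens without storing any embedding table.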

Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning

Oct 08, 2024

Siyuan Li, Juanxi Tian, Zedong Wang, Luyuan Zhang, Zicheng Liu, Weiyang Jin, Yang Liu, Baigui Sun, Stan Z. Li

Abstract: This paper delves into the interplay between vision backbones and optimizers, unveiling an inter-dependent phenomenon termed backbone-optimizer coupling bias (BOCB). We observe that canonical CNNs, such as VGG and ResNet, exhibit a marked co-dependency with SGD families, while recent architectures like ViTs and ConvNeXt share a tight coupling with adaptive learning rate optimizers. We further show that BOCB can be introduced by both optimizers and certain backbone designs and may significantly impact the pre-training and downstream fine-tuning of vision models. Through in-depth empirical analysis, we summarize takeaways on recommended optimizers and insights into robust vision backbone architectures. We hope this work can inspire the community to question long-held assumptions on backbones and optimizers, stimulate further explorations, and thereby contribute to more robust vision systems. The source code and models are publicly available at https://bocb-ai.github.io/.

* Preprint V1. Online project at https://bocb-ai.github.io/

Masked Modeling for Self-supervised Representation Learning on Vision and Beyond

Jan 09, 2024

Siyuan Li, Luyuan Zhang, Zedong Wang, Di Wu, Lirong Wu, Zicheng Liu, Jun Xia, Cheng Tan, Yang Liu, Baigui Sun(+1 more)

Abstract: As the deep learning revolution marches on, self-supervised learning has garnered increasing attention in recent years thanks to its remarkable representation learning ability and its low dependence on labeled data. Among these varied self-supervised techniques, masked modeling has emerged as a distinctive approach that involves predicting parts of the original data that are proportionally masked during training. This paradigm enables deep models to learn robust representations and has demonstrated exceptional performance in computer vision, natural language processing, and other modalities. In this survey, we present a comprehensive review of the masked modeling framework and its methodology. We elaborate on the details of techniques within masked modeling, including diverse masking strategies, recovering targets, network architectures, and more. Then, we systematically investigate its wide-ranging applications across domains. Furthermore, we explore the commonalities and differences between masked modeling methods in different fields. Toward the end of this paper, we discuss the limitations of current techniques and point out several potential avenues for advancing masked modeling research. A paper list project accompanying this survey is available at https://github.com/Lupin1998/Awesome-MIM.

* Preprint v2 (fix typos and citations). GitHub project at https://github.com/Lupin1998/Awesome-MIM
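
The masking step described in the abstract above, hiding a fixed proportion of the input and asking the model to predict it, can be sketched as follows. This is a generic uniform-random masking example, not a specific strategy from the survey; the function name `random_mask`, the `[MASK]` placeholder, and the `seed` parameter are assumptions for illustration.

```python
import random

def random_mask(tokens, mask_ratio, mask_token="[MASK]", seed=None):
    """Hide a proportion of tokens and return the prediction targets.

    Returns the corrupted sequence plus a dict mapping each masked
    position to its original token, which a model would be trained
    to recover. All names here are hypothetical.
    """
    rng = random.Random(seed)
    n_mask = round(len(tokens) * mask_ratio)
    # Choose distinct positions uniformly at random
    masked_positions = sorted(rng.sample(range(len(tokens)), n_mask))
    corrupted = list(tokens)
    targets = {}
    for pos in masked_positions:
        targets[pos] = tokens[pos]
        corrupted[pos] = mask_token
    return corrupted, targets

tokens = list("abcdefgh")
corrupted, targets = random_mask(tokens, mask_ratio=0.5, seed=0)
```

Here half of the eight tokens are replaced by the placeholder, and `targets` holds the originals that a model would be asked to reconstruct.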
