Sherman
Abstract:The stacked intelligent metasurface (SIM), comprising multiple layers of reconfigurable transmissive metasurfaces, is becoming an increasingly viable solution for future wireless communication systems. In this paper, we explore the integration of SIM in a multi-antenna base station for application to downlink multi-user communications, and a realistic power consumption model for SIM-assisted systems is presented. Specifically, we focus on maximizing the energy efficiency (EE) for hybrid precoding design, i.e., the base station digital precoding and SIM wave-based beamforming. Due to the non-convexity and high complexity of the formulated problem, we employ the quadratic transformation method to reformulate the optimization problem and propose an alternating optimization (AO)-based joint precoding framework. Specifically, a successive convex approximation (SCA) algorithm is adopted for the base station precoding design. For the SIM wave-based beamforming, two algorithms are employed: the high-performance semidefinite programming (SDP) method and the low-complexity projected gradient ascent (PGA) algorithm. In particular, the results indicate that while the optimal number of SIM layers for maximizing the EE and spectral efficiency differs, a design of 2 to 5 layers can achieve satisfactory performance for both. Finally, numerical results are illustrated to evaluate the effectiveness of the proposed hybrid precoding framework and to showcase the performance enhancement achieved by the algorithm in comparison to benchmark schemes.
Abstract:In frequency division duplex (FDD) multiple-input multiple-output (MIMO) wireless communication systems, the acquisition of downlink channel state information (CSI) is essential for maximizing spatial resource utilization and improving system spectral efficiency. The separate design of modules in AI-based CSI feedback architectures under traditional modular communication frameworks, including channel estimation (CE), CSI compression and feedback, leads to sub-optimal performance. In this paper, we propose an uplink assisted joint CE and and CSI feedback approach via deep learning for downlink CSI acquisition, which mitigates performance degradation caused by distribution bias across separately trained modules in traditional modular communication frameworks. The proposed network adopts a deep joint source-channel coding (DJSCC) architecture to mitigate the cliff effect encountered in the conventional separate source-channel coding. Furthermore, we exploit the uplink CSI as auxiliary information to enhance CSI reconstruction accuracy by leveraging the partial reciprocity between the uplink and downlink channels in FDD systems, without introducing additional overhead. The effectiveness of uplink CSI as assisted information and the necessity of an end-toend multi-module joint training architecture is validated through comprehensive ablation and scalability experiments.
Abstract:How can robots learn dexterous grasping skills efficiently and apply them adaptively based on user instructions? This work tackles two key challenges: efficient skill acquisition from limited human demonstrations and context-driven skill selection. We introduce AdaDexGrasp, a framework that learns a library of grasping skills from a single human demonstration per skill and selects the most suitable one using a vision-language model (VLM). To improve sample efficiency, we propose a trajectory following reward that guides reinforcement learning (RL) toward states close to a human demonstration while allowing flexibility in exploration. To learn beyond the single demonstration, we employ curriculum learning, progressively increasing object pose variations to enhance robustness. At deployment, a VLM retrieves the appropriate skill based on user instructions, bridging low-level learned skills with high-level intent. We evaluate AdaDexGrasp in both simulation and real-world settings, showing that our approach significantly improves RL efficiency and enables learning human-like grasp strategies across varied object configurations. Finally, we demonstrate zero-shot transfer of our learned policies to a real-world PSYONIC Ability Hand, with a 90% success rate across objects, significantly outperforming the baseline.
Abstract:In recent years, high-speed trains (HSTs) communications have developed rapidly to enhance the stability of train operations and improve passenger connectivity experiences. However, as the train continues to accelerate, urgent technological innovations are needed to overcome challenges such as frequency handover and significant Doppler effects. In this paper, we present a novel architecture featuring movable antennas (MAs) to fully exploit macro spatial diversity, enabling a cell-free (CF) massive multiple-input multiple-output (MIMO) system that supports high-speed train communications. Considering the high likelihood of line-of-sight (LoS) transmission in HST scenario, we derive the uplink spectral efficiency (SE) expression for the movable CF massive MIMO system. Moreover, an optimization problem is formulated to maximize the sum SE of the considered system by optimizing the positions of the antennas. Since the formulated problem is non-convex and highly non-linear, we improve a deep reinforcement learning algorithm to address it by using proximal policy optimization (PPO). Different from traditional optimization approaches, which optimize variables separately and alternately, our improved PPO-based approach optimizes all the variables in unison. Simulation results demonstrate that movable CF massive MIMO effectively suppresses the negative impact of the Doppler effect in HST communications.
Abstract:Manipulating deformable objects like cloth is challenging due to their complex dynamics, near-infinite degrees of freedom, and frequent self-occlusions, which complicate state estimation and dynamics modeling. Prior work has struggled with robust cloth state estimation, while dynamics models, primarily based on Graph Neural Networks (GNNs), are limited by their locality. Inspired by recent advances in generative models, we hypothesize that these expressive models can effectively capture intricate cloth configurations and deformation patterns from data. Building on this insight, we propose a diffusion-based generative approach for both perception and dynamics modeling. Specifically, we formulate state estimation as reconstructing the full cloth state from sparse RGB-D observations conditioned on a canonical cloth mesh and dynamics modeling as predicting future states given the current state and robot actions. Leveraging a transformer-based diffusion model, our method achieves high-fidelity state reconstruction while reducing long-horizon dynamics prediction errors by an order of magnitude compared to GNN-based approaches. Integrated with model-predictive control (MPC), our framework successfully executes cloth folding on a real robotic system, demonstrating the potential of generative models for manipulation tasks with partial observability and complex dynamics.
Abstract:The rapid development of the quantum technology presents huge opportunities for 6G communications. Leveraging the quantum properties of highly excited Rydberg atoms, Rydberg atom-based antennas present distinct advantages, such as high sensitivity, broad frequency range, and compact size, over traditional antennas. To realize efficient precoding, accurate channel state information is essential. However, due to the distinct characteristics of atomic receivers, traditional channel estimation algorithms developed for conventional receivers are no longer applicable. To this end, we propose a novel channel estimation algorithm based on projection gradient descent (PGD), which is applicable to both one-dimensional (1D) and twodimensional (2D) arrays. Simulation results are provided to show the effectiveness of our proposed channel estimation method.
Abstract:Reconfigurable intelligent surface (RIS)-aided cell-free (CF) massive multiple-input multiple-output (mMIMO) is a promising architecture for further improving spectral efficiency (SE) with low cost and power consumption. However, conventional RIS has inevitable limitations due to its capability of only reflecting signals. In contrast, beyond-diagonal RIS (BD-RIS), with its ability to both reflect and transmit signals, has gained great attention. This correspondence focuses on using BD-RIS to improve the sum SE of CF mMIMO systems. This requires completing the beamforming design under the transmit power constraints and unitary constraints of the BD-RIS, by optimizing active and passive beamformer simultaneously. To tackle this issue, we introduce an alternating optimization algorithm that decomposes it using fractional programming and solves the subproblems alternatively. Moreover, to address the challenge introduced by the unitary constraint on the beamforming matrix of the BD-RIS, a manifold optimization algorithm is proposed to solve the problem optimally. Simulation results show that BD-RISs outperform RISs comprehensively, especially in the case of the full connected architecture which achieves the best performance, enhancing the sum SE by around 40% compared to ideal RISs.
Abstract:As the number of antennas in frequency-division duplex (FDD) multiple-input multiple-output (MIMO) systems increases, acquiring channel state information (CSI) becomes increasingly challenging due to limited spectral resources and feedback overhead. In this paper, we propose an end-to-end network that conducts joint design with pilot design, CSI estimation, CSI feedback, and precoding design in the multi-user MIMO orthogonal frequency-division multiplexing (OFDM) scenario. Multiple communication modules are jointly designed and trained with a common optimization objective to prevent mismatches between modules and discrepancies between individual module objectives and the final system goal. Experimental results demonstrate that, under the same feedback and CE overheads, the proposed joint multi-module end-to-end network achieves a higher multi-user downlink spectral efficiency than traditional algorithms based on separate architecture and partially separated artificial intelligence-based network architectures under comparable channel quality. Furthermore, compared to conventional separate architecture, the proposed network architecture with joint architecture reduces the computational burden and model storage overhead at the UE side, facilitating the deployment of low-overhead multi-module joint architectures in practice. While slightly increasing storage requirements at the base station, it reduces computational complexity and precoding design delay, effectively reducing the effects of channel aging challenges.
Abstract:With the advancement of sixth-generation (6G) wireless communication systems, integrated sensing and communication (ISAC) is crucial for perceiving and interacting with the environment via electromagnetic propagation, termed channel semantics, to support tasks like decision-making. However, channel models focusing on physical characteristics face challenges in representing semantics embedded in the channel, thereby limiting the evaluation of ISAC systems. To tackle this, we present a novel framework for channel modeling from the conceptual event perspective. By leveraging a multi-level semantic structure and characterized knowledge libraries, the framework decomposes complex channel characteristics into extensible semantic characterization, thereby better capturing the relationship between environment and channel, and enabling more flexible adjustments of channel models for different events without requiring a complete reset. Specifically, we define channel semantics on three levels: status semantics, behavior semantics, and event semantics, corresponding to channel multipaths, channel time-varying trajectories, and channel topology, respectively. Taking realistic vehicular ISAC scenarios as an example, we perform semantic clustering, characterizing status semantics via multipath statistical distributions, modeling behavior semantics using Markov chains for time variation, and representing event semantics through a co-occurrence matrix. Results show the model accurately generates channels while capturing rich semantic information. Moreover, its generalization supports customized semantics.
Abstract:Cell-free (CF) massive multiple-input multiple-output (mMIMO) systems offer high spectral efficiency (SE) through multiple distributed access points (APs). However, the large number of antennas increases power consumption. We propose incorporating stacked intelligent metasurfaces (SIM) into CF mMIMO systems as a cost-effective, energy-efficient solution. This paper focuses on optimizing the joint power allocation of APs and the phase shift of SIMs to maximize the sum SE. To address this complex problem, we introduce a fully distributed multi-agent reinforcement learning (MARL) algorithm. Our novel algorithm, the noisy value method with a recurrent policy in multi-agent policy optimization (NVR-MAPPO), enhances performance by encouraging diverse exploration under centralized training and decentralized execution. Simulations demonstrate that NVR-MAPPO significantly improves sum SE and robustness across various scenarios.