Abstract:Reconfigurable Intelligent Surface (RIS) is a breakthrough technology that enables dynamic control of the propagation environment in wireless communications through programmable surfaces. To improve on the flexibility of conventional diagonal RIS (D-RIS), beyond diagonal RIS (BD-RIS) has emerged as a family of more general RIS architectures. However, D-RIS and BD-RIS have commonly been studied with mutual coupling effects neglected, while the global optimization of RIS with mutual coupling, its performance limits, and its scaling laws remain unexplored. This study addresses these gaps by deriving globally optimal closed-form solutions that maximize the channel gain of BD-RIS with mutual coupling, specifically for fully- and tree-connected RISs. We also provide closed-form expressions for the maximum channel gain achievable in the presence of mutual coupling and for its scaling law. Using the derived scaling laws, we analytically prove that mutual coupling increases the channel gain on average under Rayleigh fading channels. Our theoretical analysis, confirmed by numerical simulations, shows that both fully- and tree-connected RISs with mutual coupling achieve the same channel gain upper bound when optimized with the proposed globally optimal solutions. Furthermore, we observe that mutual coupling-unaware optimization of RIS can degrade the channel gain by up to 5 dB.
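As a point of reference for the channel-gain objective in the abstract above, the following minimal sketch covers only the coupling-free, single-antenna D-RIS baseline. It is not the closed-form BD-RIS solution with mutual coupling derived in the paper. It assumes a cascaded channel h = sum_n h_r[n] * theta[n] * h_t[n] with unit-modulus coefficients theta[n]; the function names, the number of elements, and the random channel draw are illustrative assumptions.

```python
import cmath
import random

def channel_gain(h_r, h_t, theta):
    """Channel gain |sum_n h_r[n] * theta[n] * h_t[n]|^2 of a coupling-free D-RIS link."""
    h = sum(r * th * t for r, th, t in zip(h_r, theta, h_t))
    return abs(h) ** 2

def align_phases(h_r, h_t):
    """Unit-modulus coefficients that co-phase every cascaded path."""
    return [cmath.exp(-1j * cmath.phase(r * t)) for r, t in zip(h_r, h_t)]

def rayleigh(n, rng):
    """n i.i.d. complex Gaussian coefficients with unit variance (Rayleigh fading)."""
    s = 0.5 ** 0.5
    return [complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n)]

rng = random.Random(0)  # fixed seed, assumption for reproducibility
N = 16                  # hypothetical number of RIS elements
h_r, h_t = rayleigh(N, rng), rayleigh(N, rng)

theta_opt = align_phases(h_r, h_t)
theta_rand = [cmath.exp(1j * rng.uniform(0, 2 * cmath.pi)) for _ in range(N)]

gain_opt = channel_gain(h_r, h_t, theta_opt)
gain_rand = channel_gain(h_r, h_t, theta_rand)
# Triangle-inequality upper bound on the gain of any unit-modulus diagonal RIS.
gain_bound = sum(abs(r * t) for r, t in zip(h_r, h_t)) ** 2
print(f"aligned: {gain_opt:.3f}, random: {gain_rand:.3f}, bound: {gain_bound:.3f}")
```

In this simplified model, phase alignment attains the triangle-inequality bound, whereas a random configuration generally falls short of it. Modelling mutual coupling would change the channel expression itself, which this sketch does not attempt.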
Abstract:Beyond diagonal reconfigurable intelligent surfaces (BD-RIS) is a new advance in RIS techniques that introduces reconfigurable inter-element connections to generate scattering matrices that are not limited to being diagonal. BD-RIS has recently been proposed and proven to enhance channel gain and enlarge coverage in wireless communications. Uniquely, BD-RIS enables reciprocal and non-reciprocal architectures, characterized by symmetric and non-symmetric scattering matrices, respectively. However, the performance benefits and new use cases enabled by non-reciprocal BD-RIS for wireless systems remain unexplored. This work takes a first step toward closing this knowledge gap and studies non-reciprocal BD-RIS in full-duplex systems and its performance benefits over reciprocal counterparts. We start by deriving a general RIS-aided full-duplex system model using multiport circuit theory, followed by a simplified channel model based on physically consistent assumptions. With the considered channel model, we investigate the effect of BD-RIS non-reciprocity and identify the theoretical conditions for reciprocal and non-reciprocal BD-RISs to simultaneously achieve the maximum received power of the signal of interest in the uplink and the downlink. Simulation results validate the theory and highlight the significant benefits offered by non-reciprocal BD-RIS in full-duplex systems. These gains arise from the non-reciprocity principle: a wave that hits the non-reciprocal BD-RIS from one direction experiences a different surface response than a wave that hits it from the opposite direction. This enables an uplink user and a downlink user at different locations to optimally communicate with the same full-duplex base station via a non-reciprocal BD-RIS, which would not be possible with reciprocal surfaces.
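The distinction between symmetric (reciprocal) and non-symmetric (non-reciprocal) scattering matrices in the abstract above can be illustrated with a small sketch. It assumes a lossless 3-port surface whose scattering matrix is unitary, and a toy bilinear response a^T Θ b between two sides of the surface. The two example matrices, the vectors `a` and `b`, and all function names are hypothetical and are not the paper's system model.

```python
def is_symmetric(m, tol=1e-12):
    """True if m equals its transpose (reciprocal scattering matrix)."""
    n = len(m)
    return all(abs(m[i][j] - m[j][i]) <= tol for i in range(n) for j in range(n))

def is_unitary(m, tol=1e-12):
    """True if the columns of m are orthonormal (lossless surface)."""
    n = len(m)
    for i in range(n):
        for j in range(n):
            inner = sum(m[k][i].conjugate() * m[k][j] for k in range(n))
            if abs(inner - (1 if i == j else 0)) > tol:
                return False
    return True

def bilinear(a, m, b):
    """Toy end-to-end response a^T m b for a wave entering via b and leaving via a."""
    n = len(m)
    return sum(a[i] * m[i][j] * b[j] for i in range(n) for j in range(n))

# Reciprocal example: ports 1 and 2 swapped, port 3 reflects (symmetric permutation).
theta_recip = [[0j, 1 + 0j, 0j],
               [1 + 0j, 0j, 0j],
               [0j, 0j, 1 + 0j]]
# Non-reciprocal example: circulator-like cyclic shift (non-symmetric permutation).
theta_nonrecip = [[0j, 0j, 1 + 0j],
                  [1 + 0j, 0j, 0j],
                  [0j, 1 + 0j, 0j]]

a = [1 + 0j, 2j, 3 + 0j]    # hypothetical channel on one side of the surface
b = [1 + 0j, 0j, 0j]        # hypothetical channel on the other side

for name, theta in (("reciprocal", theta_recip), ("non-reciprocal", theta_nonrecip)):
    forward = bilinear(a, theta, b)
    reverse = bilinear(b, theta, a)
    print(name, "symmetric:", is_symmetric(theta), "forward:", forward, "reverse:", reverse)
```

With the symmetric matrix, swapping the two sides leaves the response unchanged. With the cyclic shift the two directions differ, which is the degree of freedom the abstract attributes to non-reciprocal BD-RIS.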
Abstract:Panoptic narrative grounding (PNG), whose core target is fine-grained image-text alignment, requires a panoptic segmentation of referred objects given a narrative caption. Previous discriminative methods achieve only weak or coarse-grained alignment through panoptic segmentation pretraining or CLIP model adaptation. Following the recent progress of text-to-image Diffusion models, several works have shown that these models can achieve fine-grained image-text alignment through cross-attention maps and can improve general segmentation performance. However, directly using phrase features as static prompts for frozen Diffusion models on the PNG task still suffers from a large task gap and insufficient vision-language interaction, yielding inferior performance. Therefore, we propose an Extractive-Injective Phrase Adapter (EIPA) bypass within the Diffusion UNet that dynamically updates phrase prompts with image features and injects the multimodal cues back, making fuller use of the fine-grained image-text alignment capability of Diffusion models. In addition, we design a Multi-Level Mutual Aggregation (MLMA) module to reciprocally fuse multi-level image and phrase features for segmentation refinement. Extensive experiments on the PNG benchmark show that our method achieves new state-of-the-art performance.
Abstract:In this paper, we propose an Audio-Language-Referenced SAM 2 (AL-Ref-SAM 2) pipeline to explore the training-free paradigm for audio- and language-referenced video object segmentation, namely the AVS and RVOS tasks. The intuitive solution uses GroundingDINO to identify the target object in a single frame and SAM 2 to segment the identified object throughout the video, but it is not robust to spatiotemporal variations because it does not explore the video context. Thus, in our AL-Ref-SAM 2 pipeline, we propose a novel GPT-assisted Pivot Selection (GPT-PS) module that instructs GPT-4 to perform two-step temporal-spatial reasoning for sequentially selecting pivot frames and pivot boxes, thereby providing SAM 2 with a high-quality initial object prompt. Within GPT-PS, two task-specific Chain-of-Thought prompts are designed to unleash GPT's temporal-spatial reasoning capacity by guiding GPT to make selections based on a comprehensive understanding of the video and reference information. Furthermore, we propose a Language-Binded Reference Unification (LBRU) module to convert audio signals into language-formatted references, thereby unifying the formats of the AVS and RVOS tasks in the same pipeline. Extensive experiments on both tasks show that our training-free AL-Ref-SAM 2 pipeline achieves performance comparable to or even better than fully-supervised fine-tuning methods. The code is available at: https://github.com/appletea233/AL-Ref-SAM2.
Abstract:Beyond diagonal reconfigurable intelligent surfaces (BD-RIS) generalizes conventional diagonal reconfigurable intelligent surfaces (D-RIS) by interconnecting elements to generate beyond diagonal scattering matrices, which significantly strengthen the wireless channels. In this work, we use BD-RIS for passive multiuser beamforming in multiuser multiple-input-single-output (MU-MISO) systems. Specifically, we design the scattering matrix of BD-RIS either to maximize the sum received signal power at the users following maximum ratio transmission (MRT), or to nullify the interference at the users following zero forcing (ZF). Furthermore, we investigate uniform and optimized power allocation and ZF precoding at the base station (BS). Numerical results show that BD-RIS improves the interference nulling capability and sum rate with fewer reflecting elements (REs) compared to D-RIS. In addition, at moderate to high signal-to-noise ratios (SNRs), passive interference nulling reduces the complexity at the BS by relaxing the need for precoding or water-filling power allocation design. Furthermore, passive MRT with ZF precoding achieves a sum rate close to that of the joint design in MU-MISO scenarios with many REs, while maintaining low computational complexity and simplifying the channel estimation.
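The MRT and ZF criteria named in the abstract above can be contrasted with a minimal sketch of base-station precoding. It assumes a toy effective channel `H` with two single-antenna users (rows) and two BS antennas (columns), so the ZF pseudo-inverse reduces to a plain matrix inverse. Power normalization and the BD-RIS scattering-matrix design are omitted, and the channel values and function names are illustrative assumptions.

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def hermitian(a):
    """Conjugate transpose."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]

def inverse_2x2(a):
    """Closed-form inverse of a 2x2 matrix (assumes a non-zero determinant)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def interference_power(e):
    """Sum of |e[k][j]|^2 over k != j: power of stream j leaking into user k."""
    return sum(abs(e[k][j]) ** 2
               for k in range(len(e)) for j in range(len(e[0])) if k != j)

# Hypothetical effective channel: row k is user k's channel to the two BS antennas.
H = [[1 + 1j, 0.5 - 0.2j],
     [0.3 + 0.4j, 1 - 0.5j]]

W_mrt = hermitian(H)    # MRT: each beam is matched to its own user's channel
W_zf = inverse_2x2(H)   # ZF: square case, so pinv(H) equals inv(H)

e_mrt = matmul(H, W_mrt)  # effective channel after precoding
e_zf = matmul(H, W_zf)

print("MRT interference power:", interference_power(e_mrt))
print("ZF interference power:", interference_power(e_zf))
```

In this toy setting, ZF drives the off-diagonal entries of the effective channel to zero up to floating-point error, while MRT maximizes each user's own signal term but leaves inter-user interference.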
Abstract:To achieve dexterity comparable to that of humans, robots must intelligently process tactile sensor data. Taxel-based tactile signals often have low spatial resolution and non-standardized representations. In this paper, we propose a novel framework, HyperTaxel, for learning a geometrically-informed representation of taxel-based tactile signals to address challenges associated with their spatial resolution. We use this representation and a contrastive learning objective to encode and map sparse low-resolution taxel signals to high-resolution contact surfaces. To address the uncertainty inherent in these signals, we leverage joint probability distributions across multiple simultaneous contacts to improve taxel hyper-resolution. We evaluate our representation against two baselines and present results that suggest it outperforms both. Furthermore, we present qualitative results that demonstrate that the learned representation captures the geometric features of the contact surface, such as flatness, curvature, and edges, and generalizes across different objects and sensor configurations. Moreover, we present results that suggest our representation improves the performance of various downstream tasks, such as surface classification, 6D in-hand pose estimation, and sim-to-real transfer.
Abstract:This paper addresses the channel estimation problem for beyond diagonal reconfigurable intelligent surface (BD-RIS) from a tensor decomposition perspective. We first show that the received pilot signals can be arranged as a three-way tensor, allowing us to recast the cascaded channel estimation problem as a block Tucker decomposition problem that yields decoupled estimates of the involved channel matrices while offering a substantial performance gain over the conventional (matrix-based) least squares (LS) estimation method. More specifically, we develop two solutions to the problem. The first is a closed-form solution that extracts the channel estimates via a block Tucker Kronecker factorization (BTKF), which boils down to solving a set of parallel rank-one matrix approximation problems. Exploiting this low-rank property yields a noise rejection gain compared to the standard LS estimation scheme while allowing the two involved channels to be estimated separately. The second solution is based on a block Tucker alternating least squares (BTALS) algorithm that directly estimates the involved channel matrices using an iterative estimation procedure. We discuss the uniqueness and identifiability issues and their implications for training design. We also propose a tensor-based design of the BD-RIS training tensor for each algorithm that ensures unique decoupled channel estimates up to trivial scaling ambiguities. Our numerical results shed light on the tradeoffs offered by the BTKF and BTALS methods. Specifically, while the former enjoys fast and parallel extraction of the channel estimates in closed form, the latter has a more flexible training design, allowing for a significantly reduced training overhead compared to the state-of-the-art LS method.
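The rank-one matrix approximation step mentioned in the abstract above can be illustrated on a single Kronecker-structured vector. This is a minimal sketch and not the paper's BTKF or BTALS algorithm: it assumes a noiseless vector x = a ⊗ b, reshapes it into a rank-one matrix, and recovers the two factors with a simple alternating least squares iteration, swapped in here for an SVD to stay within the Python standard library. All names and values are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two vectors: entry j*len(b)+i equals a[j]*b[i]."""
    return [x * y for x in a for y in b]

def unvec(x, rows, cols):
    """Column-major reshape, so unvec(kron(a, b)) has entries b[i]*a[j]."""
    return [[x[j * rows + i] for j in range(cols)] for i in range(rows)]

def rank_one_factors(m, iters=50):
    """Alternating least squares for min ||m - outer(left, right)||_F."""
    rows, cols = len(m), len(m[0])
    right = [1.0 + 0j] * cols   # simple initial guess (assumed not orthogonal to the solution)
    left = [0j] * rows
    for _ in range(iters):
        r_energy = sum(abs(r) ** 2 for r in right)
        left = [sum(m[i][j] * right[j].conjugate() for j in range(cols)) / r_energy
                for i in range(rows)]
        l_energy = sum(abs(l) ** 2 for l in left)
        right = [sum(m[i][j] * left[i].conjugate() for i in range(rows)) / l_energy
                 for j in range(cols)]
    return left, right

# Hypothetical factors standing in for two vectorized channels.
a = [1 + 2j, -0.5j, 2 + 0j]
b = [0.5 - 1j, 1 + 1j]
x = kron(a, b)

m = unvec(x, rows=len(b), cols=len(a))   # rank-one matrix with entries b[i]*a[j]
b_est, a_est = rank_one_factors(m)

x_hat = kron(a_est, b_est)
err = max(abs(p - q) for p, q in zip(x_hat, x))
scale = a_est[0] / a[0]                  # a_est = scale * a and b_est = b / scale
print("reconstruction error:", err)
print("scaling ambiguity:", scale)
```

The individual factors are recovered only up to a complex scalar that cancels in the Kronecker product, which mirrors the trivial scaling ambiguities noted in the abstract.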
Abstract:This paper investigates the capability of a passive Reconfigurable Intelligent Surface (RIS) to redistribute the singular values of point-to-point Multiple-Input Multiple-Output (MIMO) channels for achieving power and rate gains. We depart from the conventional Diagonal (D)-RIS with a diagonal phase shift matrix and adopt a Beyond Diagonal (BD) architecture that offers greater wave manipulation flexibility through element-wise connections. Specifically, we first provide shaping insights by characterizing the channel singular value regions attainable by D-RIS and BD-RIS via a novel geodesic optimization. Analytical singular value bounds are then derived to explore their shaping limits in typical deployment scenarios. As a by-product, we tackle the BD-RIS-aided MIMO rate maximization problem with a locally optimal Alternating Optimization (AO) approach and a shaping-inspired low-complexity approach. Results show that, compared to D-RIS, BD-RIS significantly improves the dynamic range of all channel singular values, the trade-off in manipulating them, and thus the channel power and achievable rate. These observations become more pronounced as the number of RIS elements and the MIMO dimensions increase. Of particular interest, BD-RIS is shown to activate multi-stream transmission at lower transmit power than D-RIS, hence achieving the asymptotic Degrees of Freedom (DoF) at low Signal-to-Noise Ratio (SNR) thanks to its higher flexibility in shaping the distribution of channel singular values.
Abstract:This paper proposes a cooperative integrated sensing and communication network (Co-ISACNet) adopting a hybrid beamforming (HBF) architecture, which improves both radar sensing and communication performance. The main contributions of this work are four-fold. First, we introduce a novel cooperative sensing method for the considered Co-ISACNet, followed by a comprehensive analysis of this method. This analysis mathematically verifies the benefits of Co-ISACNet and provides insightful design guidelines. Second, to show the benefits of Co-ISACNet, we propose to jointly design the HBF to maximize the network communication capacity while satisfying a beampattern similarity constraint for radar sensing, which results in a high-dimensional and non-convex problem. Third, to facilitate the joint design, we propose a novel distributed optimization framework based on proximal gradient and the alternating direction method of multipliers, namely PANDA. Fourth, we adopt the proposed PANDA framework to solve the joint HBF design problem for the Co-ISACNet. With the proposed PANDA framework, all access points (APs) optimize the HBF in parallel, and each AP requires only local channel state information and limited message exchange with the other APs. Such a framework significantly reduces the computational complexity and thus has pronounced benefits in practical scenarios. Simulation results verify the effectiveness of the proposed algorithm compared with the conventional centralized algorithm and show the remarkable improvement in radar sensing and communication performance obtained by deploying Co-ISACNet.
Abstract:The multi-sector intelligent surface (IS), benefiting from a smarter wave manipulation capability, has been shown to enhance channel gain and offer full-space coverage in communications. However, the benefits of multi-sector IS in wireless sensing remain unexplored. This paper introduces the application of multi-sector IS to wireless sensing and localization. Specifically, we propose a new self-sensing system, where an active source controller uses the multi-sector IS geometry to reflect and scatter the emitted signals towards the entire space, thereby achieving full-space coverage for wireless sensing. Additionally, dedicated sensors are installed in alignment with the IS elements at each sector; they collect echo signals from the target and cooperate to sense the target angle. In this context, we develop a maximum likelihood estimator of the target angle for the proposed multi-sector IS self-sensing system, along with the corresponding theoretical limits defined by the Cramér-Rao bound. The analysis reveals that the advantages of the multi-sector IS self-sensing system stem from two aspects: enhancing the probing power on targets (thereby improving power efficiency) and increasing the rate of target angle (thereby enhancing the transceiver's sensitivity to target angles). Finally, our analysis and simulations confirm that the multi-sector IS self-sensing system, particularly the 4-sector architecture, achieves full-space sensing capability beyond that of the single-sector IS configuration. Furthermore, as in communications, employing directive antenna patterns on each sector's IS elements and sensors significantly enhances sensing capabilities. This enhancement originates from both improved power efficiency and improved target angle sensitivity, with the former also observed in communications and the latter unique to sensing.