Abstract:Recent advancements in vision-language models (VLMs) for common-sense reasoning have led to the development of vision-language-action (VLA) models, enabling robots to perform generalized manipulation. Although existing autoregressive VLA methods leverage large-scale pretrained knowledge, they disrupt the continuity of actions. Meanwhile, some VLA methods incorporate an additional diffusion head to predict continuous actions, relying solely on VLM-extracted features, which limits their reasoning capabilities. In this paper, we introduce HybridVLA, a unified framework that seamlessly integrates the strengths of both autoregressive and diffusion policies within a single large language model, rather than simply connecting them. To bridge the generation gap, a collaborative training recipe is proposed that injects the diffusion modeling directly into the next-token prediction. With this recipe, we find that these two forms of action prediction not only reinforce each other but also exhibit varying performance across different tasks. Therefore, we design a collaborative action ensemble mechanism that adaptively fuses these two predictions, leading to more robust control. In experiments, HybridVLA outperforms previous state-of-the-art VLA methods across various simulation and real-world tasks, including both single-arm and dual-arm robots, while demonstrating stable manipulation in previously unseen configurations.
Abstract:Traditional discrete-array-based systems fail to exploit interactions between closely spaced antennas and therefore underutilize the aperture resource. In this paper, we propose a holographic intelligence surface (HIS) assisted integrated sensing and communication (HISAC) system, wherein both the transmitter and receiver are fabricated using a continuous-aperture array. A continuous-discrete transformation of the HIS pattern based on the Fourier transform is proposed, converting the continuous pattern design into a discrete beamforming design. We formulate a joint transmit-receive beamforming optimization problem for the HISAC system that balances sensing performance across multiple targets while satisfying the performance requirements of multi-user communication. To solve this non-convex problem with coupled variables, we propose an alternating optimization algorithm that updates the transmit and receive beamformers in turn. Specifically, the transmit beamforming design is decoupled into a series of feasibility-checking sub-problems, while the receive beamforming is determined by a Rayleigh quotient-based method. Simulation results demonstrate the superiority of the proposed HISAC system over traditional discrete-array-based ISAC systems, achieving significantly higher sensing performance while guaranteeing the predetermined communication performance.
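The abstract above does not spell out its Rayleigh quotient-based receive design, so the following is only a generic sketch of the underlying idea: maximizing a ratio w^H A w / w^H B w over w, whose maximizer is the principal generalized eigenvector of (A, B). The function names, and the reading of A and B as hypothetical signal and interference-plus-noise covariance matrices, are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def rayleigh_quotient(w, A, B):
    """Return (w^H A w) / (w^H B w) for Hermitian A and positive-definite B."""
    num = np.real(w.conj() @ A @ w)
    den = np.real(w.conj() @ B @ w)
    return num / den


def max_rayleigh_quotient(A, B):
    """Maximize the generalized Rayleigh quotient of (A, B).

    Assumption: A is Hermitian and B is Hermitian positive definite.
    Returns a unit-norm maximizer w and the maximum value of the quotient.
    """
    # Whiten with the Cholesky factor B = L L^H, so the problem becomes an
    # ordinary Hermitian eigenproblem for C = L^{-1} A L^{-H}.
    L = np.linalg.cholesky(B)
    L_inv = np.linalg.inv(L)
    C = L_inv @ A @ L_inv.conj().T
    C = (C + C.conj().T) / 2  # remove numerical asymmetry
    eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    # Map the top eigenvector back to the original coordinates.
    w = L_inv.conj().T @ eigvecs[:, -1]
    w = w / np.linalg.norm(w)
    return w, eigvals[-1]
```

Scaling w does not change the quotient, so the unit-norm normalization is only a convention. In an alternating scheme, a step of this kind would be re-run each time the transmit side changes the matrices A and B.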
Abstract:Radar imaging is crucial in remote sensing and has many applications in detection and autonomous driving. However, the received radar data used for imaging are voluminous and redundant, which slows real-time quantitative radar imaging and hinders downlink applications. In this paper, we propose a physics-assisted deep learning method for radar quantitative imaging that exploits compressed sensing (CS). Specifically, the signal model for frequency-modulated continuous-wave (FMCW) radar imaging, which uses only four antennas and a subset of the frequency components, is formulated as a matrix multiplication. A learned fast iterative shrinkage-thresholding algorithm with a residual neural network (L-FISTA-ResNet) is proposed for solving the quantitative imaging problem. L-FISTA provides the basic solution, and the ResNet is attached to enhance image quality. Simulation results show that the proposed method achieves higher reconstruction accuracy than the traditional optimization method and pure neural networks. The effectiveness and generalization performance of the proposed strategy are verified in unseen-target imaging, denoising, and frequency migration tasks.
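For orientation, the classical (non-learned) FISTA iteration that L-FISTA builds on can be sketched as follows. This is a minimal illustration and not the paper's L-FISTA-ResNet: it assumes a real-valued measurement matrix `A`, a fixed step size 1/L, and a fixed regularization weight `lam`, whereas a learned variant would typically train such parameters per layer. The function names are hypothetical.

```python
import numpy as np


def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1 (element-wise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def fista(A, y, lam, n_iter=200):
    """Approximately solve min_x 0.5 * ||A x - y||_2^2 + lam * ||x||_1.

    Assumption: A and y are real-valued; radar data are complex in practice.
    """
    # Lipschitz constant of the gradient of the data term: ||A||_2^2.
    lipschitz = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    z = x.copy()  # extrapolated point used for the gradient step
    t = 1.0       # momentum parameter
    for _ in range(n_iter):
        grad = A.T @ (A @ z - y)
        x_new = soft_threshold(z - grad / lipschitz, lam / lipschitz)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x
```

Here `A` plays the role of the matrix signal model built from the selected antennas and frequency components, and the sparse vector `x` stands in for the image to be reconstructed.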
Abstract:Dual-function radar-communication (DFRC) technology is emerging in next-generation wireless systems. Reconfigurable intelligent surface (RIS) arrays have been suggested as a crucial sensor component of DFRC. In this paper, we propose a hybrid RIS (HRIS)-assisted multiple-input multiple-output (MIMO) DFRC system, where the HRIS simultaneously reflects communication signals to mobile users and receives the signal scattered by the radar target. Under such a scenario, we are interested in characterizing the fundamental trade-off between radar sensing and communication. Specifically, we study the joint design of the beamforming vectors at the base station (BS) and the parameter configuration of the HRIS so as to maximize the signal-to-interference-and-noise ratio (SINR) of the radar while guaranteeing a communication SINR requirement. To solve the formulated non-convex beamforming design problem, we propose an efficient alternating optimization approach. In particular, for fixed beams at the BS, we use a fast grid search-assisted auto gradient descent (FGS-AGD) algorithm to seek the best HRIS configuration; then, a closed-form BS beamforming solution is obtained using semidefinite relaxation. Numerical results indicate that, compared with benchmark schemes, the proposed approach significantly improves both radar performance and communication quality.