Hongyu Li

NovaFlow: Zero-Shot Manipulation via Actionable Flow from Generated Videos

Oct 09, 2025

Hongyu Li, Lingfeng Sun, Yafei Hu, Duy Ta, Jennifer Barry, George Konidaris, Jiahui Fu

Abstract:Enabling robots to execute novel manipulation tasks zero-shot is a central goal in robotics. Most existing methods assume in-distribution tasks or rely on fine-tuning with embodiment-matched data, limiting transfer across platforms. We present NovaFlow, an autonomous manipulation framework that converts a task description into an actionable plan for a target robot without any demonstrations. Given a task description, NovaFlow synthesizes a video using a video generation model and distills it into 3D actionable object flow using off-the-shelf perception modules. From the object flow, it computes relative poses for rigid objects and realizes them as robot actions via grasp proposals and trajectory optimization. For deformable objects, this flow serves as a tracking objective for model-based planning with a particle-based dynamics model. By decoupling task understanding from low-level control, NovaFlow naturally transfers across embodiments. We validate on rigid, articulated, and deformable object manipulation tasks using a table-top Franka arm and a Spot quadrupedal mobile robot, and achieve effective zero-shot execution without demonstrations or embodiment-specific training. Project website: https://novaflow.lhy.xyz/.
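
The abstract says NovaFlow turns 3D object flow into relative poses for rigid objects, but it does not name the solver. A standard way to do this is a Kabsch (SVD) least-squares fit between the tracked points at two time steps. The sketch below shows that technique under that assumption. It is not NovaFlow's code, and the function name `relative_pose_from_flow` and the synthetic data are hypothetical.

```python
import numpy as np


def relative_pose_from_flow(P0: np.ndarray, P1: np.ndarray):
    """Kabsch fit: rotation R and translation t minimizing ||P0 @ R.T + t - P1||.

    P0, P1: (N, 3) positions of the same object points at two time steps,
    e.g. two samples of a (hypothetical) 3D object-flow track.
    """
    c0, c1 = P0.mean(axis=0), P1.mean(axis=0)
    H = (P0 - c0).T @ (P1 - c1)              # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that det(R) = +1.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c1 - R @ c0
    return R, t


# Synthetic check: rotate points 30 degrees about z and translate them.
rng = np.random.default_rng(0)
P0 = rng.normal(size=(50, 3))
a = np.deg2rad(30.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
t_true = np.array([0.10, -0.20, 0.05])
P1 = P0 @ R_true.T + t_true

R_est, t_est = relative_pose_from_flow(P0, P1)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))
```

For a flow track over several frames, fitting the first frame against each later frame would give a sequence of relative object poses. A downstream grasp and trajectory stage could then follow that sequence.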

UniTac: Whole-Robot Touch Sensing Without Tactile Sensors

Jul 10, 2025

Wanjia Fu, Hongyu Li, Ivy X. He, Stefanie Tellex, Srinath Sridhar

Abstract:Robots can better interact with humans and unstructured environments through touch sensing. However, most commercial robots are not equipped with tactile skins, making it challenging to achieve even basic touch-sensing functions, such as contact localization. We present UniTac, a data-driven whole-body touch-sensing approach that uses only proprioceptive joint sensors and does not require the installation of additional sensors. Our approach enables a robot equipped solely with joint sensors to localize contacts. Our goal is to democratize touch sensing and provide an off-the-shelf tool for HRI researchers to provide their robots with touch-sensing capabilities. We validate our approach on two platforms: the Franka robot arm and the Spot quadruped. On Franka, we can localize contact to within 8.0 centimeters, and on Spot, we can localize to within 7.2 centimeters at around 2,000 Hz on an RTX 3090 GPU without adding any additional sensors to the robot. Project website: https://ivl.cs.brown.edu/research/unitac.

A Tutorial on Beyond-Diagonal Reconfigurable Intelligent Surfaces: Modeling, Architectures, System Design and Optimization, and Applications

May 22, 2025

Hongyu Li, Matteo Nerini, Shanpu Shen, Bruno Clerckx

Abstract:Written by its inventors, this first tutorial on Beyond-Diagonal Reconfigurable Intelligent Surfaces (BD-RISs) provides readers with the basics and fundamental tools necessary to appreciate, understand, and contribute to this emerging and disruptive technology. Conventional (Diagonal) RISs (D-RISs) are characterized by a diagonal scattering matrix $\mathbf{\Theta}$, so the wave manipulation flexibility of D-RIS is extremely limited. In contrast, BD-RIS refers to a novel and general framework for RIS whose scattering matrix is not limited to being diagonal (hence the "beyond-diagonal" terminology); consequently, all entries of $\mathbf{\Theta}$ can potentially help shape waves, giving much higher manipulation flexibility. Physically, this means that BD-RIS can artificially engineer and reconfigure coupling across elements of the surface thanks to inter-element reconfigurable components, which allow waves absorbed by one element to flow through other elements. BD-RIS therefore opens the door to more general and versatile intelligent surfaces that subsume existing RIS architectures as special cases. In this tutorial, we share all the secret sauce to model, design, and optimize BD-RIS and make BD-RIS transformative in many different applications. Topics discussed include physics-consistent and multi-port network-aided modeling; transmitting, reflecting, hybrid, and multi-sector mode analysis; reciprocal and non-reciprocal architecture designs and the optimal performance-complexity Pareto frontier of BD-RIS; signal processing, optimization, and channel estimation for BD-RIS; hardware impairments of BD-RIS (discrete-value impedance and admittance, lossy interconnections and components, wideband effects, mutual coupling); and benefits and applications of BD-RIS in communications, sensing, and power transfer.

* 42 pages, 37 figures, submitted to IEEE journal for future publication
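
A single-user SISO toy model shows the gain from a non-diagonal $\mathbf{\Theta}$. The sketch below makes several assumptions: the surface is lossless, the direct link and structural scattering are ignored, and the received amplitude is written as $|\mathbf{h}_R^T \mathbf{\Theta} \mathbf{h}_T|$ with hypothetical random channels `h_r` and `h_t`. A diagonal $\mathbf{\Theta}$ can at best co-phase the per-element paths, which gives $\sum_n |h_{R,n}||h_{T,n}|$. A fully-connected unitary $\mathbf{\Theta}$ can reach the Cauchy-Schwarz bound $\|\mathbf{h}_R\|\|\mathbf{h}_T\|$. Here that $\mathbf{\Theta}$ is not constrained to be symmetric, so it may be non-reciprocal. The example only illustrates the flexibility argument and is not the tutorial's design or optimization method.

```python
import numpy as np


def d_ris_best_amplitude(h_r, h_t):
    """Best |h_r^T Theta h_t| over diagonal Theta = diag(exp(j*theta_n))."""
    g = h_r * h_t                                # per-element cascaded gains
    Theta = np.diag(np.exp(-1j * np.angle(g)))   # co-phase every path
    return abs(h_r @ Theta @ h_t)                # equals sum(|h_r| * |h_t|)


def unitary_with_first_column(u, rng):
    """Random unitary matrix whose first column equals the unit vector u."""
    n = u.size
    Z = rng.standard_normal((n, n - 1)) + 1j * rng.standard_normal((n, n - 1))
    Q, R = np.linalg.qr(np.column_stack([u, Z]))
    Q[:, 0] *= R[0, 0]                           # QR gives Q[:, 0] = u / R[0, 0]; undo the phase
    return Q


def bd_ris_unitary(h_r, h_t, rng):
    """Lossless fully-connected Theta (unitary, not forced symmetric) that maps
    h_t/||h_t|| onto conj(h_r)/||h_r||, so |h_r^T Theta h_t| = ||h_r|| ||h_t||."""
    u = h_t / np.linalg.norm(h_t)
    v = np.conj(h_r) / np.linalg.norm(h_r)
    U = unitary_with_first_column(u, rng)
    V = unitary_with_first_column(v, rng)
    return V @ U.conj().T


rng = np.random.default_rng(1)
N = 8                                            # number of RIS elements (illustrative)
h_r = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
h_t = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)

Theta_bd = bd_ris_unitary(h_r, h_t, rng)
amp_bd = abs(h_r @ Theta_bd @ h_t)
amp_d = d_ris_best_amplitude(h_r, h_t)
print(f"D-RIS: {amp_d:.3f}  BD-RIS: {amp_bd:.3f}")
```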

Lossy Beyond Diagonal Reconfigurable Intelligent Surfaces: Modeling and Optimization

Apr 28, 2025

Yiyang Peng, Hongyu Li, Zheyu Wu, Bruno Clerckx

Abstract:Beyond diagonal reconfigurable intelligent surface (BD-RIS) has emerged as an advancement and generalization of the conventional diagonal RIS (D-RIS) by introducing tunable interconnections between RIS elements, enabling smarter wave manipulation and enlarged coverage. While BD-RIS has demonstrated advantages over D-RIS in various aspects, most existing works rely on the assumption of a lossless model, leaving practical considerations unaddressed. This paper thus proposes a lossy BD-RIS model and develops corresponding optimization algorithms for various BD-RIS-aided communication systems. First, by leveraging admittance parameter analysis, we model each tunable admittance based on a lumped circuit with losses and derive an expression for a circle characterizing the real and imaginary parts of each tunable admittance. We then consider received signal power maximization in single-user single-input single-output (SISO) systems with the proposed lossy BD-RIS model. To solve the optimization problem, we design an effective algorithm by carefully exploiting the problem structure. Specifically, an alternating direction method of multipliers (ADMM) framework is custom-designed to deal with the complicated constraints associated with lossy BD-RIS. Furthermore, we extend the proposed algorithmic framework to more general multiuser multiple-input single-output (MU-MISO) systems, where the transmit precoder and BD-RIS scattering matrix are jointly designed to maximize the sum-rate of the system. Finally, simulation results demonstrate that all BD-RIS architectures still outperform D-RIS in the presence of losses, but the optimal BD-RIS architectures in the lossless case are not necessarily optimal in the lossy case. For example, group-connected BD-RIS can outperform fully- and tree-connected BD-RISs in SISO systems with relatively high losses, whereas the opposite always holds true in the lossless case.
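
The abstract does not give the paper's lumped circuit. The following therefore uses an assumed, simplified branch: a fixed resistance $R$ in series with a tunable reactance $X$. It only shows why loss turns the feasible set of an admittance into a circle. With $Y = 1/(R + jX)$, one finds $(\Re Y - \tfrac{1}{2R})^2 + (\Im Y)^2 = \tfrac{1}{4R^2}$, which is a circle with center $1/(2R)$ and radius $1/(2R)$. In the lossless limit the admittance is purely imaginary instead. The sketch checks the circle numerically; `series_rx_admittance` and the values of `R` and `X` are hypothetical.

```python
import numpy as np


def series_rx_admittance(R, X):
    """Admittance of an assumed lossy branch: resistance R in series with tunable reactance X."""
    return 1.0 / (R + 1j * X)


R = 0.5                                      # illustrative loss resistance (ohms)
X = np.linspace(-50.0, 50.0, 2001)           # tuning range of the reactance (ohms)
Y = series_rx_admittance(R, X)

center = 1.0 / (2.0 * R)                     # lies on the real (conductance) axis
radius = 1.0 / (2.0 * R)
err = np.max(np.abs(np.abs(Y - center) - radius))
print(f"max deviation from circle: {err:.2e}")
print(f"conductance range: [{Y.real.min():.4f}, {Y.real.max():.4f}] S")
```

Along this circle the conductance is tied to the susceptance. Each tunable admittance therefore keeps one degree of freedom but always carries a nonzero loss term, rather than taking a free purely imaginary value as in the lossless model.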

ViTa-Zero: Zero-shot Visuotactile Object 6D Pose Estimation

Apr 17, 2025

Hongyu Li, James Akl, Srinath Sridhar, Tye Brady, Taskin Padir

Abstract:Object 6D pose estimation is a critical challenge in robotics, particularly for manipulation tasks. While prior research combining visual and tactile (visuotactile) information has shown promise, these approaches often struggle with generalization due to the limited availability of visuotactile data. In this paper, we introduce ViTa-Zero, a zero-shot visuotactile pose estimation framework. Our key innovation lies in leveraging a visual model as its backbone and performing feasibility checking and test-time optimization based on physical constraints derived from tactile and proprioceptive observations. Specifically, we model the gripper-object interaction as a spring-mass system, where tactile sensors induce attractive forces, and proprioception generates repulsive forces. We validate our framework through experiments on a real-world robot setup, demonstrating its effectiveness across representative visual backbones and manipulation scenarios, including grasping, object picking, and bimanual handover. Compared to the visual models, our approach overcomes some drastic failure modes while tracking the in-hand object pose. In our experiments, our approach shows an average increase of 55% in AUC of ADD-S and 60% in ADD, along with an 80% lower position error compared to FoundationPose.

* Accepted by ICRA 2025
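
The spring-mass idea can be illustrated with a deliberately simplified toy, which is an assumption and not ViTa-Zero's formulation:

- The object is a sphere of known radius.
- Tactile contacts act as springs that pull the object surface onto each sensed contact.
- Proprioceptive gripper points (for example a palm point computed from joint angles) push the object away only when they would penetrate it.

Integrating the summed forces in an overdamped way amounts to gradient descent on the spring energy, starting from a biased "visual" estimate of the center. All names, stiffnesses, and coordinates below are hypothetical.

```python
import numpy as np


def surface_residuals(center, radius, pts):
    """Signed distance from each point to the sphere surface and the unit
    direction from the sphere center to the point."""
    diff = pts - center
    dist = np.linalg.norm(diff, axis=1)
    return dist - radius, diff / dist[:, None]


def refine_center(center, radius, contacts, gripper_pts,
                  k_att=1.0, k_rep=1.0, step=0.5, iters=500):
    """Overdamped spring-mass refinement of a sphere's center (toy model)."""
    c = np.asarray(center, dtype=float).copy()
    for _ in range(iters):
        # Tactile contacts: two-sided springs pulling the surface onto each contact.
        d_c, u_c = surface_residuals(c, radius, contacts)
        force = k_att * (d_c[:, None] * u_c).sum(axis=0)
        # Proprioceptive points: one-sided springs, active only under penetration (d < 0).
        d_g, u_g = surface_residuals(c, radius, gripper_pts)
        force += k_rep * (np.minimum(d_g, 0.0)[:, None] * u_g).sum(axis=0)
        c += step * force                    # overdamped: velocity proportional to force
    return c


radius = 0.05                                # assumed object radius (m)
contacts = np.array([[-0.05, 0.0, 0.0],      # left fingertip contact
                     [ 0.05, 0.0, 0.0],      # right fingertip contact
                     [ 0.0, -0.05, 0.0]])    # third contact below the object
gripper_pts = np.array([[0.0, 0.06, 0.0]])   # palm point from joint angles
visual_guess = np.array([0.02, 0.03, 0.0])   # biased visual estimate

refined = refine_center(visual_guess, radius, contacts, gripper_pts)
print(np.round(refined, 6))
```

In this toy the initial guess makes the palm penetrate the sphere, so both spring types are active in the first iterations. Once the contacts sit on the surface, the palm spring switches off.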

Virtual domain extension for imposing boundary conditions in flow simulation using pre-trained local neural operator

Apr 14, 2025

Ximeng Ye, Hongyu Li, Zhen-Guo Yan

Abstract:This paper builds a virtual domain extension (VDE) framework for imposing boundary conditions (BCs) in flow simulation using a pre-trained local neural operator (LNO). It creates extended virtual domains for the input function to compensate for the domain corrosion that occurs during LNO inference, thus turning the implementation of BCs into the determination of field values on the extended domain. Several strategies for calculating these field values (padding operation, direct imposition, pressure symmetry, and optimization by backpropagation) are proposed, validated on numerical examples, and compared with boundary imposition in traditional solvers. We find that the large time interval of LNO requires a relatively wide near-boundary region to be processed. Imposing BCs on only a few nodes near the boundary, following the immersed boundary concept in traditional solvers, can therefore hardly achieve high accuracy. With appropriate values assigned on the extended virtual domains, VDE can accurately impose BCs and lead to reasonable flow field predictions. This work provides guidance for imposing BCs reliably in LNO prediction, which could facilitate the reuse of pre-trained LNOs in more applications.
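
The domain corrosion and the padding strategy can be illustrated in one dimension. In place of a trained LNO, the sketch assumes a toy local operator: a moving average computed with a "valid" convolution. Each application consumes `half_width` cells at each boundary. Extending the input by the same width restores the full output size, and the padding rule then plays the role of the boundary condition. Two further points are assumptions. A wider stencil stands in for a larger LNO time step. The mapping of NumPy padding modes to BC types (`edge` as zero-gradient extrapolation, `symmetric` as a mirror extension) is illustrative. The function names are hypothetical.

```python
import numpy as np


def local_operator_step(u, half_width):
    """Toy local operator: moving average over 2*half_width + 1 cells.

    mode="valid" only returns points where the whole stencil fits, so each
    call loses half_width cells at both ends (the domain "corrodes")."""
    kernel = np.ones(2 * half_width + 1) / (2 * half_width + 1)
    return np.convolve(u, kernel, mode="valid")


def extend_virtual_domain(u, half_width, mode):
    """Append half_width virtual cells on each side; `mode` encodes the BC."""
    return np.pad(u, half_width, mode=mode)


u = np.linspace(0.0, 1.0, 11)                # field on 11 grid nodes
h = 2                                        # stencil half-width (stand-in for a larger time step)

print(len(local_operator_step(u, h)))        # corroded output: 11 - 2*h nodes
for mode in ("edge", "symmetric"):
    out = local_operator_step(extend_virtual_domain(u, h, mode), h)
    print(mode, len(out), np.round(out[:3], 3))
```

The interior of the padded result matches the corroded result exactly. Only the values near the boundary depend on which extension rule is chosen.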

V-HOP: Visuo-Haptic 6D Object Pose Tracking

Feb 24, 2025

Hongyu Li, Mingxi Jia, Tuluhan Akbulut, Yu Xiang, George Konidaris, Srinath Sridhar

Abstract:Humans naturally integrate vision and haptics for robust object perception during manipulation. The loss of either modality significantly degrades performance. Inspired by this multisensory integration, prior object pose estimation research has attempted to combine visual and haptic/tactile feedback. Although these works demonstrate improvements in controlled environments or on synthetic datasets, they often underperform vision-only approaches in real-world settings due to poor generalization across diverse grippers, sensor layouts, or sim-to-real environments. Furthermore, they typically estimate the object pose for each frame independently, resulting in less coherent tracking over sequences in real-world deployments. To address these limitations, we introduce a novel unified haptic representation that effectively handles multiple gripper embodiments. Building on this representation, we introduce a new visuo-haptic transformer-based object pose tracker that seamlessly integrates visual and haptic input. We validate our framework on our dataset and the Feelsight dataset, demonstrating significant performance improvement on challenging sequences. Notably, our method achieves superior generalization and robustness across novel embodiments, objects, and sensor types (both taxel-based and vision-based tactile sensors). In real-world experiments, we demonstrate that our approach outperforms state-of-the-art visual trackers by a large margin. We further show that we can achieve precise manipulation tasks by incorporating our real-time object tracking result into motion plans, underscoring the advantages of visuo-haptic perception. Our model and dataset will be made open source upon acceptance of the paper. Project website: https://lhy.xyz/projects/v-hop/

LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding

Jan 14, 2025

Hongyu Li, Jinyu Chen, Ziyu Wei, Shaofei Huang, Tianrui Hui, Jialin Gao, Xiaoming Wei, Si Liu

Abstract:Recent advancements in multimodal large language models (MLLMs) have shown promising results, yet existing approaches struggle to handle temporal and spatial localization simultaneously. This challenge stems from two key issues. First, incorporating spatial-temporal localization introduces a vast number of coordinate combinations, complicating the alignment of linguistic and visual coordinate representations. Second, encoding fine-grained temporal and spatial information during video feature compression is inherently difficult. To address these issues, we propose LLaVA-ST, an MLLM for fine-grained spatial-temporal multimodal understanding. In LLaVA-ST, we propose Language-Aligned Positional Embedding, which embeds the textual coordinate special token into the visual space, simplifying the alignment of fine-grained spatial-temporal correspondences. Additionally, we design the Spatial-Temporal Packer, which decouples the feature compression of temporal and spatial resolutions into two distinct point-to-region attention processing streams. Furthermore, we propose the ST-Align dataset with 4.3M training samples for fine-grained spatial-temporal multimodal understanding. With ST-Align, we present a progressive training pipeline that aligns the visual and textual features through sequential coarse-to-fine stages. We also introduce an ST-Align benchmark to evaluate spatial-temporal interleaved fine-grained understanding tasks, which include Spatial-Temporal Video Grounding (STVG), Event Localization and Captioning (ELC), and Spatial Video Grounding (SVG). LLaVA-ST achieves outstanding performance on 11 benchmarks requiring fine-grained temporal, spatial, or spatial-temporal interleaving multimodal understanding. Our code, data and benchmark will be released at https://github.com/appletea233/LLaVA-ST .

A Decoupled Channel Estimation Method for Beyond Diagonal RIS

Dec 09, 2024

Bruno Sokal, Fazal-E-Asim, André L. F. de Almeida, Hongyu Li, Bruno Clerckx

Abstract:Beyond diagonal reconfigurable intelligent surface (BD-RIS) is a new architecture for RIS in which elements are interconnected to provide more wave manipulation flexibility than traditional single-connected RIS, enhancing data rate and coverage. However, channel estimation for BD-RIS is challenging due to the more complex multiple-connection structure of its scattering elements. To address this issue, this paper proposes a decoupled channel estimation method for BD-RIS. The method yields separate estimates of the involved channels and, by capitalizing on the Kronecker structure of the overall combined channel, improves the accuracy of its estimate. Starting from a least squares estimate of the combined channel and properly reshaping the resulting filtered signal, the proposed algorithm resorts to a Khatri-Rao Factorization (KRF) method that teases out the individual channels based on simple rank-one matrix approximation steps. Numerical results show that the proposed decoupled channel estimation yields more accurate channel estimates than the classical least squares scheme.
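
The rank-one steps of a Khatri-Rao factorization can be sketched generically. If $\mathbf{X} = \mathbf{A} \odot \mathbf{B}$, each column $\mathbf{x}_r = \mathbf{a}_r \otimes \mathbf{b}_r$ reshapes into the rank-one matrix $\mathbf{a}_r \mathbf{b}_r^T$. The dominant singular triplet of that matrix recovers $\mathbf{a}_r$ and $\mathbf{b}_r$ up to a per-column scale. The abstract does not specify which BD-RIS channels play the roles of $\mathbf{A}$ and $\mathbf{B}$, or how the least squares estimate is reshaped. This is therefore a minimal sketch of the factorization step only, on noiseless synthetic data with hypothetical names.

```python
import numpy as np


def khatri_rao(A, B):
    """Column-wise Kronecker product: column r is np.kron(A[:, r], B[:, r])."""
    (I, R), (J, _) = A.shape, B.shape
    return np.einsum("ir,jr->ijr", A, B).reshape(I * J, R)


def khatri_rao_factorization(X, I, J):
    """Split X ~ A (Khatri-Rao) B into A (I x R) and B (J x R), column by column,
    via the best rank-one approximation of each reshaped column."""
    R = X.shape[1]
    A = np.empty((I, R), dtype=complex)
    B = np.empty((J, R), dtype=complex)
    for r in range(R):
        M = X[:, r].reshape(I, J)            # equals a_r b_r^T when noiseless
        U, s, Vh = np.linalg.svd(M)
        A[:, r] = np.sqrt(s[0]) * U[:, 0]
        B[:, r] = np.sqrt(s[0]) * Vh[0, :]
    return A, B                              # each column pair is known only up to a scale


rng = np.random.default_rng(3)
I, J, R = 4, 5, 3                            # illustrative dimensions
A = rng.standard_normal((I, R)) + 1j * rng.standard_normal((I, R))
B = rng.standard_normal((J, R)) + 1j * rng.standard_normal((J, R))
X = khatri_rao(A, B)

A_hat, B_hat = khatri_rao_factorization(X, I, J)
print(np.allclose(khatri_rao(A_hat, B_hat), X))
```

With noise, the truncated SVD gives the least-squares rank-one fit of each reshaped column. This is why the factorization can also act as a denoising step.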

Non-reciprocal Beyond Diagonal RIS: Sum-Rate Maximization in Full-Duplex Communications

Dec 07, 2024

Ziang Liu, Hongyu Li, Bruno Clerckx

Abstract:Reconfigurable intelligent surface (RIS) has been envisioned as a key technology in future wireless communication networks to enable smart radio environments. To further enhance the passive beamforming capability of RIS, beyond diagonal (BD)-RIS has been proposed, which introduces reconfigurable interconnections among different RIS elements. BD-RIS has a unique feature that cannot be enabled by conventional diagonal RIS: it can be realized by non-reciprocal circuits and thus enables an asymmetric scattering matrix. This feature provides the capability to break wireless channel reciprocity and has the potential to benefit full-duplex (FD) systems. In this paper, we model BD-RIS-assisted FD systems, explicitly capturing the impact of BD-RIS non-reciprocity and that of structural scattering, which refers to the specular reflection generated by the RIS when it is turned OFF. To assess the benefits of non-reciprocal BD-RIS, we optimize the scattering matrix, precoder, and combiner to maximize the downlink (DL) and uplink (UL) sum-rates in the FD system. To tackle this optimization problem, we propose an iterative algorithm based on block coordinate descent (BCD) and penalty dual decomposition (PDD). Numerical results demonstrate a surprising benefit of non-reciprocal BD-RIS: it can achieve much higher DL and UL sum-rates in the FD scenario than reciprocal BD-RIS and conventional diagonal RIS.

* Submitted to IEEE journal
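
A transpose identity shows why non-reciprocity can break channel reciprocity. For scalar channels through the surface, $\mathbf{h}_B^T \mathbf{\Theta} \mathbf{h}_A = \mathbf{h}_A^T \mathbf{\Theta}^T \mathbf{h}_B$, so the $A \to B$ and $B \to A$ links coincide whenever $\mathbf{\Theta} = \mathbf{\Theta}^T$. The sketch below assumes reciprocal propagation channels `h_a` and `h_b` and ignores structural scattering. It compares two cases:

- a symmetric unitary $\mathbf{\Theta} = \mathbf{U}\mathbf{U}^T$, which is reciprocal and lossless;
- a generic unitary $\mathbf{\Theta}$, which is non-reciprocal.

This is not the paper's FD system model or its BCD/PDD optimizer, and all names are hypothetical.

```python
import numpy as np


def random_unitary(n, rng):
    """Haar-random unitary via QR with the diagonal phases of R removed."""
    Z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    d = np.diag(R)
    return Q * (d / np.abs(d))               # scale column k by the phase of R[k, k]


def link(h_rx, Theta, h_tx):
    """Scalar channel through the surface from the transmitter to the receiver."""
    return h_rx @ Theta @ h_tx


rng = np.random.default_rng(2)
N = 6                                        # number of RIS elements (illustrative)
h_a = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # node A <-> RIS
h_b = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # node B <-> RIS

U = random_unitary(N, rng)
Theta_rec = U @ U.T                          # symmetric and unitary: reciprocal
Theta_nonrec = random_unitary(N, rng)        # generically non-symmetric

for name, Theta in (("reciprocal", Theta_rec), ("non-reciprocal", Theta_nonrec)):
    ab, ba = link(h_b, Theta, h_a), link(h_a, Theta, h_b)
    print(f"{name}: |A->B| = {abs(ab):.3f}, |B->A| = {abs(ba):.3f}")
```

$\mathbf{U}\mathbf{U}^T$ is both symmetric and unitary because $\mathbf{U}^T\overline{\mathbf{U}} = (\mathbf{U}^H\mathbf{U})^T = \mathbf{I}$. This makes it a convenient way to sample lossless reciprocal scattering matrices for such comparisons.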
