Wei Liang

Spatial-Temporal Multi-Scale Quantization for Flexible Motion Generation

Aug 12, 2025

Zan Wang, Jingze Zhang, Yixin Chen, Baoxiong Jia, Wei Liang, Siyuan Huang

Abstract:Despite significant advancements in human motion generation, current motion representations, typically formulated as discrete frame sequences, still face two critical limitations: (i) they fail to capture motion from a multi-scale perspective, limiting their capability to model complex patterns; (ii) they lack compositional flexibility, which is crucial for a model's generalization across diverse generation tasks. To address these challenges, we introduce MSQ, a novel quantization method that compresses the motion sequence into multi-scale discrete tokens across spatial and temporal dimensions. MSQ employs distinct encoders to capture body parts at varying spatial granularities and temporally interpolates the encoded features into multiple scales before quantizing them into discrete tokens. Building on this representation, we establish a generative mask modeling model to effectively support motion editing, motion control, and conditional motion generation. Through quantitative and qualitative analysis, we show that our quantization method enables the seamless composition of motion tokens without requiring specialized design or re-training. Furthermore, extensive evaluations demonstrate that our approach outperforms existing baseline methods on various benchmarks.

* 18 pages
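
The temporal half of the idea in the abstract above (interpolate encoded features to several lengths, then map each step to a discrete token) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names `resample`, `quantize`, and `multi_scale_tokens`, the linear interpolation, the fixed hand-written codebook, and the nearest-neighbour lookup are not taken from the paper, and the spatial body-part encoders and all training are omitted.

```python
def resample(seq, length):
    """Linearly interpolate a list of feature vectors to `length` time steps."""
    if length < 1 or not seq:
        raise ValueError("need a non-empty sequence and length >= 1")
    n = len(seq)
    out = []
    for i in range(length):
        # Map output step i onto the input time axis [0, n - 1].
        pos = i * (n - 1) / (length - 1) if length > 1 else (n - 1) / 2
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(seq[lo], seq[hi])])
    return out


def quantize(vec, codebook):
    """Return the index of the codebook entry closest to `vec` (squared L2)."""
    def dist(code):
        return sum((v - c) ** 2 for v, c in zip(vec, code))
    return min(range(len(codebook)), key=lambda k: dist(codebook[k]))


def multi_scale_tokens(seq, scales, codebook):
    """Produce one token list per temporal scale (hypothetical helper)."""
    return [[quantize(v, codebook) for v in resample(seq, s)] for s in scales]


# Toy 1-D "encoded features" over four frames and a three-entry codebook.
features = [[0.0], [1.0], [2.0], [3.0]]
codebook = [[0.0], [1.5], [3.0]]
tokens = multi_scale_tokens(features, scales=(1, 2, 4), codebook=codebook)
print(tokens)
```

Coarse scales summarize the whole sequence with few tokens while finer scales follow it frame by frame, which is the property a multi-scale token representation relies on.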

CLONE: Closed-Loop Whole-Body Humanoid Teleoperation for Long-Horizon Tasks

Jun 10, 2025

Yixuan Li, Yutang Lin, Jieming Cui, Tengyu Liu, Wei Liang, Yixin Zhu, Siyuan Huang

Abstract:Humanoid teleoperation plays a vital role in demonstrating and collecting data for complex humanoid-scene interactions. However, current teleoperation systems face critical limitations: they decouple upper- and lower-body control to maintain stability, restricting natural coordination, and operate open-loop without real-time position feedback, leading to accumulated drift. The fundamental challenge is achieving precise, coordinated whole-body teleoperation over extended durations while maintaining accurate global positioning. Here we show that an MoE-based teleoperation system, CLONE, with closed-loop error correction enables unprecedented whole-body teleoperation fidelity, maintaining minimal positional drift over long-range trajectories using only head and hand tracking from an MR headset. Unlike previous methods that either sacrifice coordination for stability or suffer from unbounded drift, CLONE learns diverse motion skills while preventing tracking error accumulation through real-time feedback, enabling complex coordinated movements such as "picking up objects from the ground." These results establish a new milestone for whole-body humanoid teleoperation for long-horizon humanoid-scene interaction tasks.

* 18 pages, 13 figures

MetaScenes: Towards Automated Replica Creation for Real-world 3D Scans

May 05, 2025

Huangyue Yu, Baoxiong Jia, Yixin Chen, Yandan Yang, Puhao Li, Rongpeng Su, Jiaxin Li, Qing Li, Wei Liang, Song-Chun Zhu (+2 more)

Abstract:Embodied AI (EAI) research requires high-quality, diverse 3D scenes to effectively support skill acquisition, sim-to-real transfer, and generalization. Achieving these quality standards, however, necessitates the precise replication of real-world object diversity. Existing datasets demonstrate that this process heavily relies on artist-driven designs, which demand substantial human effort and present significant scalability challenges. To scalably produce realistic and interactive 3D scenes, we first present MetaScenes, a large-scale, simulatable 3D scene dataset constructed from real-world scans, which includes 15366 objects spanning 831 fine-grained categories. Then, we introduce Scan2Sim, a robust multi-modal alignment model, which enables the automated, high-quality replacement of assets, thereby eliminating the reliance on artist-driven designs for scaling 3D scenes. We further propose two benchmarks to evaluate MetaScenes: a detailed scene synthesis task focused on small item layouts for robotic manipulation and a domain transfer task in vision-and-language navigation (VLN) to validate cross-domain transfer. Results confirm MetaScenes' potential to enhance EAI by supporting more generalizable agent learning and sim-to-real applications, introducing new possibilities for EAI research. Project website: https://meta-scenes.github.io/.

* CVPR 2025

EMRModel: A Large Language Model for Extracting Medical Consultation Dialogues into Structured Medical Records

Apr 23, 2025

Shuguang Zhao, Qiangzhong Feng, Zhiyang He, Peipei Sun, Yingying Wang, Xiaodong Tao, Xiaoliang Lu, Mei Cheng, Xinyue Wu, Yanyan Wang (+1 more)

Abstract:Medical consultation dialogues contain critical clinical information, yet their unstructured nature hinders effective utilization in diagnosis and treatment. Traditional methods, relying on rule-based or shallow machine learning techniques, struggle to capture deep and implicit semantics. Recently, large pre-trained language models and Low-Rank Adaptation (LoRA), a lightweight fine-tuning method, have shown promise for structured information extraction. We propose EMRModel, a novel approach that integrates LoRA-based fine-tuning with code-style prompt design, aiming to efficiently convert medical consultation dialogues into structured electronic medical records (EMRs). Additionally, we construct a high-quality, realistically grounded dataset of medical consultation dialogues with detailed annotations. Furthermore, we introduce a fine-grained evaluation benchmark for medical consultation information extraction and provide a systematic evaluation methodology, advancing the optimization of medical natural language processing (NLP) models. Experimental results show EMRModel achieves an F1 score of 88.1%, improving by 49.5% over standard pre-trained models. Compared to traditional LoRA fine-tuning methods, our model shows superior performance, highlighting its effectiveness in structured medical record extraction tasks.
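
For readers unfamiliar with the LoRA technique named in the abstract above, its core computation is a frozen weight matrix plus a scaled low-rank update, y = W0 x + (alpha / r) B (A x). The sketch below shows only that generic formula in plain Python; the names `matvec` and `lora_forward`, the toy matrices, and the `alpha` value are illustrative assumptions and say nothing about EMRModel's actual configuration or prompts.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(w0, a, b, x, alpha=1.0):
    """Compute y = W0 x + (alpha / r) * B (A x).

    w0: frozen weights, shape d_out x d_in
    a:  trainable down-projection, shape r x d_in
    b:  trainable up-projection, shape d_out x r
    """
    rank = len(a)
    base = matvec(w0, x)
    update = matvec(b, matvec(a, x))
    scale = alpha / rank
    return [y + scale * u for y, u in zip(base, update)]


w0 = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2 x 2 identity, for illustration
a = [[1.0, 1.0]]                # rank r = 1
b_zero = [[0.0], [0.0]]         # B starts at zero, so the update is a no-op
b_trained = [[1.0], [2.0]]      # made-up "trained" values
x = [1.0, 2.0]

print(lora_forward(w0, a, b_zero, x))
print(lora_forward(w0, a, b_trained, x, alpha=2.0))
```

Only `a` and `b` would be trained, which is why the method is considered lightweight: for rank r they hold r * (d_in + d_out) values instead of d_in * d_out.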

Improved AFSA-Based Beam Training Without CSI for RIS-Assisted ISAC Systems

Apr 10, 2025

Yunxiang Shi, Lixin Li, Wensheng Lin, Wei Liang, Zhu Han

Abstract:In this paper, we consider transmit beamforming and reflection patterns design in reconfigurable intelligent surface (RIS)-assisted integrated sensing and communication (ISAC) systems, where the dual-function base station (DFBS) lacks channel state information (CSI). To address the high overhead of cascaded channel estimation, we propose an improved artificial fish swarm algorithm (AFSA) combined with a feedback-based joint active and passive beam training scheme. In this approach, we consider the interference caused by multipath user echo signals on target detection and propose a beamforming design method that balances both communication and sensing performance. Numerical simulations show that the proposed AFSA outperforms other optimization algorithms, particularly in its robustness against echo interference under different communication signal-to-noise ratio (SNR) constraints.

Learning to Plan with Personalized Preferences

Feb 02, 2025

Manjie Xu, Xinyi Yang, Wei Liang, Chi Zhang, Yixin Zhu

Abstract:Effective integration of AI agents into daily life requires them to understand and adapt to individual human preferences, particularly in collaborative roles. Although recent studies on embodied intelligence have advanced significantly, they typically adopt generalized approaches that overlook personal preferences in planning. We address this limitation by developing agents that not only learn preferences from few demonstrations but also learn to adapt their planning strategies based on these preferences. Our research leverages the observation that preferences, though implicitly expressed through minimal demonstrations, can generalize across diverse planning scenarios. To systematically evaluate this hypothesis, we introduce the Preference-based Planning (PbP) benchmark, an embodied benchmark featuring hundreds of diverse preferences spanning from atomic actions to complex sequences. Our evaluation of SOTA methods reveals that while symbol-based approaches show promise in scalability, significant challenges remain in learning to generate and execute plans that satisfy personalized preferences. We further demonstrate that incorporating learned preferences as intermediate representations in planning significantly improves the agent's ability to construct personalized plans. These findings establish preferences as a valuable abstraction layer for adaptive planning, opening new directions for research in preference-guided plan generation and execution.

IPP-Net: A Generalizable Deep Neural Network Model for Indoor Pathloss Radio Map Prediction

Jan 11, 2025

Bin Feng, Meng Zheng, Wei Liang, Lei Zhang

Abstract:In this paper, we propose a generalizable deep neural network model for indoor pathloss radio map prediction (termed IPP-Net). IPP-Net is based on a UNet architecture and learned from both large-scale ray tracing simulation data and a modified 3GPP indoor hotspot model. The performance of IPP-Net is evaluated in the First Indoor Pathloss Radio Map Prediction Challenge at ICASSP 2025. The evaluation results show that IPP-Net achieves a weighted root mean square error of 9.501 dB on three competition tasks and obtains the second overall ranking.

* 2 pages, 1 figure, Accepted to ICASSP 2025

FloNa: Floor Plan Guided Embodied Visual Navigation

Dec 24, 2024

Jiaxin Li, Weiqi Huang, Zan Wang, Wei Liang, Huijun Di, Feng Liu

Abstract:Humans naturally rely on floor plans to navigate in unfamiliar environments, as they are readily available, reliable, and provide rich geometrical guidance. However, existing visual navigation settings overlook this valuable prior knowledge, leading to limited efficiency and accuracy. To eliminate this gap, we introduce a novel navigation task: Floor Plan Visual Navigation (FloNa), the first attempt to incorporate floor plan into embodied visual navigation. While the floor plan offers significant advantages, two key challenges emerge: (1) handling the spatial inconsistency between the floor plan and the actual scene layout for collision-free navigation, and (2) aligning observed images with the floor plan sketch despite their distinct modalities. To address these challenges, we propose FloDiff, a novel diffusion policy framework incorporating a localization module to facilitate alignment between the current observation and the floor plan. We further collect 20k navigation episodes across 117 scenes in the iGibson simulator to support the training and evaluation. Extensive experiments demonstrate the effectiveness and efficiency of our framework in unfamiliar scenes using floor plan knowledge. Project website: https://gauleejx.github.io/flona/.

* Accepted by AAAI 2025

Security Enhancement of Quantum Communication in Space-Air-Ground Integrated Networks

Oct 22, 2024

Yixiao Zhang, Wei Liang, Lixin Li, Wensheng Lin

Abstract:This paper investigates a transmission scheme for enhancing quantum communication security, aimed at improving the security of space-air-ground integrated networks (SAGIN). Quantum teleportation achieves the transmission of quantum states through quantum channels. In simple terms, an unknown quantum state at one location can be reconstructed on a particle at another location. By combining classical Turbo coding with quantum Shor error-correcting codes, we propose a practical solution that ensures secure information transmission even in the presence of errors in both classical and quantum channels. To provide absolute security under SAGIN, we add a quantum secure direct communication (QSDC) protocol to the current system. Specifically, by accounting for the practical scenario of eavesdropping in quantum channels, the QSDC protocol utilizes virtual entangled pairs to detect the presence of eavesdroppers. Consequently, the overall scheme guarantees both the reliability and absolute security of communication.

An Expeditious Spatial Mean Radiant Temperature Mapping Framework using Visual SLAM and Semantic Segmentation

Oct 12, 2024

Wei Liang, Yiting Zhang, Ji Zhang, Erica Cochran Hameen

Abstract:Ensuring thermal comfort is essential for the well-being and productivity of individuals in built environments. Of the various thermal comfort indicators, the mean radiant temperature (MRT) is very challenging to measure. Most common measurement methodologies are time-consuming and not user-friendly. To address this issue, this paper proposes a novel MRT measurement framework that uses visual simultaneous localization and mapping (SLAM) and semantic segmentation techniques. The proposed approach follows the rule of thumb of the traditional MRT calculation method using surface temperature and view factors. However, it employs visual SLAM and creates a 3D thermal point cloud with enriched surface temperature information. The framework then implements Grounded SAM, a new object detection and segmentation tool, to extract features with distinct temperature profiles on building surfaces. The detailed segmentation of thermal features not only reduces potential errors in the calculation of the MRT but also provides an efficient reconstruction of the spatial MRT distribution in the indoor environment. We also validate the calculation results with the reference measurement methodology. This data-driven framework offers faster and more efficient MRT measurements and spatial mapping than conventional methods. It can enable the direct engagement of researchers and practitioners in MRT measurements and contribute to research on thermal comfort and radiant cooling and heating systems.

* Accepted by 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop
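
The "traditional MRT calculation method using surface temperature and view factors" mentioned in the abstract above is commonly written as T_mrt^4 = sum_i F_i * T_i^4, with absolute temperatures T_i and view factors F_i from the occupant to each surface. The sketch below computes that textbook formula only; the function name `mean_radiant_temperature`, the normalization of the view factors, and the sample room values are assumptions for illustration and are not the paper's SLAM and segmentation pipeline.

```python
KELVIN_OFFSET = 273.15


def mean_radiant_temperature(surface_temps_c, view_factors):
    """Return MRT in deg C from surface temperatures (deg C) and view factors.

    View factors are normalized to sum to 1 (an assumption made here so that
    slightly incomplete enclosures still give a usable weighted result).
    """
    if len(surface_temps_c) != len(view_factors) or not view_factors:
        raise ValueError("need one view factor per surface temperature")
    total = sum(view_factors)
    if total <= 0:
        raise ValueError("view factors must sum to a positive value")
    fourth_power_mean = sum(
        (f / total) * (t + KELVIN_OFFSET) ** 4
        for t, f in zip(surface_temps_c, view_factors)
    )
    return fourth_power_mean ** 0.25 - KELVIN_OFFSET


# Hypothetical room: a cool window and a warm radiant panel, equally weighted.
uniform = mean_radiant_temperature([22.0, 22.0, 22.0], [0.2, 0.3, 0.5])
mixed = mean_radiant_temperature([18.0, 30.0], [0.5, 0.5])
print(round(uniform, 3), round(mixed, 3))
```

Because the weighting is on the fourth power of absolute temperature, warm surfaces pull the result slightly above the plain weighted average, which is one reason segmenting surfaces with distinct temperature profiles matters.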
