Yuzheng Zhuang

Structured Preference Optimization for Vision-Language Long-Horizon Task Planning

Feb 28, 2025

Mem2Ego: Empowering Vision-Language Models with Global-to-Ego Memory for Long-Horizon Embodied Navigation

Feb 20, 2025

3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow

Jan 28, 2025

SpatialCoT: Advancing Spatial Reasoning through Coordinate Alignment and Chain-of-Thought for Embodied Task Planning

Jan 17, 2025

Actra: Optimized Transformer Architecture for Vision-Language-Action Models in Robot Learning

Aug 02, 2024

ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning

Jun 28, 2024

A Survey on Vision-Language-Action Models for Embodied AI

May 23, 2024

SCALE: Self-Correcting Visual Navigation for Mobile Robots via Anti-Novelty Estimation

Apr 16, 2024

Articulated Object Manipulation with Coarse-to-fine Affordance for Mitigating the Effect of Point Cloud Noise

Mar 07, 2024

VOLTA: Diverse and Controllable Question-Answer Pair Generation with Variational Mutual Information Maximizing Autoencoder

Jul 03, 2023