Chenyang Gu

RoboMIND 2.0: A Multimodal, Bimanual Mobile Manipulation Dataset for Generalizable Embodied Intelligence

Dec 31, 2025

Scaling Spatial Intelligence with Multimodal Foundation Models

Nov 17, 2025

Causal Inspired Multi Modal Recommendation

Oct 14, 2025

MLA: A Multisensory Language-Action Model for Multimodal Understanding and Forecasting in Robotic Manipulation

Sep 30, 2025

Has GPT-5 Achieved Spatial Intelligence? An Empirical Study

Aug 18, 2025

AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation

Jul 02, 2025

HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model

Mar 13, 2025

SliceOcc: Indoor 3D Semantic Occupancy Prediction with Vertical Slice Representation

Jan 28, 2025

RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation

Dec 18, 2024

Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation

Nov 27, 2024