Luhui Hu

Spatially Visual Perception for End-to-End Robotic Learning

Nov 26, 2024
Via arXiv

Generalized Robot Learning Framework

Sep 18, 2024
Via arXiv

VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding

Mar 22, 2024
Via arXiv

VisionGPT: Vision-Language Understanding Agent Using Generalized Multimodal Framework

Mar 14, 2024
Via arXiv

WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs

Mar 10, 2024
Via arXiv

UnifiedVisionGPT: Streamlining Vision-Oriented AI through Generalized Multimodal Framework

Nov 16, 2023
Via arXiv