Zhiding Yu

StreamChat: Chatting with Streaming Video

Dec 11, 2024
Via arXiv

Scene Flow as a Partial Differential Equation

Oct 02, 2024
Via arXiv

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Aug 28, 2024
Via arXiv

Exploring Camera Encoder Designs for Autonomous Driving Perception

Jul 09, 2024
Via arXiv

Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation

Jun 11, 2024
Via arXiv

X-VILA: Cross-Modality Alignment for Large Language Model

May 29, 2024
Via arXiv

Memorize What Matters: Emergent Scene Decomposition from Multitraverse

May 29, 2024
Via arXiv

OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning

May 02, 2024
Via arXiv

What is Point Supervision Worth in Video Instance Segmentation?

Apr 01, 2024
Via arXiv

LITA: Language Instructed Temporal-Localization Assistant

Mar 27, 2024
Via arXiv