
Tong He

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Dec 06, 2024

Factorized Visual Tokenization and Generation

Nov 25, 2024

Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning

Nov 21, 2024

DATAP-SfM: Dynamic-Aware Tracking Any Point for Robust Structure from Motion in the Wild

Nov 20, 2024

DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion

Oct 31, 2024

EMMA: End-to-End Multimodal Model for Autonomous Driving

Oct 30, 2024

Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction

Oct 24, 2024

Depth Any Video with Scalable Synthetic Data

Oct 14, 2024

VideoSAM: Open-World Video Segmentation

Oct 11, 2024

SPA: 3D Spatial-Awareness Enables Effective Embodied Representation

Oct 10, 2024