Tong He

CoSurfGS: Collaborative 3D Surface Gaussian Splatting with Distributed Learning for Large Scene Reconstruction

Dec 23, 2024
Via arXiv

Bag of Tricks for Multimodal AutoML with Image, Text, and Tabular Data

Dec 19, 2024
Via arXiv

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Dec 06, 2024
Via arXiv

Factorized Visual Tokenization and Generation

Nov 25, 2024
Via arXiv

Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning

Nov 21, 2024
Via arXiv

DATAP-SfM: Dynamic-Aware Tracking Any Point for Robust Structure from Motion in the Wild

Nov 20, 2024
Via arXiv

DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion

Oct 31, 2024
Via arXiv

EMMA: End-to-End Multimodal Model for Autonomous Driving

Oct 30, 2024
Via arXiv

Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction

Oct 24, 2024
Via arXiv

Depth Any Video with Scalable Synthetic Data

Oct 14, 2024
Via arXiv