Tong He

DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion

Oct 31, 2024

EMMA: End-to-End Multimodal Model for Autonomous Driving

Oct 30, 2024

Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction

Oct 24, 2024

Depth Any Video with Scalable Synthetic Data

Oct 14, 2024

VideoSAM: Open-World Video Segmentation

Oct 11, 2024

SPA: 3D Spatial-Awareness Enables Effective Embodied Representation

Oct 10, 2024

StreetSurfGS: Scalable Urban Street Surface Reconstruction with Planar-based Gaussian Splatting

Oct 06, 2024

One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

Sep 29, 2024

GigaGS: Scaling up Planar-Based 3D Gaussians for Large Scene Surface Reconstruction

Sep 10, 2024

Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation

Sep 07, 2024