Yicong Hong

VEGGIE: Instructional Editing and Reasoning of Video Concepts with Grounded Generation

Mar 19, 2025
Via arXiv

REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder

Mar 11, 2025
Via arXiv

Pushing the Boundaries of State Space Models for Image and Video Generation

Feb 03, 2025
Via arXiv

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel

Dec 11, 2024
Via arXiv

SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts

Dec 07, 2024
Via arXiv

Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats

Oct 16, 2024
Via arXiv

Progressive Autoregressive Video Diffusion Models

Oct 10, 2024
Via arXiv

NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models

Jul 17, 2024
Via arXiv

Augmented Commonsense Knowledge for Remote Object Grounding

Jun 03, 2024
Via arXiv

NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation

Mar 01, 2024
Via arXiv