Zilong Huang

GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation

Apr 03, 2025
Via arXiv

4th PVUW MeViS 3rd Place Report: Sa2VA

Apr 01, 2025
Via arXiv

Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration

Apr 01, 2025
Via arXiv

Video Depth Anything: Consistent Depth Estimation for Super-Long Videos

Jan 21, 2025
Via arXiv

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Jan 07, 2025
Via arXiv

LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models

Oct 13, 2024
Via arXiv

CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis

Aug 27, 2024
Via arXiv

Lightweight Model Pre-training via Language Guided Knowledge Distillation

Jun 17, 2024
Via arXiv

Depth Anything V2

Jun 13, 2024
Via arXiv

DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention

May 28, 2024
Via arXiv