Picture for Haotian Zhang

Haotian Zhang

Efficient Semantic Splatting for Remote Sensing Multi-view Segmentation

Add code
Dec 08, 2024
Viaarxiv icon

Generalized Gaussian Model for Learned Image Compression

Add code
Nov 28, 2024
Viaarxiv icon

Probabilistic Prior Driven Attention Mechanism Based on Diffusion Model for Imaging Through Atmospheric Turbulence

Add code
Nov 15, 2024
Figure 1 for Probabilistic Prior Driven Attention Mechanism Based on Diffusion Model for Imaging Through Atmospheric Turbulence
Figure 2 for Probabilistic Prior Driven Attention Mechanism Based on Diffusion Model for Imaging Through Atmospheric Turbulence
Figure 3 for Probabilistic Prior Driven Attention Mechanism Based on Diffusion Model for Imaging Through Atmospheric Turbulence
Figure 4 for Probabilistic Prior Driven Attention Mechanism Based on Diffusion Model for Imaging Through Atmospheric Turbulence
Viaarxiv icon

Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

Add code
Oct 24, 2024
Viaarxiv icon

Improve Vision Language Model Chain-of-thought Reasoning

Add code
Oct 21, 2024
Figure 1 for Improve Vision Language Model Chain-of-thought Reasoning
Figure 2 for Improve Vision Language Model Chain-of-thought Reasoning
Figure 3 for Improve Vision Language Model Chain-of-thought Reasoning
Figure 4 for Improve Vision Language Model Chain-of-thought Reasoning
Viaarxiv icon

MM-Ego: Towards Building Egocentric Multimodal LLMs

Add code
Oct 09, 2024
Figure 1 for MM-Ego: Towards Building Egocentric Multimodal LLMs
Figure 2 for MM-Ego: Towards Building Egocentric Multimodal LLMs
Figure 3 for MM-Ego: Towards Building Egocentric Multimodal LLMs
Figure 4 for MM-Ego: Towards Building Egocentric Multimodal LLMs
Viaarxiv icon

Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models

Add code
Oct 03, 2024
Viaarxiv icon

Contrastive Localized Language-Image Pre-Training

Add code
Oct 03, 2024
Viaarxiv icon

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning

Add code
Sep 30, 2024
Viaarxiv icon

TemporalPaD: a reinforcement-learning framework for temporal feature representation and dimension reduction

Add code
Sep 27, 2024
Viaarxiv icon