Haotian Zhang

Probabilistic Prior Driven Attention Mechanism Based on Diffusion Model for Imaging Through Atmospheric Turbulence

Nov 15, 2024
Via arXiv

Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

Oct 24, 2024
Via arXiv

Improve Vision Language Model Chain-of-thought Reasoning

Oct 21, 2024
Via arXiv

MM-Ego: Towards Building Egocentric Multimodal LLMs

Oct 09, 2024
Via arXiv

Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models

Oct 03, 2024
Via arXiv

Contrastive Localized Language-Image Pre-Training

Oct 03, 2024
Via arXiv

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning

Sep 30, 2024
Via arXiv

TemporalPaD: a reinforcement-learning framework for temporal feature representation and dimension reduction

Sep 27, 2024
Via arXiv

Few-Shot Domain Adaptation for Learned Image Compression

Sep 17, 2024
Via arXiv

Real or Robotic? Assessing Whether LLMs Accurately Simulate Qualities of Human Responses in Dialogue

Sep 16, 2024
Via arXiv