Picture for Quanzeng You

Quanzeng You

Law of Vision Representation in MLLMs

Add code
Aug 29, 2024
Viaarxiv icon

Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model

Add code
May 28, 2024
Viaarxiv icon

ViTAR: Vision Transformer with Any Resolution

Add code
Mar 28, 2024
Figure 1 for ViTAR: Vision Transformer with Any Resolution
Figure 2 for ViTAR: Vision Transformer with Any Resolution
Figure 3 for ViTAR: Vision Transformer with Any Resolution
Figure 4 for ViTAR: Vision Transformer with Any Resolution
Viaarxiv icon

InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding

Add code
Mar 03, 2024
Figure 1 for InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding
Figure 2 for InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding
Figure 3 for InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding
Figure 4 for InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding
Viaarxiv icon

Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning

Add code
Jan 18, 2024
Figure 1 for Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Figure 2 for Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Figure 3 for Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Figure 4 for Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Viaarxiv icon

COCO is "ALL'' You Need for Visual Instruction Fine-tuning

Add code
Jan 17, 2024
Viaarxiv icon

InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models

Add code
Dec 04, 2023
Figure 1 for InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models
Figure 2 for InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models
Figure 3 for InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models
Figure 4 for InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models
Viaarxiv icon

Improving In-Context Learning in Diffusion Models with Visual Context-Modulated Prompts

Add code
Dec 03, 2023
Figure 1 for Improving In-Context Learning in Diffusion Models with Visual Context-Modulated Prompts
Figure 2 for Improving In-Context Learning in Diffusion Models with Visual Context-Modulated Prompts
Figure 3 for Improving In-Context Learning in Diffusion Models with Visual Context-Modulated Prompts
Figure 4 for Improving In-Context Learning in Diffusion Models with Visual Context-Modulated Prompts
Viaarxiv icon

Reason out Your Layout: Evoking the Layout Master from Large Language Models for Text-to-Image Synthesis

Add code
Nov 28, 2023
Viaarxiv icon

Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling

Add code
Oct 10, 2023
Figure 1 for Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling
Figure 2 for Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling
Figure 3 for Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling
Figure 4 for Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling
Viaarxiv icon