Hao Fei

Combating Multimodal LLM Hallucination via Bottom-up Holistic Reasoning

Dec 15, 2024
Via arXiv

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining

Dec 13, 2024
Via arXiv

M$^{3}$D: A Multimodal, Multilingual and Multitask Dataset for Grounded Document-level Information Extraction

Dec 05, 2024
Via arXiv

Towards Rich Emotions in 3D Avatars: A Text-to-3D Avatar Generation Benchmark

Dec 03, 2024
Via arXiv

RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation

Dec 03, 2024
Via arXiv

Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

Nov 05, 2024
Via arXiv

Unified Generative and Discriminative Training for Multi-modal Large Language Models

Nov 01, 2024
Via arXiv

What Factors Affect Multi-Modal In-Context Learning? An In-Depth Exploration

Oct 27, 2024
Via arXiv

Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image

Oct 20, 2024
Via arXiv

A Survey of Ontology Expansion for Conversational Understanding

Oct 19, 2024
Via arXiv