
Hao Fei

Semantic Role Labeling: A Systematical Survey

Feb 09, 2025

CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs

Jan 28, 2025

Aristotle: Mastering Logical Reasoning with A Logic-Complete Decompose-Search-Resolve Framework

Dec 22, 2024

CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models

Dec 17, 2024

Combating Multimodal LLM Hallucination via Bottom-up Holistic Reasoning

Dec 15, 2024

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining

Dec 13, 2024

M$^{3}$D: A Multimodal, Multilingual and Multitask Dataset for Grounded Document-level Information Extraction

Dec 05, 2024

RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation

Dec 03, 2024

Towards Rich Emotions in 3D Avatars: A Text-to-3D Avatar Generation Benchmark

Dec 03, 2024

Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

Nov 05, 2024