Can Huang

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

Dec 31, 2024
Via arXiv

Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM

Dec 12, 2024
Via arXiv

Grounding Natural Language to SQL Translation with Data-Based Self-Explanations

Nov 05, 2024
Via arXiv

MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark

Oct 15, 2024
Via arXiv

UNA: Unifying Alignments of RLHF/PPO, DPO and KTO by a Generalized Implicit Reward Function

Aug 27, 2024
Via arXiv

ParGo: Bridging Vision-Language with Partial and Global Views

Aug 23, 2024
Via arXiv

Harmonizing Visual Text Comprehension and Generation

Jul 23, 2024
Via arXiv

A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding

Jul 02, 2024
Via arXiv

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy

Jun 03, 2024
Via arXiv

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering

May 20, 2024
Via arXiv