Lianwen Jin

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning
Dec 31, 2024

Explainable Tampered Text Detection via Multimodal Large Models
Dec 19, 2024

Online Writer Retrieval with Chinese Handwritten Phrases: A Synergistic Temporal-Frequency Representation Learning Approach
Dec 16, 2024

Predicting the Original Appearance of Damaged Historical Documents
Dec 16, 2024

Omni-IML: Towards Unified Image Manipulation Localization
Nov 22, 2024

VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models
Oct 01, 2024

DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding
Aug 27, 2024

Mini-Monkey: Multi-Scale Adaptive Cropping for Multimodal Large Language Models
Aug 09, 2024

LEGO: Self-Supervised Representation Learning for Scene Text Images
Aug 04, 2024

Mini-Monkey: Alleviate the Sawtooth Effect by Multi-Scale Adaptive Cropping
Aug 04, 2024