Zhang Li

CRFT: Consistent-Recurrent Feature Flow Transformer for Cross-Modal Image Registration

Apr 07, 2026
Via arXiv

MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios

Mar 30, 2026
Via arXiv

Ridge Estimation-Based Vision and Laser Ranging Fusion Localization Method for UAVs

Dec 18, 2025
Via arXiv

MonkeyOCR v1.5 Technical Report: Unlocking Robust Document Parsing for Complex Patterns

Nov 16, 2025
Via arXiv

MSTAR: Box-free Multi-query Scene Text Retrieval with Attention Recycling

Jun 12, 2025
Via arXiv

MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm

Jun 05, 2025
Via arXiv

High-precision visual navigation device calibration method based on collimator

Feb 25, 2025
Via arXiv

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

Dec 31, 2024
Via arXiv

Developing a Reliable, General-Purpose Hallucination Detection and Mitigation Service: Insights and Lessons Learned

Jul 22, 2024
Via arXiv

Exploring the Capabilities of Large Multimodal Models on Dense Text

May 09, 2024
Via arXiv