Yuliang Liu

Privacy-Preserving Biometric Verification with Handwritten Random Digit String

Mar 17, 2025

OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models

Feb 22, 2025

AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence

Feb 19, 2025

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

Dec 31, 2024

Liquid: Language Models are Scalable Multi-modal Generators

Dec 05, 2024

CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy

Dec 03, 2024

R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models

Oct 23, 2024

PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling

Oct 08, 2024

LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models

Sep 04, 2024

Mini-Monkey: Multi-Scale Adaptive Cropping for Multimodal Large Language Models

Aug 09, 2024