Biao Yang

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

Dec 31, 2024

AFIDAF: Alternating Fourier and Image Domain Adaptive Filters as an Efficient Alternative to Attention in ViTs

Jul 16, 2024

Exploring the Capabilities of Large Multimodal Models on Dense Text

May 09, 2024

TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document

Mar 15, 2024

Sequential Visual and Semantic Consistency for Semi-supervised Text Recognition

Feb 24, 2024

Class-Aware Mask-Guided Feature Refinement for Scene Text Recognition

Feb 21, 2024

Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Nov 24, 2023

Looking and Listening: Audio Guided Text Recognition

Jun 06, 2023

Feature Affinity Assisted Knowledge Distillation and Quantization of Deep Neural Networks on Label-Free Data

Feb 10, 2023

Searching Intrinsic Dimensions of Vision Transformers

Apr 16, 2022