Zhibo Yang

CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy

Dec 03, 2024 (via arXiv)

HIP: Hierarchical Point Modeling and Pre-training for Visual Information Extraction

Nov 02, 2024 (via arXiv)

VL-Reader: Vision and Language Reconstructor is an Effective Scene Text Recognizer

Sep 18, 2024 (via arXiv)

Platypus: A Generalized Specialist Model for Reading Text in Various Forms

Aug 27, 2024 (via arXiv)

Look Hear: Gaze Prediction for Speech-directed Human Attention

Jul 28, 2024 (via arXiv)

Visual Text Generation in the Wild

Jul 19, 2024 (via arXiv)

OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition

Mar 28, 2024 (via arXiv)

HierCode: A Lightweight Hierarchical Codebook for Zero-shot Chinese Text Recognition

Mar 20, 2024 (via arXiv)

LORE++: Logical Location Regression Network for Table Structure Recognition with Pre-training

Jan 03, 2024 (via arXiv)

Efficient Monaural Speech Enhancement using Spectrum Attention Fusion

Aug 04, 2023 (via arXiv)