Minghui Liao

PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling

Oct 08, 2024
Via arXiv

TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens

Oct 07, 2024
Via arXiv

TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models

Apr 14, 2024
Via arXiv

Android in the Zoo: Chain-of-Action-Thought for GUI Agents

Mar 05, 2024
Via arXiv

Sequential Visual and Semantic Consistency for Semi-supervised Text Recognition

Feb 24, 2024
Via arXiv

Class-Aware Mask-Guided Feature Refinement for Scene Text Recognition

Feb 21, 2024
Via arXiv

Joint Learning Neuronal Skeleton and Brain Circuit Topology with Permutation Invariant Encoders for Neuron Classification

Dec 22, 2023
Via arXiv

Self-distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach

Aug 21, 2023
Via arXiv

Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition

Jul 01, 2022
Via arXiv

Comprehensive Benchmark Datasets for Amharic Scene Text Detection and Recognition

Mar 23, 2022
Via arXiv