Jihao Wu

TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens

Oct 07, 2024
Via arXiv

TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models

Apr 14, 2024
Via arXiv

Android in the Zoo: Chain-of-Action-Thought for GUI Agents

Mar 05, 2024
Via arXiv

Temporal-Spatial Entropy Balancing for Causal Continuous Treatment-Effect Estimation

Dec 19, 2023
Via arXiv

DocStormer: Revitalizing Multi-Degraded Colored Document Images to Pristine PDF

Oct 27, 2023
Via arXiv

Efficient Image Captioning for Edge Devices

Dec 18, 2022
Via arXiv

Controllable Image Captioning via Prompting

Dec 04, 2022
Via arXiv