Jiwen Zhang

TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens

Oct 07, 2024
Via arXiv

VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models

May 28, 2024
Via arXiv

DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning

Apr 02, 2024
Via arXiv

Android in the Zoo: Chain-of-Action-Thought for GUI Agents

Mar 05, 2024
Via arXiv

ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks

Oct 17, 2023
Via arXiv

Breaking Down the Task: A Unit-Grained Hybrid Training Framework for Vision and Language Decision Making

Jul 16, 2023
Via arXiv

Robotic Assembly Control Reconfiguration Based on Transfer Reinforcement Learning for Objects with Different Geometric Features

Nov 04, 2022
Via arXiv

Local Connection Reinforcement Learning Method for Efficient Control of Robotic Peg-in-Hole Assembly

Oct 24, 2022
Via arXiv

Curriculum Learning for Vision-and-Language Navigation

Nov 14, 2021
Via arXiv