Yiheng Xu

AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials

Dec 12, 2024
Via arXiv

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Dec 05, 2024
Via arXiv

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Apr 11, 2024
Via arXiv

OpenAgents: An Open Platform for Language Agents in the Wild

Oct 16, 2023
Via arXiv

Lemur: Harmonizing Natural Language and Code for Language Agents

Oct 10, 2023
Via arXiv

In-Context Learning with Many Demonstration Examples

Feb 09, 2023
Via arXiv

DiT: Self-supervised Pre-training for Document Image Transformer

Apr 12, 2022
Via arXiv

Document AI: Benchmarks, Models and Applications

Nov 16, 2021
Via arXiv

MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding

Oct 16, 2021
Via arXiv

LayoutReader: Pre-training of Text and Layout for Reading Order Detection

Aug 27, 2021
Via arXiv