Difei Gao

The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use

Nov 15, 2024
Via arXiv

Learning Video Context as Interleaved Multimodal Sequences

Jul 31, 2024
Via arXiv

GUI Action Narrator: Where and When Did That Action Take Place?

Jun 19, 2024
Via arXiv

VideoLLM-online: Online Video Large Language Model for Streaming Video

Jun 17, 2024
Via arXiv

VideoGUI: A Benchmark for GUI Automation from Instructional Videos

Jun 14, 2024
Via arXiv

LOVA3: Learning to Visual Question Answering, Asking and Assessment

May 23, 2024
Via arXiv

Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces

Jan 24, 2024
Via arXiv

ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation

Jan 01, 2024
Via arXiv

ViT-Lens-2: Gateway to Omni-modal Intelligence

Nov 27, 2023
Via arXiv

CVPR 2023 Text Guided Video Editing Competition

Oct 24, 2023
Via arXiv