Difei Gao

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Nov 26, 2024

The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use

Nov 15, 2024

Learning Video Context as Interleaved Multimodal Sequences

Jul 31, 2024

GUI Action Narrator: Where and When Did That Action Take Place?

Jun 19, 2024

VideoLLM-online: Online Video Large Language Model for Streaming Video

Jun 17, 2024

VideoGUI: A Benchmark for GUI Automation from Instructional Videos

Jun 14, 2024

LOVA3: Learning to Visual Question Answering, Asking and Assessment

May 23, 2024

Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces

Jan 24, 2024

ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation

Jan 01, 2024

ViT-Lens-2: Gateway to Omni-modal Intelligence

Nov 27, 2023