Shiwei Wu

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Nov 26, 2024
Via arXiv

VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation

Aug 29, 2024
Via arXiv

Leveraging Entity Information for Cross-Modality Correlation Learning: The Entity-Guided Multimodal Summarization

Aug 06, 2024
Via arXiv

VideoLLM-online: Online Video Large Language Model for Streaming Video

Jun 17, 2024
Via arXiv

From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition

Jun 12, 2024
Via arXiv

NoteLLM-2: Multimodal Large Representation Models for Recommendation

May 27, 2024
Via arXiv

NoteLLM: A Retrievable Large Language Model for Note Recommendation

Mar 04, 2024
Via arXiv

Communication-Efficient Distributed Learning with Local Immediate Error Compensation

Feb 19, 2024
Via arXiv

Multi-Grained Multimodal Interaction Network for Entity Linking

Jul 19, 2023
Via arXiv

A Solution to CVPR'2023 AQTC Challenge: Video Alignment for Multi-Step Inference

Jun 26, 2023
Via arXiv