Kevin Qinghong Lin

ROICtrl: Boosting Instance Control for Visual Generation

Nov 27, 2024

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Nov 26, 2024

MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation

Nov 22, 2024

VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation

Aug 29, 2024

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Aug 22, 2024

Learning Video Context as Interleaved Multimodal Sequences

Jul 31, 2024

GUI Action Narrator: Where and When Did That Action Take Place?

Jun 19, 2024

VideoLLM-online: Online Video Large Language Model for Streaming Video

Jun 17, 2024

VideoGUI: A Benchmark for GUI Automation from Instructional Videos

Jun 14, 2024

Learning Long-form Video Prior via Generative Pre-Training

Apr 24, 2024