Kevin Qinghong Lin

VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation

Aug 29, 2024
Via arXiv

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Aug 22, 2024
Via arXiv

Learning Video Context as Interleaved Multimodal Sequences

Jul 31, 2024
Via arXiv

GUI Action Narrator: Where and When Did That Action Take Place?

Jun 19, 2024
Via arXiv

VideoLLM-online: Online Video Large Language Model for Streaming Video

Jun 17, 2024
Via arXiv

VideoGUI: A Benchmark for GUI Automation from Instructional Videos

Jun 14, 2024
Via arXiv

Learning Long-form Video Prior via Generative Pre-Training

Apr 24, 2024
Via arXiv

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

Jan 01, 2024
Via arXiv

Bootstrapping SparseFormers from Vision Foundation Models

Dec 04, 2023
Via arXiv

DiffusionVMR: Diffusion Model for Video Moment Retrieval

Aug 29, 2023
Via arXiv