Rui Qian

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Dec 12, 2024
Via arXiv

SimC3D: A Simple Contrastive 3D Pretraining Framework Using RGB Images

Dec 06, 2024
Via arXiv

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Oct 21, 2024
Via arXiv

Imagen 3

Aug 13, 2024
Via arXiv

Rethinking Image-to-Video Adaptation: An Object-centric Perspective

Jul 09, 2024
Via arXiv

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Jul 03, 2024
Via arXiv

Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models

May 27, 2024
Via arXiv

Streaming Long Video Understanding with Large Language Models

May 25, 2024
Via arXiv

SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation

Feb 27, 2024
Via arXiv

VideoPrism: A Foundational Visual Encoder for Video Understanding

Feb 20, 2024
Via arXiv