Shuangrui Ding

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

Jan 09, 2025
Via arXiv

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction

Jan 06, 2025
Via arXiv

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Dec 12, 2024
Via arXiv

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Oct 21, 2024
Via arXiv

Image Compression for Machine and Human Vision with Spatial-Frequency Adaptation

Jul 13, 2024
Via arXiv

Rethinking Image-to-Video Adaptation: An Object-centric Perspective

Jul 09, 2024
Via arXiv

Streaming Long Video Understanding with Large Language Models

May 25, 2024
Via arXiv

SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation

Feb 27, 2024
Via arXiv

Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation

Nov 29, 2023
Via arXiv

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition

Sep 29, 2023
Via arXiv