Xiaojie Jin

VCoME: Verbal Video Composition with Multimodal Editing Effects

Jul 05, 2024
Via arXiv

Hierarchical Memory for Long Video QA

Jun 30, 2024
Via arXiv

Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams

Jun 12, 2024
Via arXiv

The SkatingVerse Workshop & Challenge: Methods and Results

May 27, 2024
Via arXiv

Video Recognition in Portrait Mode

Dec 21, 2023
Via arXiv

Vista-LLaMA: Reliable Video Narrator via Equal Distance to Visual Tokens

Dec 12, 2023
Via arXiv

PixelLM: Pixel Reasoning with Large Multimodal Model

Dec 04, 2023
Via arXiv

Selective Feature Adapter for Dense Vision Transformers

Oct 03, 2023
Via arXiv

Realistic Full-Body Tracking from Sparse Observations via Joint-Level Modeling

Aug 17, 2023
Via arXiv

COSA: Concatenated Sample Pretrained Vision-Language Foundation Model

Jun 15, 2023
Via arXiv