Joya Chen

One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

Sep 29, 2024
Via arXiv

VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation

Aug 29, 2024
Via arXiv

Learning Video Context as Interleaved Multimodal Sequences

Jul 31, 2024
Via arXiv

VideoLLM-online: Online Video Large Language Model for Streaming Video

Jun 17, 2024
Via arXiv

From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition

Jun 12, 2024
Via arXiv

Bootstrapping SparseFormers from Vision Foundation Models

Dec 04, 2023
Via arXiv

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Nov 30, 2023
Via arXiv

UniVTG: Towards Unified Video-Language Temporal Grounding

Aug 18, 2023
Via arXiv

AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn

Jun 28, 2023
Via arXiv

Affordance Grounding from Demonstration Video to Target Image

Mar 26, 2023
Via arXiv