Shoubin Yu

VEGGIE: Instructional Editing and Reasoning of Video Concepts with Grounded Generation

Mar 19, 2025

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel

Dec 11, 2024

Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level

Nov 15, 2024

SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation

Oct 16, 2024

VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos

May 29, 2024

RACCooN: Remove, Add, and Change Video Content with Auto-Generated Narratives

May 28, 2024

STAR: A Benchmark for Situated Reasoning in Real-World Videos

May 15, 2024

CREMA: Multimodal Compositional Video Reasoning via Efficient Modular Adaptation and Fusion

Feb 08, 2024

A Simple LLM Framework for Long-Range Video Question-Answering

Dec 28, 2023

Self-Chained Image-Language Model for Video Localization and Question Answering

May 11, 2023