Yang Jin

Pyramidal Flow Matching for Efficient Video Generative Modeling

Oct 08, 2024

Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model

Aug 02, 2024

RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance

May 23, 2024

DiffGen: Robot Demonstration Generation via Differentiable Physics Simulation, Differentiable Rendering, and Vision-Language Model

May 12, 2024

Harder Tasks Need More Experts: Dynamic Routing in MoE Models

Mar 12, 2024

TransGOP: Transformer-Based Gaze Object Prediction

Feb 21, 2024

Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization

Feb 06, 2024

Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization

Sep 29, 2023

Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-commerce

Apr 06, 2023

Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding

Sep 27, 2022