Linjie Yang

VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos

Sep 11, 2024

LSVOS Challenge Report: Large-scale Complex and Long Video Object Segmentation

Sep 09, 2024

Autoregressive Pretraining with Mamba in Vision

Jun 11, 2024

Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters

Mar 05, 2024

Video Recognition in Portrait Mode

Dec 21, 2023

Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos

Dec 19, 2023

Video-Teller: Enhancing Cross-Modal Generation with Fusion and Decoupling

Oct 11, 2023

Selective Feature Adapter for Dense Vision Transformers

Oct 03, 2023

The Devil is in the Details: A Deep Dive into the Rabbit Hole of Data Filtering

Sep 27, 2023

Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation

Jul 27, 2023