Limin Wang

FlowDCN: Exploring DCN-like Architectures for Fast Image Generation with Arbitrary Resolution

Oct 30, 2024
Via arXiv

TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning

Oct 25, 2024
Via arXiv

SPA: 3D Spatial-Awareness Enables Effective Embodied Representation

Oct 10, 2024
Via arXiv

Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks

Sep 27, 2024
Via arXiv

Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation

Sep 06, 2024
Via arXiv

Stochastic Layer-Wise Shuffle: A Good Practice to Improve Vision Mamba Training

Aug 30, 2024
Via arXiv

Dynamic and Compressive Adaptation of Transformers From Images to Videos

Aug 14, 2024
Via arXiv

Efficient Test-Time Prompt Tuning for Vision-Language Models

Aug 11, 2024
Via arXiv

CycleHOI: Improving Human-Object Interaction Detection with Cycle Consistency of Detection and Generation

Jul 16, 2024
Via arXiv

VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model

Jul 09, 2024
Via arXiv