Yiran Zhong

MiniMax-01: Scaling Foundation Models with Lightning Attention

Jan 14, 2025

Tri-Ergon: Fine-grained Video-to-Audio Generation with Multi-modal Conditions and LUFS Control

Dec 29, 2024

A Generative Victim Model for Segmentation

Dec 10, 2024

Towards Open-Vocabulary Audio-Visual Event Localization

Nov 18, 2024

MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map

Nov 16, 2024

Storyboard guided Alignment for Fine-grained Video Action Recognition

Oct 18, 2024

Label-anticipated Event Disentanglement for Audio-Visual Video Parsing

Jul 11, 2024

Scaling Laws for Linear Complexity Language Models

Jun 24, 2024

Advancing Weakly-Supervised Audio-Visual Video Parsing via Segment-wise Pseudo Labeling

Jun 03, 2024

You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet

May 31, 2024