Arsha Nagrani

The VoxCeleb Speaker Recognition Challenge: A Retrospective

Aug 27, 2024

Mixture of Nested Experts: Adaptive Processing of Visual Tokens

Jul 29, 2024

AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description

Jul 22, 2024

AutoAD III: The Prequel -- Back to the Pixels

Apr 22, 2024

MoReVQA: Exploring Modular Reasoning Models for Video Question Answering

Apr 09, 2024

Streaming Dense Video Captioning

Apr 01, 2024

Video Summarization: Towards Entity-Aware Captions

Dec 01, 2023

AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description

Oct 10, 2023

VidChapters-7M: Video Chapters at Scale

Sep 25, 2023

LanSER: Language-Model Supported Speech Emotion Recognition

Sep 07, 2023