Rogerio Feris

Enhancing Robustness of CLIP to Common Corruptions through Bimodal Test-Time Adaptation

Dec 03, 2024
Via arXiv

Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers

Nov 28, 2024
Via arXiv

State-Space Large Audio Language Models

Nov 24, 2024
Via arXiv

Teaching VLMs to Localize Specific Objects from In-context Examples

Nov 20, 2024
Via arXiv

GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models

Oct 08, 2024
Via arXiv

Scaling Granite Code Models to 128K Context

Jul 18, 2024
Via arXiv

DASS: Distilled Audio State Space Models Are Stronger and More Duration-Scalable Learners

Jul 04, 2024
Via arXiv

Navigating the Labyrinth: Evaluating and Enhancing LLMs' Ability to Reason About Search Problems

Jun 18, 2024
Via arXiv

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

Jun 17, 2024
Via arXiv

Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation

Jun 14, 2024
Via arXiv