Rogerio Feris

Visualizing Thought: Conceptual Diagrams Enable Robust Planning in LMMs

Mar 14, 2025

mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition

Feb 03, 2025

Enhancing Robustness of CLIP to Common Corruptions through Bimodal Test-Time Adaptation

Dec 03, 2024

Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers

Nov 28, 2024

State-Space Large Audio Language Models

Nov 24, 2024

Teaching VLMs to Localize Specific Objects from In-context Examples

Nov 20, 2024

GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models

Oct 08, 2024

Scaling Granite Code Models to 128K Context

Jul 18, 2024

DASS: Distilled Audio State Space Models Are Stronger and More Duration-Scalable Learners

Jul 04, 2024

Navigating the Labyrinth: Evaluating and Enhancing LLMs' Ability to Reason About Search Problems

Jun 18, 2024