Rogerio Feris

Latent Implicit Visual Reasoning

Dec 24, 2025

DAVE: A VLM Vision Encoder for Document Understanding and Web Agents

Dec 19, 2025

Activation Reward Models for Few-Shot Model Alignment

Jul 02, 2025

Instructify: Demystifying Metadata to Visual Instruction Tuning Data Conversion

May 23, 2025

Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?

May 14, 2025

CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment

May 02, 2025

Visualizing Thought: Conceptual Diagrams Enable Robust Planning in LMMs

Mar 14, 2025

mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition

Feb 03, 2025

Enhancing Robustness of CLIP to Common Corruptions through Bimodal Test-Time Adaptation

Dec 03, 2024

Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers

Nov 28, 2024