Sara Sarto

LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning

Mar 19, 2025
Via arXiv

Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives

Mar 18, 2025
Via arXiv

Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval

Mar 03, 2025
Via arXiv

Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training

Oct 09, 2024
Via arXiv

BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues

Jul 29, 2024
Via arXiv

Towards Retrieval-Augmented Architectures for Image Captioning

May 21, 2024
Via arXiv

Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs

Apr 23, 2024
Via arXiv

The (R)Evolution of Multimodal Large Language Models: A Survey

Feb 19, 2024
Via arXiv

With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning

Aug 23, 2023
Via arXiv

Multi-Class Explainable Unlearning for Image Classification via Weight Filtering

Apr 04, 2023
Via arXiv