Marcella Cornia

LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning

Mar 19, 2025

Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives

Mar 18, 2025

Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval

Mar 03, 2025

Personalizing Multimodal Large Language Models for Image Captioning: An Experimental Analysis

Dec 04, 2024

Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation

Nov 28, 2024

Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering

Nov 25, 2024

TPP-Gaze: Modelling Gaze Dynamics in Space and Time with Neural Temporal Point Processes

Oct 30, 2024

Personalized Instance-based Navigation Toward User-Specific Objects in Realistic Environments

Oct 23, 2024

Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training

Oct 09, 2024

Fluent and Accurate Image Captioning with a Self-Trained Reward Model

Aug 29, 2024