Federico Cocchi

Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering

Nov 25, 2024
Via arXiv

Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities

Jul 29, 2024
Via arXiv

Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs

Apr 23, 2024
Via arXiv

The (R)Evolution of Multimodal Large Language Models: A Survey

Feb 19, 2024
Via arXiv

Removing NSFW Concepts from Vision-and-Language Models for Text-to-Image Retrieval and Generation

Nov 27, 2023
Via arXiv