Marco Pedersoli

ETS

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding

Feb 03, 2025
Via arXiv

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks

Dec 05, 2024
Via arXiv

Visual Modality Prompt for Adapting Vision-Language Object Detectors

Dec 01, 2024
Via arXiv

Words Matter: Leveraging Individual Text Embeddings for Code Generation in CLIP Test-Time Adaptation

Nov 26, 2024
Via arXiv

IntentGPT: Few-shot Intent Discovery with Large Language Models

Nov 16, 2024
Via arXiv

Unsupervised Object Discovery: A Comprehensive Survey and Unified Taxonomy

Oct 30, 2024
Via arXiv

Source-Free Domain Adaptation for YOLO Object Detection

Sep 25, 2024
Via arXiv

Multi Teacher Privileged Knowledge Distillation for Multimodal Expression Recognition

Aug 16, 2024
Via arXiv

Text- and Feature-based Models for Compound Multimodal Emotion Recognition in the Wild

Jul 17, 2024
Via arXiv

Leveraging Transformers for Weakly Supervised Object Localization in Unconstrained Videos

Jul 08, 2024
Via arXiv