Matteo Stefanini

ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval

Jul 29, 2022

CaMEL: Mean Teacher Learning for Image Captioning

Feb 21, 2022

From Show to Tell: A Survey on Image Captioning

Jul 30, 2021

Learning to Select: A Fully Attentive Approach for Novel Object Captioning

Jun 02, 2021

A Novel Attention-based Aggregation Function to Combine Vision and Language

Apr 27, 2020

M$^2$: Meshed-Memory Transformer for Image Captioning

Dec 17, 2019

A Deep Learning based approach to VM behavior identification in cloud systems

Mar 05, 2019