Leonard Salewski

Zero-shot audio captioning with audio-language model guidance and audio context keywords

Nov 14, 2023

Zero-shot Translation of Attention Patterns in VQA Models to Natural Language

Nov 08, 2023

In-Context Impersonation Reveals Large Language Models' Strengths and Biases

May 24, 2023

Diverse Video Captioning by Adaptive Spatio-temporal Attention

Aug 19, 2022

CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations

Apr 05, 2022

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks

May 08, 2021

Relational Generalized Few-Shot Learning

Jul 22, 2019