Picture for Alexander Kunitsyn

Alexander Kunitsyn

VLRM: Vision-Language Models act as Reward Models for Image Captioning

Add code
Apr 02, 2024
Viaarxiv icon

MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization

Add code
Mar 14, 2022
Figure 1 for MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization
Figure 2 for MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization
Figure 3 for MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization
Figure 4 for MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization
Viaarxiv icon