Nina Shvetsova

HowToCaption: Prompting LLMs to Transform Video Annotations at Scale

Oct 07, 2023

In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval

Sep 16, 2023

Preserving Modality Structure Improves Multi-Modal Learning

Aug 24, 2023

What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions

Mar 29, 2023

MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge

Mar 15, 2023

Learning by Sorting: Self-supervised Learning with Group Ordering Constraints

Jan 05, 2023

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval

Oct 07, 2022

VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models

Sep 12, 2022

Augmentation Learning for Semi-Supervised Classification

Aug 03, 2022

Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval

Dec 08, 2021