Cordelia Schmid

Thoth

Grounded Video Caption Generation

Nov 12, 2024

Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach

Oct 31, 2024

Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy

Oct 02, 2024

Towards Zero-Shot Multimodal Machine Translation

Jul 18, 2024

DataDream: Few-shot Guided Dataset Generation

Jul 16, 2024

mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus

Jun 13, 2024

Smoke and Mirrors in Causal Downstream Tasks

May 27, 2024

Learning text-to-video retrieval from image captioning

Apr 26, 2024

ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos

Apr 24, 2024

MoReVQA: Exploring Modular Reasoning Models for Video Question Answering

Apr 09, 2024