Cordelia Schmid

Thoth

Neptune: The Long Orbit to Benchmarking Long Video Understanding

Dec 12, 2024
Via arXiv

Visual Lexicon: Rich Image Features in Language Space

Dec 09, 2024
Via arXiv

Language-Guided Image Tokenization for Generation

Dec 08, 2024
Via arXiv

Grounded Video Caption Generation

Nov 12, 2024
Via arXiv

Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach

Oct 31, 2024
Via arXiv

Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy

Oct 02, 2024
Via arXiv

Towards Zero-Shot Multimodal Machine Translation

Jul 18, 2024
Via arXiv

DataDream: Few-shot Guided Dataset Generation

Jul 16, 2024
Via arXiv

mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus

Jun 13, 2024
Via arXiv

Smoke and Mirrors in Causal Downstream Tasks

May 27, 2024
Via arXiv