Cordelia Schmid

Thoth

HORT: Monocular Hand-held Objects Reconstruction with Transformers

Mar 27, 2025

Online 3D Scene Reconstruction Using Neural Object Priors

Mar 24, 2025

Large-scale Pre-training for Grounded Video Caption Generation

Mar 13, 2025

FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement

Mar 06, 2025

What Are You Doing? A Closer Look at Controllable Human Video Generation

Mar 06, 2025

Causal Lifting of Neural Representations: Zero-Shot Generalization for Causal Inferences

Feb 10, 2025

Neptune: The Long Orbit to Benchmarking Long Video Understanding

Dec 12, 2024

Visual Lexicon: Rich Image Features in Language Space

Dec 09, 2024

Language-Guided Image Tokenization for Generation

Dec 08, 2024

Grounded Video Caption Generation

Nov 12, 2024