Philipp Krähenbühl

QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation

Feb 07, 2025

Robust Autonomy Emerges from Self-Play

Feb 05, 2025

Reinforcement Learning for Long-Horizon Interactive LLM Agents

Feb 04, 2025

Cut Your Losses in Large-Vocabulary Language Models

Nov 13, 2024

Promptable Closed-loop Traffic Simulation

Sep 09, 2024

Image and Video Tokenization with Binary Spherical Quantization

Jun 11, 2024

Language-Image Models with 3D Understanding

May 06, 2024

Distilling Vision-Language Models on Millions of Videos

Jan 11, 2024

Language-conditioned Detection Transformer

Nov 29, 2023

Training a Large Video Model on a Single Machine in a Day

Sep 28, 2023