Achal Dave

Should VLMs be Pre-trained with Image Data?

Mar 10, 2025
Via arXiv

Espresso: High Compression For Rich Extraction From Videos for Your Vision-Language Model

Dec 06, 2024
Via arXiv

GRIN: Zero-Shot Metric Depth with Pixel-Level Diffusion

Sep 15, 2024
Via arXiv

Dreamitate: Real-World Visuomotor Policy Learning via Video Generation

Jun 24, 2024
Via arXiv

DataComp-LM: In search of the next generation of training sets for language models

Jun 18, 2024
Via arXiv

Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis

May 23, 2024
Via arXiv

Linearizing Large Language Models

May 10, 2024
Via arXiv

Language models scale reliably with over-training and on downstream tasks

Mar 13, 2024
Via arXiv

pix2gestalt: Amodal Segmentation by Synthesizing Wholes

Jan 25, 2024
Via arXiv

Understanding Video Transformers via Universal Concept Discovery

Jan 19, 2024
Via arXiv