Ranjay Krishna

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding

Jan 15, 2026
Via arXiv

GenEval 2: Addressing Benchmark Drift in Text-to-Image Evaluation

Dec 18, 2025
Via arXiv

Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation

Dec 12, 2025
Via arXiv

Agile Deliberation: Concept Deliberation for Subjective Visual Classification

Dec 11, 2025
Via arXiv

Mull-Tokens: Modality-Agnostic Latent Thinking

Dec 11, 2025
Via arXiv

OmniView: An All-Seeing Diffusion Model for 3D and 4D View Synthesis

Dec 11, 2025
Via arXiv

OlmoEarth: Stable Latent Image Modeling for Multimodal Earth Observation

Nov 17, 2025
Via arXiv

SIMS-V: Simulated Instruction-Tuning for Spatial Video Understanding

Nov 06, 2025
Via arXiv

ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning

Oct 30, 2025
Via arXiv

Visual Representations inside the Language Model

Oct 06, 2025
Via arXiv