Ranjay Krishna

MedicalNarratives: Connecting Medical Vision and Language with Localized Narratives

Jan 07, 2025
Via arXiv

The One RING: a Robotic Indoor Navigation Generalist

Dec 18, 2024
Via arXiv

Generate Any Scene: Evaluating and Improving Text-to-Vision Generation with Scene Graph Programming

Dec 11, 2024
Via arXiv

SAT: Spatial Aptitude Training for Multimodal Language Models

Dec 10, 2024
Via arXiv

TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action

Dec 10, 2024
Via arXiv

ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models

Dec 09, 2024
Via arXiv

NVILA: Efficient Frontier Visual Language Models

Dec 05, 2024
Via arXiv

Perception Tokens Enhance Visual Reasoning in Multimodal Language Models

Dec 04, 2024
Via arXiv

Negative Token Merging: Image-based Adversarial Feature Guidance

Dec 02, 2024
Via arXiv

Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment

Nov 26, 2024
Via arXiv