
Ranjay Krishna

Generate Any Scene: Evaluating and Improving Text-to-Vision Generation with Scene Graph Programming

Dec 11, 2024
Via arXiv

TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action

Dec 10, 2024
Via arXiv

SAT: Spatial Aptitude Training for Multimodal Language Models

Dec 10, 2024
Via arXiv

ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models

Dec 09, 2024
Via arXiv

NVILA: Efficient Frontier Visual Language Models

Dec 05, 2024
Via arXiv

Perception Tokens Enhance Visual Reasoning in Multimodal Language Models

Dec 04, 2024
Via arXiv

Negative Token Merging: Image-based Adversarial Feature Guidance

Dec 02, 2024
Via arXiv

Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment

Nov 26, 2024
Via arXiv

One Diffusion to Generate Them All

Nov 25, 2024
Via arXiv

I Can Tell What I am Doing: Toward Real-World Natural Language Grounding of Robot Experiences

Nov 20, 2024
Via arXiv