Ranjay Krishna

PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology

Feb 13, 2025

REALEDIT: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations

Feb 05, 2025

SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation

Jan 30, 2025

MedicalNarratives: Connecting Medical Vision and Language with Localized Narratives

Jan 07, 2025

The One RING: a Robotic Indoor Navigation Generalist

Dec 18, 2024

Generate Any Scene: Evaluating and Improving Text-to-Vision Generation with Scene Graph Programming

Dec 11, 2024

TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action

Dec 10, 2024

SAT: Spatial Aptitude Training for Multimodal Language Models

Dec 10, 2024

ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models

Dec 09, 2024

NVILA: Efficient Frontier Visual Language Models

Dec 05, 2024