Kaizhi Zheng

EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing

Oct 03, 2024

Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation

Jun 13, 2024

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

Jun 12, 2024

MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens

Oct 05, 2023

R2H: Building Multimodal Navigation Helpers that Respond to Help

May 23, 2023

ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation

Jan 30, 2023

JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents

Aug 30, 2022

VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation

Jun 17, 2022

Manipulation-Oriented Object Perception in Clutter through Affordance Coordinate Frames

Oct 16, 2020