Ruimao Zhang

KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension

Nov 04, 2024
Via arXiv

WorldSimBench: Towards Video Generation Models as World Simulators

Oct 23, 2024
Via arXiv

Advancing Medical Radiograph Representation Learning: A Hybrid Pre-training Paradigm with Multilevel Semantic Granularity

Oct 01, 2024
Via arXiv

Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models

Aug 21, 2024
Via arXiv

F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions

Jul 17, 2024
Via arXiv

Open-World Human-Object Interaction Detection via Multi-modal Prompts

Jun 11, 2024
Via arXiv

MotionLLM: Understanding Human Behaviors from Human Motions and Videos

May 30, 2024
Via arXiv

SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension

Apr 25, 2024
Via arXiv

MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control

Mar 19, 2024
Via arXiv

Toward Accurate Camera-based 3D Object Detection via Cascade Depth Estimation and Calibration

Feb 07, 2024
Via arXiv