Chuang Gan

LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences

Dec 02, 2024
Via arXiv

SnapMem: Snapshot-based 3D Scene Memory for Embodied Exploration and Reasoning

Nov 23, 2024
Via arXiv

UBSoft: A Simulation Platform for Robotic Skill Learning in Unbounded Soft Environments

Nov 19, 2024
Via arXiv

Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting

Nov 14, 2024
Via arXiv

Constrained Human-AI Cooperation: An Inclusive Embodied Social Intelligence Challenge

Nov 05, 2024
Via arXiv

DELTA: Dense Efficient Long-range 3D Tracking for any video

Oct 31, 2024
Via arXiv

SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization

Oct 28, 2024
Via arXiv

UniMuMo: Unified Text, Music and Motion Generation

Oct 06, 2024
Via arXiv

Compositional Physical Reasoning of Objects and Events from Videos

Aug 02, 2024
Via arXiv

FlexAttention for Efficient High-Resolution Vision-Language Models

Jul 29, 2024
Via arXiv