Dave Zhenyu Chen

When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models

May 16, 2024

EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion

May 02, 2024

SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors

Nov 28, 2023

Generating Context-Aware Natural Answers for Questions in 3D Scenes

Oct 30, 2023

Text2Tex: Text-driven Texture Synthesis via Diffusion Models

Mar 20, 2023

UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding

Dec 01, 2022

Federated Learning via Decentralized Dataset Distillation in Resource-Constrained Edge Environments

Aug 31, 2022

D3Net: A Speaker-Listener Architecture for Semi-supervised Dense Captioning and Visual Grounding in RGB-D Scans

Dec 02, 2021

Scan2Cap: Context-aware Dense Captioning in RGB-D Scans

Dec 03, 2020

ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language

Dec 18, 2019