Ying Shan

ColorFlow: Retrieval-Augmented Image Sequence Colorization

Dec 16, 2024

BrushEdit: All-In-One Image Inpainting and Editing

Dec 13, 2024

NeRF-Texture: Synthesizing Neural Radiance Field Textures

Dec 13, 2024

FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction

Dec 12, 2024

MuMu-LLaMA: Multi-modal Music Understanding and Generation via Large Language Models

Dec 09, 2024

DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models

Dec 05, 2024

Moto: Latent Motion Token as the Bridging Language for Robot Manipulation

Dec 05, 2024

EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios

Dec 05, 2024

Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation

Dec 05, 2024

NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images

Dec 04, 2024