Yan Yan

Distill Video Datasets into Images

Dec 16, 2025

Consistent Instance Field for Dynamic Scene Understanding

Dec 16, 2025

From Particles to Fields: Reframing Photon Mapping with Continuous Gaussian Photon Fields

Dec 13, 2025

VGent: Visual Grounding via Modular Design for Disentangling Reasoning and Prediction

Dec 11, 2025

TraceFlow: Dynamic 3D Reconstruction of Specular Scenes Driven by Ray Tracing

Dec 10, 2025

GLaD: Geometric Latent Distillation for Vision-Language-Action Models

Dec 10, 2025

MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs

Nov 18, 2025

WarpGAN: Warping-Guided 3D GAN Inversion with Style-Based Novel View Inpainting

Nov 11, 2025

Efficient Multimodal Dataset Distillation via Generative Models

Sep 18, 2025

Investigating the Design Space of Visual Grounding in Multimodal Large Language Model

Aug 11, 2025