Xinlei Chen

Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space

Mar 14, 2025

Transformers without Normalization

Mar 13, 2025

Multi-Robot System for Cooperative Exploration in Unknown Environments: A Survey

Mar 10, 2025

UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces

Mar 08, 2025

Ultra-High-Frequency Harmony: mmWave Radar and Event Camera Orchestrate Accurate Drone Landing

Feb 20, 2025

PLPHP: Per-Layer Per-Head Vision Token Pruning for Efficient Large Vision-Language Models

Feb 20, 2025

Understanding and Evaluating Hallucinations in 3D Visual Language Models

Feb 18, 2025

Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression

Feb 06, 2025

LLMs can see and hear without any training

Jan 30, 2025

Learnings from Scaling Visual Tokenizers for Reconstruction and Generation

Jan 16, 2025