Xinlei Chen

Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression

Feb 06, 2025

LLMs can see and hear without any training

Jan 30, 2025

Learnings from Scaling Visual Tokenizers for Reconstruction and Generation

Jan 16, 2025

Gaussian Masked Autoencoders

Jan 06, 2025

MR-COGraphs: Communication-efficient Multi-Robot Open-vocabulary Mapping System via 3D Scene Graphs

Dec 24, 2024

MetaMorph: Multimodal Understanding and Generation via Instruction Tuning

Dec 18, 2024

On the Surprising Effectiveness of Attention Transfer for Vision Transformers

Nov 14, 2024

SniffySquad: Patchiness-Aware Gas Source Localization with Multi-Robot Collaboration

Nov 09, 2024

Learning Video Representations without Natural Videos

Oct 31, 2024

EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment

Oct 12, 2024