Wenhao Chai

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

Jan 17, 2026

Reasoning Matters for 3D Visual Grounding

Jan 13, 2026

BabyVision: Visual Reasoning Beyond Language

Jan 10, 2026

Next-Embedding Prediction Makes Strong Vision Learners

Dec 23, 2025

FrontierCS: Evolving Challenges for Evolving Intelligence

Dec 17, 2025

HybridToken-VLM: Hybrid Token Compression for Vision-Language Models

Dec 09, 2025

UniHPR: Unified Human Pose Representation via Singular Value Contrastive Learning

Oct 21, 2025

VideoNSA: Native Sparse Attention Scales Video Understanding

Oct 02, 2025

Dense Video Understanding with Gated Residual Tokenization

Sep 18, 2025

AuroraLong: Bringing RNNs Back to Efficient Open-Ended Video Understanding

Jul 03, 2025