Wenhao Chai

VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images

Apr 10, 2026, via arXiv

Agent Banana: High-Fidelity Image Editing with Agentic Thinking and Tooling

Feb 09, 2026, via arXiv

UEval: A Benchmark for Unified Multimodal Generation

Jan 29, 2026, via arXiv

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

Jan 17, 2026, via arXiv

Reasoning Matters for 3D Visual Grounding

Jan 13, 2026, via arXiv

BabyVision: Visual Reasoning Beyond Language

Jan 10, 2026, via arXiv

Next-Embedding Prediction Makes Strong Vision Learners

Dec 23, 2025, via arXiv

FrontierCS: Evolving Challenges for Evolving Intelligence

Dec 17, 2025, via arXiv

HybridToken-VLM: Hybrid Token Compression for Vision-Language Models

Dec 09, 2025, via arXiv

UniHPR: Unified Human Pose Representation via Singular Value Contrastive Learning

Oct 21, 2025, via arXiv