Zhen Yang

School of Communication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 2100023, China

Can LLMs See Without Pixels? Benchmarking Spatial Intelligence from Textual Descriptions

Jan 07, 2026

D$^3$R-DETR: DETR with Dual-Domain Density Refinement for Tiny Object Detection in Aerial Images

Jan 06, 2026

SpatialBench: Can Agents Analyze Real-World Spatial Biology Data?

Dec 26, 2025

StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors

Dec 18, 2025

The FACTS Leaderboard: A Comprehensive Benchmark for Large Language Model Factuality

Dec 11, 2025

RoboTidy: A 3D Gaussian Splatting Household Tidying Benchmark for Embodied Navigation and Action

Nov 19, 2025

Beyond Flatlands: Unlocking Spatial Intelligence by Decoupling 3D Reasoning from Numerical Regression

Nov 18, 2025

UI2Code^N: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation

Nov 14, 2025

MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning

Nov 10, 2025

WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation

Nov 09, 2025