Ge Zhang

Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization

Feb 26, 2026, via arXiv

WorldTravel: A Realistic Multimodal Travel-Planning Benchmark with Tightly Coupled Constraints

Feb 09, 2026, via arXiv

The Optimal Token Baseline: Variance Reduction for Long-Horizon LLM-RL

Feb 06, 2026, via arXiv

BABE: Biology Arena BEnchmark

Feb 05, 2026, via arXiv

Context Forcing: Consistent Autoregressive Video Generation with Long Context

Feb 05, 2026, via arXiv

Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities

Jan 29, 2026, via arXiv

ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation

Jan 29, 2026, via arXiv

TabularMath: Evaluating Computational Extrapolation in Tabular Learning via Program-Verified Synthesis

Jan 25, 2026, via arXiv

FutureX-Pro: Extending Future Prediction to High-Value Vertical Domains

Jan 18, 2026, via arXiv

The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning

Jan 13, 2026, via arXiv