Weiqiang Wang

Ant Group, Shanghai, China

OPRD: On-Policy Representation Distillation

Jun 04, 2026
Via arXiv

GeoMin: Data-Efficient Semi-Supervised RLVR via Geometric Distribution Modeling

Jun 03, 2026
Via arXiv

Smart Picks in the Dark: Towards Efficient RLVR for Reasoning via Tracing Metacognitive Pivots

Jun 03, 2026
Via arXiv

GAPD: Gold-Action Policy Distillation for Agentic Reinforcement Learning in Knowledge Base Question Answering

May 28, 2026
Via arXiv

Single-Rollout Hidden-State Dynamics for Training-Free RLVR Data Selection

May 27, 2026
Via arXiv

EgoBench: An Interactive Egocentric Multimodal Benchmark for Tool-Using Agents

May 27, 2026
Via arXiv

ROAD: Adaptive Data Mixing for Offline-to-Online Reinforcement Learning via Bi-Level Optimization

May 14, 2026
Via arXiv

Causal Probing for Internal Visual Representations in Multimodal Large Language Models

May 07, 2026
Via arXiv

TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding

May 06, 2026
Via arXiv

DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data

Apr 21, 2026
Via arXiv