Weiqiang Wang

Ant Group, Shanghai, China

GAPD: Gold-Action Policy Distillation for Agentic Reinforcement Learning in Knowledge Base Question Answering

May 28, 2026
Via arXiv

EgoBench: An Interactive Egocentric Multimodal Benchmark for Tool-Using Agents

May 27, 2026
Via arXiv

Single-Rollout Hidden-State Dynamics for Training-Free RLVR Data Selection

May 27, 2026
Via arXiv

ROAD: Adaptive Data Mixing for Offline-to-Online Reinforcement Learning via Bi-Level Optimization

May 14, 2026
Via arXiv

Causal Probing for Internal Visual Representations in Multimodal Large Language Models

May 07, 2026
Via arXiv

TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding

May 06, 2026
Via arXiv

DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data

Apr 21, 2026
Via arXiv

DiPO: Disentangled Perplexity Policy Optimization for Fine-grained Exploration-Exploitation Trade-Off

Apr 15, 2026
Via arXiv

AT-ADD: All-Type Audio Deepfake Detection Challenge Evaluation Plan

Apr 09, 2026
Via arXiv

Can LLMs Learn to Reason Robustly under Noisy Supervision?

Apr 05, 2026
Via arXiv