Ping Nie

GraphDancer: Training LLMs to Explore and Reason over Graphs via Curriculum Reinforcement Learning

Jan 24, 2026
Via arXiv

Beyond Single-shot Writing: Deep Research Agents are Unreliable at Multi-turn Report Revision

Jan 19, 2026
Via arXiv

A Rigorous Benchmark with Multidimensional Evaluation for Deep Research Agents: From Answers to Reports

Oct 02, 2025
Via arXiv

VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation

Jun 04, 2025
Via arXiv

StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs

May 26, 2025
Via arXiv

Likert or Not: LLM Absolute Relevance Judgments on Fine-Grained Ordinal Scales

May 25, 2025
Via arXiv

VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation

May 20, 2025
Via arXiv

MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems

May 16, 2025
Via arXiv

Nearest Neighbor Multivariate Time Series Forecasting

May 16, 2025
Via arXiv

Breaking the Batch Barrier (B3) of Contrastive Learning via Smart Batch Mining

May 16, 2025
Via arXiv