Yiming Dong

Probing RLVR training instability through the lens of objective-level hacking

Feb 01, 2026
Via arXiv

Convergence Rate Analysis of the AdamW-Style Shampoo: Unifying One-sided and Two-Sided Preconditioning

Jan 12, 2026
Via arXiv

Lightweight posterior construction for gravitational-wave catalogs with the Kolmogorov-Arnold network

Aug 26, 2025
Via arXiv

P/D-Device: Disaggregated Large Language Model between Cloud and Devices

Aug 12, 2025
Via arXiv

From Macro to Micro: Probing Dataset Diversity in Language Model Fine-Tuning

May 30, 2025
Via arXiv

Stepsize anything: A unified learning rate schedule for budgeted-iteration training

May 30, 2025
Via arXiv

On the $O(\frac{\sqrt{d}}{K^{1/4}})$ Convergence Rate of AdamW Measured by $\ell_1$ Norm

May 17, 2025
Via arXiv

Convergence Rate Analysis of LION

Nov 12, 2024
Via arXiv