Picture for Haoran Lian

Haoran Lian

MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts

Add code
Jul 13, 2024
Viaarxiv icon

Temporal Scaling Law for Large Language Models

Add code
Apr 27, 2024
Viaarxiv icon

Scaffold-BPE: Enhancing Byte Pair Encoding with Simple and Effective Scaffold Token Removal

Add code
Apr 27, 2024
Viaarxiv icon