Yutao Zeng

Virtual Width Networks

Nov 17, 2025

UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

Aug 26, 2025

Stepsize anything: A unified learning rate schedule for budgeted-iteration training

May 30, 2025

Scaling Law for Quantization-Aware Training

May 20, 2025

Efficient Pretraining Length Scaling

Apr 21, 2025

Frac-Connections: Fractional Extension of Hyper-Connections

Mar 18, 2025

HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization

Mar 06, 2025

Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models

Feb 21, 2025

SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models

Feb 18, 2025

Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

Jan 28, 2025