Alfred Shen

Gated Sparse Attention: Combining Computational Efficiency with Training Stability for Long-Context Language Models

Jan 12, 2026
Via arXiv

How Well Self-Supervised Pre-Training Performs with Streaming Data?

Apr 25, 2021
Via arXiv