Weigao Sun

LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training

Nov 24, 2024

Scaling Laws for Linear Complexity Language Models

Jun 24, 2024

Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention

May 27, 2024

Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective

May 27, 2024

HGRN2: Gated Linear RNNs with State Expansion

Apr 11, 2024

Linear Attention Sequence Parallelism

Apr 03, 2024

MS-Net: A Multi-Path Sparse Model for Motion Prediction in Multi-Scenes

Mar 01, 2024

CO2: Efficient Distributed Training with Full Communication-Computation Overlap

Jan 29, 2024

Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

Jan 15, 2024

Scaling TransNormer to 175 Billion Parameters

Jul 27, 2023