Shuming Ma

Bitnet.cpp: Efficient Edge Inference for Ternary LLMs

Feb 17, 2025

Next Block Prediction: Video Generation via Semi-Autoregressive Modeling

Feb 12, 2025

RedStone: Curating General, Code, Math, and QA Data for Large Language Models

Dec 04, 2024

MH-MoE: Multi-Head Mixture-of-Experts

Nov 26, 2024

BitNet a4.8: 4-bit Activations for 1-bit LLMs

Nov 07, 2024

1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs

Oct 21, 2024

Q-Sparse: All Large Language Models can be Fully Sparsely-Activated

Jul 15, 2024

You Only Cache Once: Decoder-Decoder Architectures for Language Models

May 08, 2024

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Feb 27, 2024

When an Image is Worth 1,024 x 1,024 Words: A Case Study in Computational Pathology

Dec 06, 2023