
Zhengyan Zhang

DeepSeek-V3 Technical Report

Dec 27, 2024

Exploring the Benefit of Activation Sparsity in Pre-training

Oct 04, 2024

Configurable Foundation Models: Building LLMs from a Modular Perspective

Sep 04, 2024

Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters

Jun 11, 2024

Robust and Scalable Model Editing for Large Language Models

Mar 26, 2024

ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models

Feb 27, 2024

InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory

Feb 07, 2024

ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs

Feb 06, 2024

Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules

Oct 24, 2023

CPET: Effective Parameter-Efficient Tuning for Compressed Large Language Models

Jul 15, 2023