Damai Dai

Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts

Aug 28, 2024

Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models

Jul 02, 2024

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

Jun 17, 2024

Exploring Activation Patterns of Parameters in Language Models

May 28, 2024

Large Language Models Are Unconscious of Unreasonability in Math Problems

Mar 28, 2024

PeriodicLoRA: Breaking the Low-Rank Bottleneck in LoRA Optimization

Feb 25, 2024

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Jan 11, 2024

Language Models Understand Numbers, at Least Partially

Jan 08, 2024

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Jan 05, 2024

Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations

Dec 28, 2023