Damai Dai

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Dec 13, 2024

Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts

Aug 28, 2024
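
The title names a concrete mechanism: balance expert load by adjusting per-expert routing biases instead of adding an auxiliary loss term to the training objective. Below is a minimal NumPy sketch of that idea; the batch size, update rate, and random affinity scores are illustrative stand-ins, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
num_experts, top_k, update_rate = 8, 2, 0.001   # illustrative constants
bias = np.zeros(num_experts)                    # per-expert bias, used only for routing

def route(scores, bias, top_k):
    """Select top-k experts by biased score; gate with the unbiased score."""
    chosen = np.argsort(scores + bias, axis=-1)[:, -top_k:]   # selection sees the bias
    gates = np.take_along_axis(scores, chosen, axis=-1)       # gate values do not
    return chosen, gates / gates.sum(axis=-1, keepdims=True)

for step in range(100):
    scores = rng.random((32, num_experts))                    # stand-in for token-expert affinities
    chosen, gates = route(scores, bias, top_k)
    load = np.bincount(chosen.ravel(), minlength=num_experts) # tokens routed to each expert
    bias -= update_rate * np.sign(load - load.mean())         # push load back toward uniform
```

Because the bias only affects which experts are selected, not the gate values used in the weighted sum, the balancing pressure never distorts the model's output mixture the way an auxiliary loss can.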

Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models

Jul 02, 2024
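
Expert-Specialized Fine-Tuning trains only the experts most relevant to a downstream task and freezes everything else. A hedged PyTorch sketch of that selection-and-freezing step follows; the cumulative-affinity threshold and the `experts.<idx>` parameter-naming convention are assumptions for illustration.

```python
import torch

def select_task_experts(gate_scores: torch.Tensor, threshold: float = 0.9) -> set:
    """Pick the smallest expert set whose summed task affinity reaches `threshold`.

    gate_scores: (num_tokens, num_experts) routing probabilities on task data.
    """
    avg = gate_scores.mean(dim=0)                    # mean affinity per expert
    order = torch.argsort(avg, descending=True)
    cum = torch.cumsum(avg[order], dim=0)
    k = int((cum < threshold).sum().item()) + 1      # smallest k with cum >= threshold
    return set(order[:k].tolist())

def freeze_all_but(model: torch.nn.Module, expert_ids: set) -> None:
    """Leave only the chosen experts trainable; freeze every other parameter."""
    for name, p in model.named_parameters():
        # assumption: expert weights are named like "...experts.<idx>..."
        p.requires_grad = any(f"experts.{i}." in name for i in expert_ids)
```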

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

Jun 17, 2024

Exploring Activation Patterns of Parameters in Language Models

May 28, 2024

Large Language Models Are Unconscious of Unreasonability in Math Problems

Mar 28, 2024

PeriodicLoRA: Breaking the Low-Rank Bottleneck in LoRA Optimization

Feb 25, 2024
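
The title points at a specific trick: periodically merge the trained low-rank update into the frozen base weight and re-initialize the LoRA factors, so m merge stages can accumulate an update of rank up to m*r while only ever training rank-r matrices. A minimal NumPy sketch, with a random perturbation standing in for actual gradient steps:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4                                  # hidden size and LoRA rank (illustrative)
W = rng.normal(size=(d, d))                   # frozen base weight
A = rng.normal(scale=0.01, size=(r, d))       # LoRA down-projection
B = np.zeros((d, r))                          # LoRA up-projection, zero-initialized

for stage in range(5):                        # one "period" per stage
    # ... train A and B here with W frozen; a random step stands in ...
    B += rng.normal(scale=0.01, size=(d, r))
    A += rng.normal(scale=0.01, size=(r, d))
    # unload: fold the rank-r update into W, then restart LoRA from scratch
    W += B @ A
    A = rng.normal(scale=0.01, size=(r, d))
    B = np.zeros((d, r))
```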

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Jan 11, 2024
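
DeepSeekMoE's two headline ideas are fine-grained expert segmentation (many small routed experts, several active per token) and shared experts that bypass the router entirely. The NumPy sketch below shows the routing arithmetic only; single linear maps stand in for the expert FFNs, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, num_routed, num_shared, top_k = 16, 16, 2, 4   # many small experts, several active

shared = [rng.normal(scale=0.1, size=(dim, dim)) for _ in range(num_shared)]
routed = [rng.normal(scale=0.1, size=(dim, dim)) for _ in range(num_routed)]
W_gate = rng.normal(scale=0.1, size=(dim, num_routed))

def moe_layer(x):
    """Shared experts always fire; top-k of the routed experts add specialization."""
    out = sum(x @ W for W in shared)          # shared path: no gating, always on
    scores = np.exp(x @ W_gate)
    scores /= scores.sum()                    # softmax over routed experts
    for i in np.argsort(scores)[-top_k:]:     # keep only the k highest-affinity experts
        out += scores[i] * (x @ routed[i])
    return out

y = moe_layer(rng.normal(size=dim))
```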

Language Models Understand Numbers, at Least Partially

Jan 08, 2024

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Jan 05, 2024
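
The scaling-law methodology behind this line of work boils down to fitting power laws on small runs and extrapolating before committing compute. A toy NumPy sketch with synthetic data; the exponent, coefficient, and noise level are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
compute = np.logspace(18, 20, 6)                        # synthetic FLOP budgets
loss = 8.0 * compute ** -0.05                           # made-up ground-truth power law
loss *= np.exp(rng.normal(scale=0.01, size=loss.size))  # measurement noise

# Fit loss ~ a * C**b by linear regression in log-log space
b, log_a = np.polyfit(np.log(compute), np.log(loss), 1)
a = np.exp(log_a)
print(f"fitted: loss ~ {a:.2f} * C^{b:.3f}")            # b should recover ~ -0.05

# Extrapolate to a larger budget before committing to the big run
print("predicted loss at 1e21 FLOPs:", a * 1e21 ** b)
```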