Shirong Ma

Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

May 14, 2025

DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition

Apr 30, 2025

Inference-Time Scaling for Generalist Reward Modeling

Apr 03, 2025

DeepSeek-V3 Technical Report

Dec 27, 2024

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

Jun 17, 2024

Rethinking the Roles of Large Language Models in Chinese Grammatical Error Correction

Feb 18, 2024

Mitigating Catastrophic Forgetting in Multi-domain Chinese Spelling Correction by Multi-stage Knowledge Transfer Framework

Feb 18, 2024

When LLMs Meet Cunning Questions: A Fallacy Understanding Benchmark for Large Language Models

Feb 16, 2024

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Jan 05, 2024

EcomGPT-CT: Continual Pre-training of E-commerce Large Language Models with Semi-structured Data

Dec 25, 2023