Picture for Dian Jiao

Dian Jiao

Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

Add code
Nov 05, 2024
Figure 1 for Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
Figure 2 for Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
Figure 3 for Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
Figure 4 for Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
Viaarxiv icon

Align$^2$LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation

Add code
Sep 27, 2024
Figure 1 for Align$^2$LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation
Figure 2 for Align$^2$LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation
Figure 3 for Align$^2$LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation
Figure 4 for Align$^2$LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation
Viaarxiv icon

Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs

Add code
Jul 16, 2024
Figure 1 for Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs
Figure 2 for Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs
Figure 3 for Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs
Figure 4 for Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs
Viaarxiv icon

IDEAL: Leveraging Infinite and Dynamic Characterizations of Large Language Models for Query-focused Summarization

Add code
Jul 15, 2024
Viaarxiv icon

Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling

Add code
May 23, 2024
Figure 1 for Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling
Figure 2 for Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling
Figure 3 for Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling
Figure 4 for Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling
Viaarxiv icon

Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent

Add code
Mar 06, 2023
Viaarxiv icon