Shuaipeng Li

Scaling Laws for Floating Point Quantization Training

Jan 05, 2025
Via arXiv

More Expressive Attention with Negative Weights

Nov 14, 2024
Via arXiv

Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

Nov 05, 2024
Via arXiv

HMoE: Heterogeneous Mixture of Experts for Language Modeling

Aug 20, 2024
Via arXiv

Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs

Jul 16, 2024
Via arXiv

Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling

May 23, 2024
Via arXiv

HAL: Improved Text-Image Matching by Mitigating Visual Semantic Hubs

Nov 22, 2019
Via arXiv

3DCNN-DQN-RNN: A Deep Reinforcement Learning Framework for Semantic Parsing of Large-scale 3D Point Clouds

Jul 21, 2017
Via arXiv