Shuaipeng Li

More Expressive Attention with Negative Weights

Nov 14, 2024

Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

Nov 05, 2024

HMoE: Heterogeneous Mixture of Experts for Language Modeling

Aug 20, 2024

Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs

Jul 16, 2024

Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling

May 23, 2024

HAL: Improved Text-Image Matching by Mitigating Visual Semantic Hubs

Nov 22, 2019

3DCNN-DQN-RNN: A Deep Reinforcement Learning Framework for Semantic Parsing of Large-scale 3D Point Clouds

Jul 21, 2017