
Xiaonan Nie

DataSculpt: Crafting Data Landscapes for LLM Post-Training through Multi-objective Partitioning

Sep 02, 2024

BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline

Aug 27, 2024

Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs

Jul 16, 2024

Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge

May 01, 2024

Improving Automatic Parallel Training via Balanced Memory Workload Optimization

Jul 05, 2023

FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement

Apr 08, 2023

Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent

Mar 06, 2023

Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism

Nov 25, 2022

Dense-to-Sparse Gate for Mixture-of-Experts

Dec 29, 2021

HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework

Dec 14, 2021