Xiaonan Nie

DataSculpt: Crafting Data Landscapes for LLM Post-Training through Multi-objective Partitioning

Sep 02, 2024

BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline

Aug 27, 2024

Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs

Jul 16, 2024

Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge

May 01, 2024

Improving Automatic Parallel Training via Balanced Memory Workload Optimization

Jul 05, 2023

FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement

Apr 08, 2023

Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent

Mar 06, 2023

Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism

Nov 25, 2022

Dense-to-Sparse Gate for Mixture-of-Experts

Dec 29, 2021

HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework

Dec 14, 2021