Jiarui Fang

xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Nov 04, 2024
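
xDiT scales DiT inference by combining several parallel strategies (sequence parallelism, patch-level pipelining, and more) across GPUs. Below is a toy sketch of how a hybrid plan might map ranks to layer ranges and token shards; `hybrid_plan` and all the numbers are illustrative assumptions, not xDiT's actual API.

```python
# Toy sketch: mapping ranks onto a hybrid (pipeline x sequence) parallel plan,
# in the spirit of xDiT's combined parallelism for DiT inference.
def hybrid_plan(world_size, pp_degree, num_layers, num_tokens):
    assert world_size % pp_degree == 0
    sp_degree = world_size // pp_degree        # sequence-parallel group size
    layers_per_stage = num_layers // pp_degree
    tokens_per_rank = num_tokens // sp_degree
    plan = {}
    for rank in range(world_size):
        stage = rank // sp_degree              # which pipeline stage this rank joins
        sp_rank = rank % sp_degree             # position inside its SP group
        plan[rank] = {
            "layers": range(stage * layers_per_stage, (stage + 1) * layers_per_stage),
            "tokens": range(sp_rank * tokens_per_rank, (sp_rank + 1) * tokens_per_rank),
        }
    return plan

# 8 GPUs arranged as 2 pipeline stages x 4-way sequence parallelism
for rank, p in hybrid_plan(8, 2, num_layers=28, num_tokens=4096).items():
    print(rank, p["layers"], p["tokens"])
```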

PipeFusion: Displaced Patch Pipeline Parallelism for Inference of Diffusion Transformer Models

May 23, 2024
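
PipeFusion's core idea: split the image latent into patches and pipeline them across devices, while attention over the other patches reuses slightly stale activations cached from the previous diffusion step (adjacent steps are similar enough that the staleness is tolerable). A purely sequential toy simulation of that displaced schedule, with made-up names:

```python
# Toy simulation of displaced patch pipelining: each "stage" processes one
# fresh patch per step while context for the other patches comes from the
# previous diffusion step's cache. Illustrative only, not the implementation.
NUM_PATCHES = 4
NUM_STEPS = 3

def stage_forward(step, patch, num_stale_ctx):
    # stand-in for a transformer block; records what it computed from
    return f"act(step={step}, patch={patch}, stale_ctx={num_stale_ctx})"

cache = {p: f"act(step=-1, patch={p})" for p in range(NUM_PATCHES)}
for step in range(NUM_STEPS):
    fresh = {}
    for patch in range(NUM_PATCHES):           # patches flow through the pipeline
        stale = [cache[q] for q in range(NUM_PATCHES) if q != patch]
        fresh[patch] = stage_forward(step, patch, len(stale))
    cache = fresh                              # next step reads this step's activations
print(cache)
```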

A Unified Sequence Parallelism Approach for Long Context Generative AI

May 15, 2024
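
This work unifies DeepSpeed-Ulysses-style all-to-all (which trades a sequence shard for a head shard) with Ring-Attention-style K/V circulation into a single 2D sequence-parallel layout. A small shape-bookkeeping sketch of that layout; the function name and degrees are illustrative assumptions:

```python
# Shape bookkeeping for a 2D (ulysses x ring) sequence-parallel layout.
def usp_shapes(seq_len, num_heads, ulysses_degree, ring_degree):
    sp = ulysses_degree * ring_degree
    assert seq_len % sp == 0 and num_heads % ulysses_degree == 0
    # Before attention: every rank holds a contiguous sequence shard, all heads.
    pre = (seq_len // sp, num_heads)
    # The Ulysses all-to-all swaps sequence sharding for head sharding inside
    # each group; the ring dimension keeps the sequence split and circulates
    # K/V blocks between ring peers during attention.
    post = (seq_len // ring_degree, num_heads // ulysses_degree)
    return pre, post

print(usp_shapes(8192, 32, ulysses_degree=4, ring_degree=2))
# ((1024, 32), (4096, 8))
```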

AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference

Jan 19, 2024
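
AutoChunk automatically finds where to split long-sequence activations into chunks so that peak activation memory scales with the chunk size rather than the full sequence length. A hand-written version of the transformation it automates (the search itself is the paper's contribution and is not shown):

```python
# Minimal activation chunking: run a token-wise module over slices of a long
# sequence so only one chunk's activations are live at a time.
def mlp(tokens):                                # stand-in per-token module
    return [t * 2.0 + 1.0 for t in tokens]

def chunked(fn, tokens, chunk_size):
    out = []
    for i in range(0, len(tokens), chunk_size):
        out.extend(fn(tokens[i:i + chunk_size]))  # one chunk in flight at a time
    return out

seq = [float(i) for i in range(10_000)]
assert chunked(mlp, seq, 1024) == mlp(seq)      # same result, lower peak memory
```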

Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models

Feb 22, 2023
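
Colossal-Auto jointly searches parallelization and activation-checkpointing plans under a device memory budget. The brute-force toy below covers only the checkpointing side of that planning problem; the real system solves a much richer joint optimization, and all costs here are invented:

```python
# Pick which layers to activation-checkpoint so total activation memory fits
# a budget while recompute cost is minimized.
from itertools import product

mem = [4, 4, 2, 6]        # activation memory per layer if kept (made-up units)
recompute = [3, 3, 1, 5]  # extra compute if checkpointed (memory then ~0)
BUDGET = 8

best = None
for choice in product([False, True], repeat=len(mem)):   # True = checkpoint
    m = sum(0 if c else mm for c, mm in zip(choice, mem))
    cost = sum(r for c, r in zip(choice, recompute) if c)
    if m <= BUDGET and (best is None or cost < best[0]):
        best = (cost, choice)
print(best)  # cheapest recompute plan that fits the budget
```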

Elixir: Train a Large Language Model on a Small GPU Cluster

Dec 10, 2022
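
Elixir profiles the model before training to pick a partitioning and offloading configuration that fits a small cluster's GPU memory. A deliberately simplified sketch of that sizing decision; every quantity and name below is a made-up placeholder, not Elixir's model:

```python
# Given a profiled memory breakdown, decide how much model state stays
# resident on GPU and how much is offloaded to CPU memory.
def split_states(param_gb, optim_gb, act_gb, gpu_gb):
    resident = max(gpu_gb - act_gb, 0.0)       # room left after activations
    keep = min(param_gb + optim_gb, resident)  # GPU-resident share of the state
    offload = param_gb + optim_gb - keep       # remainder lives on the CPU side
    return keep, offload

print(split_states(param_gb=14, optim_gb=28, act_gb=10, gpu_gb=24))
```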

EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models

Sep 06, 2022
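
EnergonAI serves 10-100B-parameter transformers with tensor and pipeline parallelism. The sketch below shows only the column-sharded tensor-parallel linear such engines build on, with plain Python lists standing in for GPU tensors and a concatenation standing in for the all-gather; it is not EnergonAI's API:

```python
# Tensor-parallel linear for inference: the weight's output rows are split
# across ranks, each rank computes a partial output, and concatenation
# restores the full result.
def tp_linear(x, w, tp_degree):
    # x: input vector; w: out_dim x in_dim weight as nested lists
    shard = len(w) // tp_degree
    outs = []
    for rank in range(tp_degree):              # each "rank" owns a row shard
        w_shard = w[rank * shard:(rank + 1) * shard]
        outs.extend(sum(wi * xi for wi, xi in zip(row, x)) for row in w_shard)
    return outs                                # stands in for the all-gather

w = [[1, 0], [0, 1], [1, 1], [2, 0]]
print(tp_linear([3.0, 4.0], w, tp_degree=2))   # [3.0, 4.0, 7.0, 6.0]
```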

A Frequency-aware Software Cache for Large Recommendation System Embeddings

Aug 08, 2022
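
The paper keeps frequently accessed embedding rows resident on the GPU while the full table stays in host memory, using access frequency to drive eviction. A toy LFU-flavored version, with dicts standing in for GPU and CPU storage; the class and policy details are illustrative, not the paper's code:

```python
# Frequency-aware embedding cache: hot rows live in a small "GPU" dict,
# the full table lives in "CPU" memory, eviction picks the coldest row.
from collections import Counter

class FreqCache:
    def __init__(self, cpu_table, capacity):
        self.cpu = cpu_table              # id -> embedding (full table)
        self.gpu = {}                     # cached hot rows
        self.freq = Counter()
        self.capacity = capacity

    def lookup(self, idx):
        self.freq[idx] += 1
        if idx not in self.gpu:
            if len(self.gpu) >= self.capacity:           # evict coldest row
                victim = min(self.gpu, key=lambda i: self.freq[i])
                del self.gpu[victim]
            self.gpu[idx] = self.cpu[idx]                # stand-in for H2D copy
        return self.gpu[idx]

cache = FreqCache({i: [float(i)] for i in range(1000)}, capacity=2)
for i in [1, 1, 1, 2, 3, 1]:
    cache.lookup(i)
print(sorted(cache.gpu))                  # the hot ids stay resident: [1, 3]
```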

PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management

Aug 12, 2021
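
PatrickStar packs model states into fixed-size chunks and migrates whole chunks between CPU and GPU memory as training touches them. A miniature chunk manager illustrating the idea; the eviction policy and bookkeeping here are hypothetical simplifications:

```python
# Chunk-based memory management: parameters are packed into fixed-size
# chunks, and whole chunks move between CPU and GPU on access.
CHUNK_SIZE = 4

class ChunkManager:
    def __init__(self, params, gpu_chunk_budget):
        # pack parameter names into fixed-size chunks
        self.chunks = [params[i:i + CHUNK_SIZE]
                       for i in range(0, len(params), CHUNK_SIZE)]
        self.on_gpu = []                       # FIFO of resident chunk ids
        self.budget = gpu_chunk_budget

    def access(self, param):
        cid = next(i for i, c in enumerate(self.chunks) if param in c)
        if cid not in self.on_gpu:
            if len(self.on_gpu) >= self.budget:
                evicted = self.on_gpu.pop(0)   # stand-in for D2H offload
                print(f"offload chunk {evicted}")
            self.on_gpu.append(cid)            # stand-in for H2D upload
            print(f"upload chunk {cid}")
        return cid

mgr = ChunkManager([f"p{i}" for i in range(12)], gpu_chunk_budget=2)
for p in ["p0", "p5", "p9", "p1"]:
    mgr.access(p)
```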

TurboTransformers: An Efficient GPU Serving System For Transformer Models

Oct 09, 2020
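
TurboTransformers pairs a variable-length-aware batch scheduler with a custom memory allocator to serve transformers efficiently on GPUs. The toy below covers only the batching decision, and the greedy policy with a waste threshold is an illustrative assumption, not the paper's scheduling algorithm:

```python
# Group pending requests into batches so padding waste
# (batch_size * max_len - total real tokens) stays bounded.
def batch_requests(lengths, max_waste_ratio=0.2):
    batches, cur = [], []
    for n in sorted(lengths):                  # sorting keeps batches uniform
        trial = cur + [n]
        padded = len(trial) * max(trial)
        waste = 1.0 - sum(trial) / padded      # fraction of padded tokens wasted
        if waste <= max_waste_ratio:
            cur = trial
        else:
            batches.append(cur)
            cur = [n]
    if cur:
        batches.append(cur)
    return batches

print(batch_requests([5, 7, 8, 30, 32, 33]))   # [[5, 7, 8], [30, 32, 33]]
```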