Jianfei Chen

ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing

Dec 19, 2024

SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration

Nov 17, 2024

Consistency Diffusion Bridge Models

Oct 31, 2024

COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training

Oct 25, 2024

Beyond 2:4: exploring V:N:M sparsity for efficient transformer inference on GPUs

Oct 21, 2024

FrameBridge: Improving Image-to-Video Generation with Bridge Models

Oct 20, 2024

On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent

Oct 07, 2024

SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration

Oct 03, 2024

S-STE: Continuous Pruning Function for Efficient 2:4 Sparse Pre-training

Sep 13, 2024

1-Bit FQT: Pushing the Limit of Fully Quantized Training to 1-bit

Aug 26, 2024